Specializing in Infrastructure, Security & Automation
Principal engineer with expertise in building secure, scalable bare-metal and cloud platforms. Proven track record of architecting complex provisioning workflows, implementing zero-trust security, and leading technical teams to deliver resilient multi-tenant systems.
Key infrastructure and automation projects demonstrating technical leadership and innovation
Sole developer of a high-performance async Python API spanning 45,000+ lines of production code. I architected and implemented the entire system from scratch, handling everything from the initial design to production deployment. Built the core orchestration engine that manages the mTLS certificate lifecycle, coordinates iPXE network boot operations, and orchestrates distributed hardware provisioning across multiple datacenters. Integrated built-in TFTP and Samba/CIFS servers for legacy boot support and file sharing. Implemented automatic NetBird VPN connectivity with health monitoring running as background services. Designed a sophisticated initrd overlay system that dynamically builds custom boot environments (discovery, live provisioning, system rescue, Ubuntu rescue) with embedded mTLS certificates, SSH keys, network configuration, and phone-home endpoints pre-configured. Personally wrote critical subsystems including the structured JSON logging framework (750 lines), the SSH execution engine with connection pooling (643 lines), the dynamic Vault PKI certificate service (1,218 lines), and the initrd builder with secure credential injection. Deployed and maintained in production for 3+ years, serving as the backbone of the entire bare-metal infrastructure.
Architected and built a comprehensive 27,000+ line monorepo managing the complete lifecycle of rental bare-metal infrastructure and datacenter bridge nodes. I designed the entire system as a Nomad-orchestrated platform with multiple coordinated subsystems: lifecycle state machines (7,733 lines of Python), hardware collectors (8,743 lines), Terraform zone provisioning (1,145 lines), and Ansible host configuration (1,916 lines). Built 7 different Nomad job types working in concert - system jobs for continuous lifecycle management, cron jobs for synchronization, batch jobs for bootstrapping, and dispatch jobs for on-demand operations. Created the "managed bridge" architecture where Nomad orchestrates specialized datacenter nodes providing DHCP, VPN, and monitoring services. The lifecycle logic uses Nomad node metadata as a state machine, triggering different job groups based on device status. This became the central orchestration platform managing hundreds of devices across multiple datacenters.
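To illustrate the metadata-driven state machine described above, here is a minimal Python sketch. It is not the production code: the `device_status` metadata key, the status values, and the job names are all hypothetical stand-ins, and the real system reads node metadata from Nomad rather than from a plain dictionary.

```python
# Hypothetical mapping from a device status (stored in node metadata)
# to the job group that should run next. Names are illustrative only.
JOBS_BY_STATUS = {
    "discovered": ["collect-hardware"],
    "inventoried": ["provision"],
    "provisioned": ["monitor"],
    "decommissioning": ["wipe-disks"],
}


def jobs_for_node(meta: dict) -> list:
    """Return the job names to trigger for a node, based on its status.

    `meta` stands in for a node's metadata map. Unknown or missing
    statuses trigger nothing, so a node never runs a job by accident.
    """
    status = meta.get("device_status", "unknown")
    return JOBS_BY_STATUS.get(status, [])


if __name__ == "__main__":
    print(jobs_for_node({"device_status": "inventoried"}))
```

Keeping the state in node metadata means the scheduler itself is the single source of truth for where each device sits in its lifecycle.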
Led the complete lifecycle of a sophisticated ~14K-line Azure Functions platform from initial greenfield development through production scaling to strategic architectural migration. I architected and built the original system with 8 coordinated functions using Durable Functions orchestration, Kafka event streaming, and comprehensive Redfish/IPMI hardware management. The project integrated deeply with 9 external systems (NetBox, Vault, Kafka, Redfish, IPMI, Zabbix, Nomad, NetBird, Aiven) handling device lifecycle, self-help onboarding, and health monitoring. After 2+ years of production operation and 800+ commits, I led a major architectural evolution, migrating from monolithic SharedClasses (~2K lines) to a modern service-oriented architecture (9 specialized services totaling ~13K lines). Successfully executed a massive deprecation removing ~18K lines of legacy code while maintaining zero downtime. Currently guiding the frontend development team on porting remaining functions to new platforms, providing architectural documentation and migration patterns.
Designed and implemented a comprehensive zero-trust security architecture spanning the entire backend infrastructure. I architected multi-tenant isolation using Vault namespaces with zone-specific identity boundaries, securing every communication channel with mTLS. Built the authentication flow from end users through Azure/Kafka to backend services, implementing JWT-based claims with AppRole and entity-based authorization. Created a zone-restricted access model where each tenant/site/location has isolated PKI, secrets, and network policies. Integrated Vault OIDC for service authentication, GitLab CI/CD for automated token generation, and Nomad for workload identity. Every component - Nomad servers, marketplace hosts, bridges, and external services - communicates over mTLS with Vault-issued certificates. This architecture enabled secure multi-tenant operations while maintaining strict isolation and auditability.
Led a complete transformation of the company's bare-metal deployment strategy, migrating from slow, error-prone Ubuntu ISO installations to a sophisticated direct provisioning system using our custom live OS. I architected and implemented the entire direct provisioning workflow that bypassed traditional ISO-based installations, instead using the custom live environment to directly set up disks, partition layouts, and install operating systems programmatically. This was one of the earliest major projects I tackled and became a foundational capability that enabled everything that followed. The new approach gave us complete control over the provisioning process, allowed rapid expansion of supported OS variants, and dramatically improved deployment reliability. This architectural shift was a major company milestone that transformed how infrastructure was deployed.
Architected and implemented a lightweight, containerized bridge solution that enabled remote datacenter provisioning without requiring full infrastructure deployment. I designed a public-facing mTLS-secured API that IPMI controllers could connect to for iPXE booting, provisioning, and secure disk wiping. This eliminated the need to deploy the full bridge infrastructure (with dual network management) in every datacenter. Instead, a lightweight container could communicate back to the central bridge over mTLS, dramatically reducing deployment complexity while maintaining security. The architecture supported the full device lifecycle from inventory, through provisioning, to decommissioning - all remotely managed through secure API calls. This was a key innovation that made multi-datacenter expansion feasible.
Owned the deployment, configuration, and day-to-day operation of 14 critical infrastructure services across dev/staging/production environments over 3+ years before transitioning to a dedicated DevOps team. I architected enterprise-grade production deployments with high availability, automated monitoring, backup strategies, SSL certificate renewal, centralized syslog forwarding, and Vault SSH certificate signing for admin access. Led the deployment strategy through multiple evolutionary phases: started with a pure Ansible monorepo, evolved to Ansible dynamically generating Terraform configurations, and ultimately migrated to static Terraform modules with Ansible roles for service configuration. Wrote comprehensive Ansible roles for each service (Vault, Nomad, NetBox, Zabbix, Redis, NetBird, Kafka, etc.) handling installation, configuration, and ongoing management. Created all Terraform modules from scratch (4,595 lines) with reusable git-sourced modules for Azure deployments. Personally handled production incidents, capacity planning, version upgrades, and security patches. This hands-on operational experience gave me deep expertise in running production infrastructure at scale.
Architected and built a comprehensive automated OS image factory using Packer, Ansible, and Python for building customized Linux distributions across multiple architectures. I designed the entire system from the build definition format to the CI/CD pipeline, creating a Python-based configuration generator that produces hundreds of Packer build variants from a single YAML definition. The system supports multiple Linux distributions (Ubuntu, Debian), versions (12, 22, 24), architectures (AMD64, ARM64), and custom variants (vanilla, HPC, Proxmox). Built parallelized GitLab CI/CD pipelines with dedicated bare-metal runners that reduced build times by 70%. Implemented intelligent artifact storage using Restic with deduplication and encryption, reducing hundreds of GB of OS image builds to less than 100 GB through block-level deduplication. Stored deduplicated images in Cloudflare R2 for global edge distribution. This became the foundation for all OS provisioning across the infrastructure.
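The variant generation described above can be sketched as a matrix expansion. This is a simplified illustration, not the actual generator: the definition layout and field names are hypothetical, a plain dictionary stands in for the parsed YAML file, and the pairing of versions with distributions (Debian 12, Ubuntu 22 and 24) is my assumption from the version list.

```python
from itertools import product

# Stand-in for the parsed YAML build definition (layout is hypothetical).
DEFINITION = {
    "distributions": {"ubuntu": ["22", "24"], "debian": ["12"]},
    "architectures": ["amd64", "arm64"],
    "variants": ["vanilla", "hpc", "proxmox"],
}


def expand_builds(definition: dict) -> list:
    """Expand one definition into a flat list of build descriptions.

    Each distribution contributes versions x architectures x variants
    entries; every entry could then be rendered into a Packer build.
    """
    builds = []
    for distro, versions in definition["distributions"].items():
        for version, arch, variant in product(
            versions, definition["architectures"], definition["variants"]
        ):
            builds.append(
                {
                    "name": f"{distro}-{version}-{arch}-{variant}",
                    "distro": distro,
                    "version": version,
                    "arch": arch,
                    "variant": variant,
                }
            )
    return builds


if __name__ == "__main__":
    for build in expand_builds(DEFINITION):
        print(build["name"])
```

Adding one more variant or architecture to the definition multiplies the build count without touching any per-build configuration, which is the point of generating the matrix rather than maintaining it by hand.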
Built a custom live Linux operating system from scratch using Ansible that provides the complete runtime environment for bare-metal discovery and provisioning operations. I designed the entire build pipeline including multi-architecture support (AMD64, ARM64), automated CI/CD with GitLab runners, and distribution via R2 storage with Restic deduplication. The live OS contains all necessary packages pre-installed - NVIDIA drivers, diagnostic tools, network utilities, Nomad client, NetBird VPN, and hardware collection dependencies. Created a fully automated build system using Ansible chroot operations that generates bootable ISOs with all tools, network configuration, and security credentials baked in. Implemented Restic deduplication for version control, keeping repository size manageable despite multiple ISO builds and versions. The live environment is ephemeral and stateless, booting over iPXE with embedded authentication. This became the foundation enabling the Infrastructure Lifecycle Manager to run discovery collectors and provisioning workflows without requiring pre-installed operating systems.
Built a production HashiCorp Vault plugin from scratch in Go to solve a specific infrastructure challenge: automated, ephemeral VPN access for contractors and temporary workers. I studied the Vault plugin architecture and implemented the full plugin lifecycle, including path handlers, role management, and lease tracking. Integrated directly with the NetBird REST API to provision setup keys, create access groups, and define network policies. Implemented intelligent quota management that automatically cleans up inactive peers when limits are reached. This was my first significant Go project, learning the language while solving a real business problem. Deployed to production Vault clusters and used daily for access provisioning.
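The quota cleanup idea can be sketched roughly as follows. The plugin itself is written in Go and talks to the NetBird API; this Python version only illustrates the selection logic, and the `Peer` fields, the seven-day idle threshold, and the "make room for one new peer" rule are all assumptions of mine rather than the plugin's actual behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Peer:
    peer_id: str
    connected: bool
    last_seen: datetime


def peers_to_evict(peers, limit, now, idle_after=timedelta(days=7)):
    """Pick inactive peers to remove so that one new peer fits under `limit`.

    Only disconnected peers idle for at least `idle_after` are candidates,
    and the longest-idle ones are evicted first. Returns an empty list when
    there is already room.
    """
    excess = len(peers) - limit + 1
    if excess <= 0:
        return []
    inactive = sorted(
        (p for p in peers if not p.connected and now - p.last_seen >= idle_after),
        key=lambda p: p.last_seen,
    )
    return inactive[:excess]


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    peers = [
        Peer("a", True, now),
        Peer("b", False, datetime(2024, 1, 1)),
        Peer("c", False, datetime(2024, 1, 10)),
    ]
    print([p.peer_id for p in peers_to_evict(peers, limit=3, now=now)])
```

Connected peers are never candidates in this sketch, so an active contractor session would not be dropped to make room for a new one.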
Led the complete security transformation for internal infrastructure, implementing single sign-on across 14+ internal services and overhauling Azure security posture. I researched, selected, and deployed the SSO solution, integrating it with Vault, Nomad, NetBox, Zabbix, Superset, and other critical services. Managed all Azure identity and access management including Just-In-Time (JIT) access provisioning for administrators. Systematically improved Azure Secure Score through policy enforcement, network segmentation, and access controls. Architected secure remote access using NetBird VPN with fine-grained network policies, eliminating the need for broad VPN access. This work reduced security incidents and provided centralized identity management across the entire infrastructure.
Authored comprehensive technical documentation for the two core backend systems (Network Boot API and Infrastructure Lifecycle Manager), creating detailed wikis that became the primary knowledge base for the infrastructure team. I wrote architecture overviews, deployment guides, troubleshooting procedures, API references, and operational runbooks from scratch. The documentation covered system architecture with diagrams, component descriptions, configuration details, integration patterns, and best practices. This was a major undertaking that required deep understanding of every system component and the ability to explain complex technical concepts clearly. The wikis enabled team members to understand and maintain these sophisticated systems, reduced onboarding time for new engineers, and served as the authoritative reference for all infrastructure operations.
Technical skills and tools used across infrastructure, security, and automation projects
Interested in discussing infrastructure architecture, automation, or potential opportunities? Let's connect.