Principal Engineer / Team Lead

Specializing in Infrastructure, Security & Automation

Principal engineer with expertise in building secure, scalable bare-metal and cloud platforms. Proven track record of architecting complex provisioning workflows, implementing zero-trust security, and leading technical teams to deliver resilient multi-tenant systems.

Featured Projects

Key infrastructure and automation projects demonstrating technical leadership and innovation

Bridge API - Enterprise Network Boot Orchestration

Sole developer of a high-performance async Python API spanning 45,000+ lines of production code. I architected and implemented the entire system from scratch, from initial design through production deployment. Built the core orchestration engine that manages the mTLS certificate lifecycle, coordinates iPXE network boot operations, and orchestrates distributed hardware provisioning across multiple datacenters. Integrated built-in TFTP and Samba/CIFS servers for legacy boot support and file sharing. Implemented automatic NetBird VPN connectivity with health monitoring, running as background services. Designed an initrd overlay system that dynamically builds custom boot environments (discovery, live provisioning, system rescue, Ubuntu rescue) with embedded mTLS certificates, SSH keys, network configuration, and pre-configured phone-home endpoints. Personally wrote critical subsystems including the structured JSON logging framework (750 lines), the SSH execution engine with connection pooling (643 lines), the dynamic Vault PKI certificate service (1,218 lines), and the initrd builder with secure credential injection. Deployed and maintained in production for 3+ years, serving as the backbone of the entire bare-metal infrastructure.

Key Achievements:

  • Architected multi-process startup orchestration separating blocking pre-flight from background tasks
  • Built dynamic initrd overlay system generating custom boot environments with embedded mTLS certs, SSH keys, and network config
  • Integrated TFTP and Samba servers as background services for legacy boot and file sharing support
  • Implemented automatic NetBird VPN connectivity with health monitoring for secure datacenter communication
  • Implemented structured JSON logging with job correlation across distributed async operations
  • Built SSH execution framework with connection pooling, exponential backoff retry, and SCP file transfers
  • Developed dynamic mTLS certificate service with on-demand generation using Vault PKI (1,218 lines)
  • Created persistent iPXE build system with multi-commit concurrent building and distributed file locking
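
The exponential backoff retry mentioned for the SSH execution framework can be illustrated with a minimal asyncio sketch. This is not the production code: the function name `retry_with_backoff`, its parameters, and the 10% jitter are assumptions chosen for illustration, and the operation being retried is any async callable rather than a real asyncssh call.

```python
import asyncio
import random


async def retry_with_backoff(operation, *, attempts=5, base_delay=0.5,
                             max_delay=30.0, retry_on=(OSError,),
                             sleep=asyncio.sleep):
    """Run an async callable, retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter (an assumption) avoids synchronized retries.
            await sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; connection errors such as `ConnectionError` are subclasses of `OSError`, so the default `retry_on` covers them.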

Technologies:

Python, Quart, Hypercorn, asyncssh, Vault PKI, iPXE, TFTP, Samba, NetBird, initrd, NetBox, Nomad
Tags: Python, APIs, Async, Security, Infrastructure

Infrastructure Lifecycle Manager - Distributed Orchestration Platform

Architected and built a comprehensive 27,000+ line monorepo managing the complete lifecycle of rental bare-metal infrastructure and datacenter bridge nodes. I designed the entire system as a Nomad-orchestrated platform with multiple coordinated subsystems: lifecycle state machines (7,733 lines of Python), hardware collectors (8,743 lines), Terraform zone provisioning (1,145 lines), and Ansible host configuration (1,916 lines). Built 7 different Nomad job types working in concert - system jobs for continuous lifecycle management, cron jobs for synchronization, batch jobs for bootstrapping, and dispatch jobs for on-demand operations. Created the "managed bridge" architecture where Nomad orchestrates specialized datacenter nodes providing DHCP, VPN, and monitoring services. The lifecycle logic uses Nomad node metadata as a state machine, triggering different job groups based on device status. This became the central orchestration platform managing hundreds of devices across multiple datacenters.

Key Achievements:

  • Built 27K-line monorepo with 5 coordinated subsystems: lifecycle (7.7K), collectors (8.7K), Terraform (1.1K), Ansible (1.9K), Nomad jobs
  • Architected 7 Nomad job types, including marketplace-host-lifecycle (522 lines), bridge-bootstrap, bridge-sync (cron), zone-service (dispatch), netbird-ssh, and bridge-dispatch
  • Designed "managed bridge" architecture: Nomad-orchestrated datacenter nodes providing DHCP (Kea), VPN (NetBird), monitoring (Zabbix)
  • Implemented state machine using Nomad metadata: planned → inventory → provisioning → active → decommissioning
  • Built 20+ hardware collectors with 8,743 lines: dmidecode, lscpu, nvidia, lshw, kernel params, networking, storage
  • Created Terraform zone service (1,145 lines): provisions tenant-specific Vault secrets, NetBox IPAM, NetBird networks, Cloudflare DNS
  • Developed Ansible playbooks (1,916 lines): bootstrap bridges with Docker, NetBird, Kea DHCP, UFW firewall, MySQL replication
  • Integrated continuous sync: bridge-sync cron job (every 5 min) syncs NetBird peers with NetBox devices
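
The lifecycle state machine listed above (planned → inventory → provisioning → active → decommissioning) can be sketched in a few lines of Python. This is a simplified illustration, not the platform's code: the metadata key `device_status` and the strictly linear transition order are assumptions, and a real system may allow other edges such as returning a device to inventory.

```python
from enum import Enum


class DeviceState(str, Enum):
    PLANNED = "planned"
    INVENTORY = "inventory"
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    DECOMMISSIONING = "decommissioning"


# Enum members iterate in definition order, which matches the lifecycle order.
ORDER = list(DeviceState)


def next_state(current: DeviceState):
    """Return the state that follows `current`, or None at the end."""
    idx = ORDER.index(current)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def advance(node_meta: dict) -> dict:
    """Return a copy of the node metadata moved one lifecycle step forward.

    `device_status` is a hypothetical metadata key used for illustration.
    """
    current = DeviceState(node_meta["device_status"])
    nxt = next_state(current)
    if nxt is None:
        raise ValueError(f"{current.value} is a terminal state")
    return {**node_meta, "device_status": nxt.value}
```

Keeping the state in node metadata means a scheduler can select which job group to run by reading a single value; unknown status strings fail fast because `DeviceState(...)` raises `ValueError`.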

Technologies:

Python, Nomad, Terraform, Ansible, NetBox, Vault, Zabbix, NetBird, Cloudflare R2
Tags: Python, Nomad, Orchestration, Terraform, Ansible, Infrastructure

Azure Serverless BMC Orchestration - Build, Scale, Migrate

Led the complete lifecycle of a ~14K-line Azure Functions platform, from initial greenfield development through production scaling to a strategic architectural migration. I architected and built the original system with 8 coordinated functions using Durable Functions orchestration, Kafka event streaming, and comprehensive Redfish/IPMI hardware management. The project integrated deeply with 9 external systems (NetBox, Vault, Kafka, Redfish, IPMI, Zabbix, Nomad, NetBird, Aiven), handling device lifecycle, self-help onboarding, and health monitoring. After 2+ years of production operation and 800+ commits, I led a major architectural evolution: migrating from monolithic SharedClasses (~2K lines) to a modern service-oriented architecture (9 specialized services totaling ~13K lines). Successfully executed a large deprecation that removed ~18K lines of legacy code while maintaining zero downtime. Currently guiding the frontend development team on porting the remaining functions to new platforms, providing architectural documentation and migration patterns.

Key Achievements:

  • Built ~14K-line serverless platform with 8 Azure Functions and 9 domain services over 2+ years
  • Contributed 800+ commits (23% of total) as primary architect and maintainer
  • Led architectural migration: deprecated ~18K lines of legacy code, refactored to modern service architecture
  • Designed Durable Functions orchestration with HTTP and Kafka triggers for hybrid event processing
  • Implemented comprehensive Redfish/IPMI integration (~1.8K LOC) with vendor-specific handling for Dell, HP, Lenovo, SuperMicro
  • Built automated health monitoring: UnprovisionableDeviceDetector with 4-tier validation (bridge, BMC, IPMI, agent)
  • Created device lifecycle state machine with phone-home validation and cryptographic signature verification
  • Currently guiding frontend team on migration strategy and knowledge transfer for remaining components
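
The 4-tier validation (bridge, BMC, IPMI, agent) can be pictured as an ordered chain of checks. The sketch below is an assumption about how such a detector might be structured, not the actual UnprovisionableDeviceDetector: the function name `first_failed_tier`, the callable-per-tier shape, and the stop-at-first-failure behavior are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Tier order taken from the description above.
TIERS = ("bridge", "bmc", "ipmi", "agent")


def first_failed_tier(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the tier checks in order and return the first failing tier.

    Returns None when every tier passes. Later tiers are skipped once a
    tier fails, on the assumption that they depend on the earlier ones.
    """
    for tier in TIERS:
        if not checks[tier]():
            return tier
    return None
```

Reporting the first failing tier rather than a plain pass/fail gives an operator a starting point: a device failing at "bridge" points to a site connectivity problem, while one failing only at "agent" is reachable out-of-band but not from the host OS.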

Technologies:

Python, Azure Functions, Kafka, Redfish, IPMI, Vault, NetBox, Zabbix, NetBird, Durable Functions
Tags: Azure, Python, Serverless, Kafka, Leadership, Migration

Multi-Zone Security Architecture with Vault Integration

Designed and implemented a comprehensive zero-trust security architecture spanning the entire backend infrastructure. I architected multi-tenant isolation using Vault namespaces with zone-specific identity boundaries, securing every communication channel with mTLS. Built the authentication flow from end users through Azure/Kafka to backend services, implementing JWT-based claims with AppRole and entity-based authorization. Created a zone-restricted access model where each tenant/site/location has isolated PKI, secrets, and network policies. Integrated Vault OIDC for service authentication, GitLab CI/CD for automated token generation, and Nomad for workload identity. Every component - Nomad servers, marketplace hosts, bridges, and external services - communicates over mTLS with Vault-issued certificates. This architecture enabled secure multi-tenant operations while maintaining strict isolation and auditability.

Key Achievements:

  • Architected zero-trust security model with mTLS for all service-to-service communication
  • Implemented multi-tenant isolation using Vault namespaces with zone identity boundaries
  • Designed JWT-based authentication flow with Vault entity metadata and ACL policies
  • Built automated PKI certificate distribution for Nomad, bridges, and marketplace hosts
  • Created zone-restricted access patterns (tenant_id, site_id, location_id) for data isolation
  • Integrated Vault Transit/PKI/KV/Netbird secrets engines for comprehensive secrets management
  • Established secure CI/CD pipeline with Vault token generation via GitLab
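
The zone-restricted access pattern (tenant_id, site_id, location_id) reduces to an exact-match check on three fields. The following is a minimal sketch under stated assumptions: the claims are taken to be an already verified token payload (signature and expiry validation are out of scope here), and the names `Zone` and `is_allowed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    tenant_id: str
    site_id: str
    location_id: str


def is_allowed(claims: dict, resource: Zone) -> bool:
    """Allow access only when all three zone fields match the resource.

    `claims` is assumed to be the payload of a token that has already
    been verified elsewhere.
    """
    try:
        caller = Zone(claims["tenant_id"], claims["site_id"],
                      claims["location_id"])
    except KeyError:
        return False  # a missing zone claim means deny by default
    return caller == resource
```

Denying on any missing field keeps the check fail-closed, so a token that lacks zone information never falls through to broader access.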

Technologies:

Vault, Vault OIDC, Vault PKI, mTLS, JWT, Nomad, AppRole, NetBird, Azure, Kafka
Tags: Security, Zero-Trust, Vault, mTLS, Multi-Tenant, PKI

Direct Provisioning Platform - Programmatic Deployment System

Led a complete transformation of the company's bare-metal deployment strategy, migrating from slow, error-prone Ubuntu ISO installations to a sophisticated direct provisioning system using our custom live OS. I architected and implemented the entire direct provisioning workflow that bypassed traditional ISO-based installations, instead using the custom live environment to directly set up disks, partition layouts, and install operating systems programmatically. This was one of the earliest major projects I tackled and became a foundational capability that enabled everything that followed. The new approach gave us complete control over the provisioning process, allowed rapid expansion of supported OS variants, and dramatically improved deployment reliability. This architectural shift was a major company milestone that transformed how infrastructure was deployed.

Key Achievements:

  • Transformed deployment from Ubuntu ISO method to direct OS provisioning via live environment
  • Reduced provisioning time from 20+ minutes to ~5 minutes (4x improvement)
  • Expanded OS support from Ubuntu 20/22 to Ubuntu 20/22/24 + Debian 9/10
  • Implemented variant system: vanilla (stock), HPC (NVIDIA/Mellanox drivers), Proxmox
  • Built disk partitioning and filesystem setup automation with Curtin
  • Created cloud-init integration for post-install configuration
  • Eliminated error-prone manual ISO mounting and auto-install file generation
  • Established foundation for all future OS provisioning capabilities

Technologies:

Python, Curtin, Cloud-Init, Ansible, Bash, Linux, Ubuntu, Debian
Tags: Python, Infrastructure, Automation, Linux, Innovation

Public Bridge API - Lightweight Remote Datacenter Provisioning

Architected and implemented a lightweight, containerized bridge solution that enabled remote datacenter provisioning without requiring full infrastructure deployment. I designed a public-facing mTLS-secured API that IPMI controllers could connect to for iPXE booting, provisioning, and secure disk wiping. This eliminated the need to deploy the full bridge infrastructure (with dual network management) in every datacenter. Instead, a lightweight container could communicate back to the central bridge over mTLS, dramatically reducing deployment complexity while maintaining security. The architecture supported the full device lifecycle from inventory, through provisioning, to decommissioning - all remotely managed through secure API calls. This was a key innovation that made multi-datacenter expansion feasible.

Key Achievements:

  • Designed lightweight bridge architecture reducing datacenter deployment footprint by 90%
  • Implemented mTLS-secured public API for remote IPMI controller communication
  • Enabled full lifecycle management (discovery, provisioning, decommissioning) via remote API
  • Eliminated need for dual-network infrastructure in remote datacenters
  • Built containerized deployment for easy installation in distributed locations
  • Secured all communication with Vault-issued certificates and mutual TLS authentication

Technologies:

Python, mTLS, iPXE, Docker, IPMI, Redfish, Vault PKI
Tags: Architecture, Security, APIs, Docker, Distributed Systems

Infrastructure Services Platform - 14 Production Services

Owned the deployment, configuration, and day-to-day operation of 14 critical infrastructure services across dev/staging/production environments over 3+ years before transitioning to a dedicated DevOps team. I architected enterprise-grade production deployments with high availability, automated monitoring, backup strategies, SSL certificate renewal, centralized syslog forwarding, and Vault SSH certificate signing for admin access. Led the deployment strategy through multiple evolutionary phases: started with a pure Ansible monorepo, evolved to Ansible dynamically generating Terraform configurations, and ultimately migrated to static Terraform modules with Ansible roles for service configuration. Wrote comprehensive Ansible roles for each service (Vault, Nomad, NetBox, Zabbix, Redis, NetBird, Kafka, etc.) handling installation, configuration, and ongoing management. Created all Terraform modules from scratch (4,595 lines) with reusable git-sourced modules for Azure deployments. Personally handled production incidents, capacity planning, version upgrades, and security patches. This hands-on operational experience gave me deep expertise in running production infrastructure at scale.

Key Achievements:

  • Deployed 14 production services across dev/stg/prod: Vault, Nomad, NetBox, Zabbix Servers, Redis, NetBird, n8n, Superset, Gravwell, Kafka Connectors, Infrastructure Monitor, Zabbix Proxies, Nomad Workers, Public Bridge
  • Architected production deployments with HA, automated monitoring, backups, SSL auto-renewal, syslog forwarding, and Vault SSH cert signing
  • Led infrastructure evolution: pure Ansible monorepo → dynamic Terraform generation → static Terraform with Ansible roles
  • Built comprehensive Ansible roles for each service: installation, configuration, secrets management, service hardening
  • Created modular Terraform architecture with git-sourced reusable modules (4,595 lines)
  • Implemented Azure Traffic Manager for HA/LB with automatic failover
  • Created ACME certificate automation with Vault and Cloudflare DNS integration
  • Designed multi-environment promotion strategy (dev → staging → prod) with environment-specific tfvars

Technologies:

Terraform, Azure, Ansible, Cloudflare, Vault, Nomad, Zabbix, Kafka
Tags: Terraform, Azure, IaC, Infrastructure, HA, Ansible

Dynamic OS Image Builder - Multi-Distribution Build System

Architected and built a comprehensive automated OS image factory using Packer, Ansible, and Python for building customized Linux distributions across multiple architectures. I designed the entire system from the build definition format to the CI/CD pipeline, creating a Python-based configuration generator that produces hundreds of Packer build variants from a single YAML definition. The system supports multiple Linux distributions (Ubuntu, Debian), versions (12, 22, 24), architectures (AMD64, ARM64), and custom variants (vanilla, HPC, Proxmox). Built parallelized GitLab CI/CD pipelines with dedicated bare-metal runners that reduced build times by 70%. Implemented intelligent artifact storage using Restic with deduplication and encryption - reducing hundreds of GB of OS image builds to less than 100GB through block-level deduplication. Stored deduplicated images in Cloudflare R2 for global edge distribution. This became the foundation for all OS provisioning across the infrastructure.

Key Achievements:

  • Built dynamic configuration generator in Python that creates Packer configs from YAML definitions
  • Implemented multi-architecture builds (AMD64, ARM64) with dedicated bare-metal runners
  • Created variant system supporting vanilla, HPC (NVIDIA/Mellanox), and Proxmox configurations
  • Reduced build times by 70% through parallelization and intelligent caching
  • Designed Restic-based storage with deduplication: reduced hundreds of GB of OS images to less than 100GB
  • Built automated checksum validation and version tracking system
  • Supported a build matrix of 3 distros × 3 versions × 2 architectures × 3 variants (54 base combinations)
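
Generating build variants from a single definition is essentially a Cartesian product over the build dimensions. The sketch below shows the idea only, not the actual generator: the dictionary layout, the `expand_builds` name, and the distro-to-version pairing (Ubuntu 22/24, Debian 12) are assumptions, and it covers just a subset of the full matrix. A real generator would load the definition from YAML and emit Packer configuration rather than plain dicts.

```python
from itertools import product

# Hypothetical build definition; the described system reads this from YAML.
DEFINITION = {
    "releases": {"ubuntu": ["22", "24"], "debian": ["12"]},
    "architectures": ["amd64", "arm64"],
    "variants": ["vanilla", "hpc", "proxmox"],
}


def expand_builds(definition: dict) -> list:
    """Expand a definition into one dict per (release, arch, variant)."""
    # Flatten distro -> versions into (distro, version) pairs first, so
    # versions are only combined with the distro they belong to.
    releases = [
        (distro, version)
        for distro, versions in definition["releases"].items()
        for version in versions
    ]
    return [
        {
            "name": f"{distro}-{version}-{arch}-{variant}",
            "distro": distro,
            "version": version,
            "arch": arch,
            "variant": variant,
        }
        for (distro, version), arch, variant in product(
            releases, definition["architectures"], definition["variants"]
        )
    ]


builds = expand_builds(DEFINITION)
print(len(builds))  # 3 releases x 2 architectures x 3 variants = 18
```

Adding a new release, architecture, or variant to the definition grows the build list multiplicatively without touching the generator.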

Technologies:

Python, Packer, Ansible, GitLab CI/CD, Bash, Restic, Cloudflare R2, QEMU
Tags: Python, CI/CD, Packer, Build Systems, Automation

Custom Live Linux Distribution for Bare-Metal Discovery

Built a custom live Linux operating system from scratch using Ansible that provides the complete runtime environment for bare-metal discovery and provisioning operations. I designed the entire build pipeline including multi-architecture support (AMD64, ARM64), automated CI/CD with GitLab runners, and distribution via R2 storage with Restic deduplication. The live OS contains all necessary packages pre-installed - NVIDIA drivers, diagnostic tools, network utilities, Nomad client, NetBird VPN, and hardware collection dependencies. Created a fully automated build system using Ansible chroot operations that generates bootable ISOs with all tools, network configuration, and security credentials baked in. Implemented Restic deduplication for version control, keeping repository size manageable despite multiple ISO builds and versions. The live environment is ephemeral and stateless, booting over iPXE with embedded authentication. This became the foundation enabling the Infrastructure Lifecycle Manager to run discovery collectors and provisioning workflows without requiring pre-installed operating systems.

Key Achievements:

  • Built custom Linux live OS from Ubuntu base with Ansible chroot automation
  • Pre-installed all runtime dependencies: NVIDIA drivers, Nomad client, NetBird VPN, diagnostic tools
  • Implemented multi-architecture builds (AMD64, ARM64) with dedicated bare-metal runners
  • Created automated ISO generation pipeline with checksum validation and versioning
  • Used Restic deduplication to keep repository size slim despite multiple ISO versions
  • Designed ephemeral, stateless environment enabling zero-touch provisioning workflows
  • Built R2 distribution system for global edge delivery of ISO images
  • Provided complete runtime environment for Infrastructure Lifecycle Manager discovery operations

Technologies:

Ansible, Linux, iPXE, GitLab CI/CD, Bash, Cloudflare R2, Ubuntu, Squashfs, Chroot, Restic
Tags: Linux, Automation, CI/CD, Infrastructure, Build Systems

Custom Vault Plugin - Netbird VPN Integration

Built a production HashiCorp Vault plugin from scratch in Go to solve a specific infrastructure challenge: automated, ephemeral VPN access for contractors and temporary workers. I studied the Vault plugin architecture and implemented the full plugin lifecycle, including path handlers, role management, and lease tracking. Integrated directly with the NetBird REST API to provision setup keys, create access groups, and define network policies. Implemented quota management that automatically cleans up inactive peers when limits are reached. This was my first significant Go project; I learned the language while solving a real business problem. Deployed to production Vault clusters and used daily for access provisioning.

Key Achievements:

  • Architected custom Vault plugin with path handlers for config, credentials, and roles
  • Implemented quota management with automatic peer cleanup when limits are exceeded
  • Built ephemeral access groups for temporary contractor access with automatic revocation
  • Designed network policy creation (TCP, UDP, ICMP) for fine-grained access control
  • Created lease management with TTL vs expires_in separation for lifecycle control
  • Integrated real Netbird REST API with comprehensive error handling

Technologies:

Go, Vault Plugin API, NetBird API, REST
Tags: Go, Vault, Plugins, Security, VPN

Enterprise SSO & Azure Security Architecture

Led the complete security transformation for internal infrastructure, implementing single sign-on across 14+ internal services and overhauling Azure security posture. I researched, selected, and deployed the SSO solution, integrating it with Vault, Nomad, NetBox, Zabbix, Superset, and other critical services. Managed all Azure identity and access management including Just-In-Time (JIT) access provisioning for administrators. Systematically improved Azure Secure Score through policy enforcement, network segmentation, and access controls. Architected secure remote access using NetBird VPN with fine-grained network policies, eliminating the need for broad VPN access. This work reduced security incidents and provided centralized identity management across the entire infrastructure.

Key Achievements:

  • Implemented SSO across 14+ internal services (Vault, Nomad, NetBox, Zabbix, Superset, n8n, Gravwell)
  • Managed Azure identity and access management with Just-In-Time administrative access
  • Improved Azure Secure Score through systematic policy enforcement and security controls
  • Architected secure access patterns using NetBird VPN with network segmentation
  • Designed and enforced network policies for least-privilege access to infrastructure
  • Centralized authentication eliminating password sprawl and improving audit capabilities

Technologies:

Azure AD, SSO, OAuth/OIDC, JIT Access, NetBird, Azure Security Center, Network Policies
Tags: Security, Azure, SSO, Identity, Zero-Trust

Comprehensive Technical Documentation - Wiki System

Authored comprehensive technical documentation for the two core backend systems (Network Boot API and Infrastructure Lifecycle Manager), creating detailed wikis that became the primary knowledge base for the infrastructure team. I wrote architecture overviews, deployment guides, troubleshooting procedures, API references, and operational runbooks from scratch. The documentation covered system architecture with diagrams, component descriptions, configuration details, integration patterns, and best practices. This was a major undertaking that required deep understanding of every system component and the ability to explain complex technical concepts clearly. The wikis enabled team members to understand and maintain these sophisticated systems, reduced onboarding time for new engineers, and served as the authoritative reference for all infrastructure operations.

Key Achievements:

  • Authored complete documentation for 45K+ line Network Boot API and 27K+ line Infrastructure Lifecycle Manager
  • Created architecture documentation with system diagrams and component relationships
  • Wrote deployment guides covering dev, staging, and production environments
  • Documented all API endpoints, authentication patterns, and integration points
  • Created troubleshooting guides based on production incident experience
  • Established documentation as single source of truth for infrastructure team
  • Reduced new engineer onboarding time through comprehensive knowledge transfer

Technologies:

Markdown, Git, Mermaid Diagrams, Technical Writing
Tags: Documentation, Technical Writing, Knowledge Transfer

Core Competencies

Technical skills and tools used across infrastructure, security, and automation projects

Programming & Development

Python, PowerShell, Bash, Groovy

Async & APIs

Quart, Hypercorn, asyncio, aiohttp, asyncssh, REST, gRPC

Infrastructure as Code

Terraform, Ansible, Packer, Cloud-Init, Curtin

Security & Secrets

Vault PKI, Vault Transit, Vault Plugins, mTLS, Dynamic Secrets, Zero-Trust

Orchestration & Workloads

Nomad, Docker, Azure Functions, Libvirt/KVM

Observability & Monitoring

Zabbix, Graylog, Superset, Trino, Gravwell, Prometheus

Cloud Platforms

Azure, AWS, Cloudflare, Azure Traffic Manager, Azure Functions

Data & Messaging

PostgreSQL, MongoDB, Redis, Kafka, Kafka Connect, asyncpg, Restic

Networking & Hardware

iPXE, TFTP, DHCP/DNS, NetBox, NetBird, Network Boot

Hardware Management & Troubleshooting

IPMI, Redfish, iDRAC, iLO, BMC, Server Troubleshooting

CI/CD & DevOps

GitLab CI/CD, AWX, GitHub Actions, Artifact Management

Get in Touch

Interested in discussing infrastructure architecture, automation, or potential opportunities? Let's connect.