Aera Technology
Principal Cloud Infrastructure Architect
Engineering | Full-time | Remote US, USA
SALARY
Not listed
WORK TYPE
remote
JOB TYPE
full-time
INDUSTRY
ai
About the role
Responsibilities
- Architect and scale enterprise-grade AKS clusters built for high concurrency, performance, and real-time AI inference, ensuring the platform is globally distributed and highly available.
- Use Crossplane to provision Azure services, creating a Kubernetes-native control plane for rapid scaling of AI services.
- Champion GitOps practices with Argo CD to standardize deployments across multiple environments and regions, enabling reliable, automated delivery of mission-critical SaaS workloads.
- Engineer infrastructure that supports data-intensive AI/ML pipelines, integrating compute, storage, and messaging with Kubernetes to power real-time decision intelligence use cases.
- Optimize scalability and concurrency with autoscaling, pod disruption budgets, and advanced workload scheduling, ensuring millions of daily requests are served with low latency.
- Develop and maintain automation, tooling, and integrations using Python, Ruby, and Terraform, enabling teams to scale infrastructure and AI services efficiently.
- Design and enforce secure, compliant, multi-tenant architectures with Azure AD SSO, managed identities, RBAC, and Key Vault integration.
- Build resilient networking topologies with VNets, VNet peering, Private Link, service mesh technologies (e.g., Istio, Linkerd), and Emissary-ingress for advanced security and reliability.
- Integrate observability frameworks at scale using Prometheus, Grafana, Azure Monitor, and OpenTelemetry, providing deep visibility into performance, availability, and latency.
- Collaborate closely with AI/ML engineering teams to align infrastructure with real-time inference and streaming data requirements, enabling cutting-edge decision automation.
- Mentor engineering and operations teams while documenting and evangelizing Kubernetes-native and Azure-native best practices, driving innovation across the organization.
About You
- 10+ years of cloud infrastructure experience with expert-level skills in Kubernetes and Azure.
- Proven experience designing and operating multi-tenant SaaS platforms where performance, scalability, and security are critical.
- Hands-on expertise with Crossplane for Kubernetes-controlled Azure service provisioning.
- Deep familiarity with Azure services such as AKS, Azure Database for MySQL (Flexible Server), Blob Storage, Event Hubs, and Key Vault.
- Strong coding and automation background with GitHub Actions, Python, and Terraform, plus experience with other high-level programming and scripting languages.
- Skilled in Infrastructure as Code (Terraform, Crossplane, Helm) and GitOps (Argo CD).
- In-depth knowledge of Kubernetes networking, autoscaling, and workload orchestration for AI/ML inference workloads.
- Proficiency with observability tooling: Prometheus, Grafana, Azure Monitor, and OpenTelemetry.
- A collaborative leader who thrives on mentoring and driving best practices across teams.