DevOps Engineer, Cloud Platform
About Upstart
At Upstart, we're united by a mission that matters: to radically reduce the cost and complexity of borrowing for all Americans. Every day, we bring creativity, experimentation, and advanced AI to reshape access to credit, helping millions move forward financially with clarity and confidence.
As the leading AI lending marketplace, we partner with banks and credit unions to expand access to affordable credit through technology that's both radically intelligent and deeply human. Our platform runs over one million predictions per borrower using more than 1,800 signals, powering smarter, fairer decisions for millions of customers. But the numbers only hint at the impact. Every idea, every voice, and every contribution moves us closer to a world where credit never stands between people and their financial progress.
We're proudly digital-first, giving most Upstarters the flexibility to do their best work from wherever they thrive, alongside teammates across 80+ cities in the US and Canada. Digital-first doesn't mean distant. We're intentional about in-person connection through team onsites, planning sessions, and moments that spark creativity and trust. And whether you choose to work primarily from home or collaborate in person from one of our offices in Columbus, Austin, the Bay Area, or New York City (opening Summer 2026), you'll have the support to work in the way that works best for you.
The Team
Upstart's Cloud Platform team sits within the Reliability organization and is responsible for building and operating the shared cloud infrastructure that powers all product and machine learning workloads. The team owns core platform components across Kubernetes (EKS), AWS infrastructure, service mesh, identity, and developer tooling, enabling reliability, scalability, and security across the business.
As a DevOps Engineer (L4) at Upstart, you will help evolve this platform to support increasing scale and complexity. You'll partner closely with SRE, Delivery, InfoSec, and Product/ML teams to improve reliability, developer experience, and cost efficiency across a platform used by nearly every engineering team.
How you'll make an impact
- Design and operate a fleet of Kubernetes (EKS) clusters across production, staging, and ephemeral environments, ensuring reliability and high availability
- Evolve AWS infrastructure and network architecture (VPCs, subnets, IAM, account structure) to support scalable, multi-team workloads
- Build and maintain infrastructure-as-code and GitOps workflows using tools such as Terraform, CDK, and ArgoCD
- Improve platform reliability and performance by defining and driving SLOs, analyzing incidents, and implementing systemic fixes
- Participate in and help improve the on-call rotation, leading incident response and post-incident reviews to drive systemic platform improvements
- Partner with SRE, Delivery, InfoSec, and Product/ML teams to land high-impact infrastructure changes and platform standards
- Drive improvements in developer experience by simplifying platform usage, reducing toil, and enabling faster product and ML development
- Contribute to cost efficiency initiatives by optimizing resource utilization across Kubernetes and cloud infrastructure
Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field (or equivalent practical experience) and 3+ years of professional experience
- 3+ years of experience operating Kubernetes in production environments, including cluster networking, storage, and RBAC
- Proficiency with AWS