Senior Software Engineer, Cloud Development
Why Mozilla?
Mozilla Corporation is the non-profit-backed technology company that has shaped the internet for the better over the last 25 years. We make pioneering brands like Firefox, the privacy-minded web browser, and Pocket, a service for keeping up with the best content online. Now, with more than 225 million people around the world using our products each month, we’re shaping the next 25 years of technology and helping to reclaim an internet built for people, not companies. Our work spans diverse areas including AI, social media, and security, and throughout it we keep our focus on our core mission: to make the internet better for people.
The Mozilla Corporation is wholly owned by the non-profit 501(c)(3) Mozilla Foundation. This means we aren’t beholden to any shareholders, only to our mission. Along with thousands of volunteer contributors and collaborators all over the world, Mozillians design, build, and distribute open-source software that enables people to enjoy the internet on their terms.
About the Team & Role
The AI Platform team is responsible for building the foundational infrastructure that powers intelligent experiences across Mozilla products. This includes model training pipelines, high-throughput inference services, GPU orchestration, and secure, privacy-respecting AI systems that operate reliably at global scale.
We’re looking for a Senior Software Engineer with a strong platform mindset to help design, build, and operate Mozilla’s AI platform. In this role, you’ll work at the intersection of machine learning, distributed systems, and production infrastructure—ensuring that models can be trained, deployed, and served efficiently, securely, and at scale. You will collaborate closely with product, infrastructure, and security teams to enable fast iteration while meeting strict performance and privacy requirements.
What You’ll Do
- Design, build, and operate core platform services and APIs used to deploy and serve production workloads at scale.
- Own service reliability end-to-end, driving improvements in availability, scalability, performance, and operational excellence.
- Lead efforts to optimize backend services for throughput, latency, and cost efficiency across distributed infrastructure.
- Design and manage Kubernetes-based workloads, including GitOps deployment pipelines, environment configuration, and resource utilization optimization.
- Own and improve critical parts of the service lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation.
- Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of backend services and pipelines.
- Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable new product features.
- Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers.