Senior Machine Learning Engineer, AI Platform
Why Mozilla?
Mozilla Corporation is the non-profit-backed technology company that has shaped the internet for the better over the last 25 years. We make pioneering brands like Firefox, the privacy-minded web browser, and Pocket, a service for keeping up with the best content online. Now, with more than 225 million people around the world using our products each month, we’re shaping the next 25 years of technology and helping to reclaim an internet built for people, not companies. Our work spans diverse areas including AI, social media, and security, and through all of it we never lose sight of our core mission: to make the internet better for people.
The Mozilla Corporation is wholly owned by the non-profit 501(c) Mozilla Foundation. This means we aren’t beholden to any shareholders — only to our mission. Along with thousands of volunteer contributors and collaborators all over the world, Mozillians design, build and distribute open-source software that enables people to enjoy the internet on their terms.
About this team and role:
The AI Platform team is responsible for building the foundational infrastructure that powers intelligent experiences across Mozilla products. This includes model training pipelines, high-throughput inference services, GPU orchestration, and secure, privacy-respecting AI systems that operate reliably at global scale.
We’re looking for a Senior Machine Learning Engineer with a strong platform mindset to help design, build, and operate Mozilla’s AI platform. In this role, you’ll work at the intersection of machine learning, distributed systems, and production infrastructure, ensuring that models can be trained, deployed, and served efficiently, securely, and at scale. You will collaborate closely with product, infrastructure, and security teams to enable fast iteration while meeting strict performance and privacy requirements.
What You’ll Do:
- Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments.
- Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence.
- Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads.
- Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and efficient resource utilization.
- Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation.
- Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines.
- Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features.
- Contribute to technical design discussions, propose architectural improvements, and drive engineering best practices across the AI platform.