Zscaler

Sr. Production Engineer

engineering · full-time · Remote - California, USA; San Jose, California, USA
SALARY
Not listed
WORK TYPE
hybrid
JOB TYPE
full-time
INDUSTRY
ai

About the role

Role

We are looking for a Sr. Production Engineer to join our team. This role is available either as a hybrid position (3 days a week in San Jose, CA) or as a remote position, reporting to Production Engineering in the Cloud Infrastructure & Operations department. Join Zscaler to be a force multiplier for the reliability of a global platform processing 200+ billion transactions daily across tens of millions of enterprise users.

In this role, you will provide the technical vision and hands-on execution to drive an "automation-first" culture across the company. By maturing our observability and architectural standards, you will directly reduce our Mean Time to Mitigate (MTTM) and shape the scalability of our globally distributed, multi-cloud infrastructure.

What you’ll do (Role Expectations)

  • Implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments
  • Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
  • Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
  • Act as a lead Incident Commander (TDO on-call), develop response playbooks, and conduct deep-dive post-incident analyses
  • Work with Engineering and partner teams to conduct operability reviews

Who You Are (Success Profile)

  • You act like an owner with a bias for action and integrity.
  • You are a pragmatic builder obsessed with creating, iterating, and shipping.
  • You champion simplicity by distilling complex problems into clear, actionable plans.
  • You are data-driven, valuing evidence over assumptions.
  • You think at scale, building solutions and processes that last in a high-growth global organization.

What We’re Looking for (Minimum Qualifications)

  • 3-5+ years of experience managing reliability, scalability, and availability for large-scale production services
  • Deep expertise in programming (e.g., Python, Go, or C/C++)
  • Strong background in networking protocols, Linux/RHEL systems, and distributed architecture
  • Experience in high-stakes incident management and participation in a 24/7 on-call rotation