Staff Site Reliability Engineer, Production Engineering
About the role
Role Description
As a Staff Site Reliability Engineer focused on company-wide reliability strategy, you will advance Dropbox’s stability, observability, incident response, and operational excellence as AI technologies reshape how software is built and operated. You will help define the reliability strategy for a new chapter of agentic development and AI-enabled software delivery, including preparing Dropbox for higher pull request volume, greater service complexity, changing incident patterns, and growing demand for debugging and monitoring tools. You will partner across Engineering, Product, and leadership teams to raise the bar for reliability, guide long-term platform investments, and ensure Dropbox continues to deliver dependable experiences for millions of users.
Responsibilities
- Define and evolve Dropbox’s company-wide technical reliability strategy to support the changing engineering environment created by AI-assisted and agentic software development.
- Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness.
- Lead cross-team initiatives that reduce reliability risk as software delivery velocity, pull request volume, service complexity, and incident volume increase.
- Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems at company scale.
- Identify emerging reliability risks introduced by AI-enabled development workflows and design scalable systems, processes, and guardrails to mitigate them.
- Provide technical leadership and mentorship to drive reliability best practices across the organization.