Site Reliability Engineer, Environment Automation
About the role
You'll join the Dedicated team as a Site Reliability Engineer focused on Environment Automation, where your work will help power hundreds of isolated GitLab environments for our customers. In this role, you'll help keep these environments reliable, scalable, secure, and consistent by treating everything as code and contributing to automation across the entire lifecycle, from initial provisioning to day-to-day operations. Instead of operating a single platform, you'll collaborate with senior SREs to solve the unique challenges of managing many tenant environments in parallel, each with its own constraints and integration points.
You'll help define, deploy, and maintain GitLab environments across cloud providers using infrastructure as code, deployment packages, and Kubernetes. You'll contribute to automation that reduces manual work, assist in building tooling that orchestrates upgrades and configuration changes safely at scale, and support an observability stack that lets us understand and improve the health of every environment. Your work will directly shape how customers experience GitLab Dedicated and other managed offerings, letting them focus on building software while we keep their GitLab environments production-ready.
Some examples of work you'll do:
- Contribute to the design and evolution of infrastructure automation using Terraform, Ansible, and Kubernetes to provision, upgrade, and operate many GitLab environments with minimal manual effort
- Help debug and resolve production issues across Kubernetes clusters, GitLab components, and cloud services, then assist in building automation and safeguards that prevent similar issues from recurring
- Assist in creating and maintaining deployment and orchestration tools, such as Helm charts, omnibus-gitlab configurations, and multi-tenant workflows, that make it easy for teams to manage GitLab environments at scale
What you'll do
- Contribute, under the guidance of senior team members, to automating operational tasks across many GitLab environments, from initial provisioning and configuration updates to upgrades and routine maintenance, helping reduce manual work and improve reliability at scale.
- Help build and refine the observability stack for multi-tenant GitLab environments so we monitor the right signals across Kubernetes, cloud services, and GitLab applications, supporting early issue detectio