Flipapp1
Senior Site Reliability Engineer (m/f/d)
Engineering | Full-time | Berlin, Berlin, Germany; Remote (Europe); Stuttgart
Salary: Not listed
Work type: Remote
Job type: Full-time
Industry: AI
About the role
Job Teaser
As a Senior Site Reliability Engineer in our Platform Squad, you'll own critical reliability domains end-to-end and drive the squad's technical direction: leading architectural decisions on our platform, mentoring teammates, and continuously raising the team's reliability bar.
This role is for an engineer with a proven track record of building and operating high-throughput, highly available systems, who wants senior-level technical ownership and real impact through deep engineering work inside a tight, well-scoped team.
What awaits you with us
- Co-own the architecture: Help drive the architecture and evolution of our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and high availability, to support Flip's rapid global growth.
- Drive the resilience strategy: Define how we approach global scaling, zero-downtime deployments, rollback mechanisms and disaster recovery, and make sure the platform stays available around the clock.
- Evolve our observability stack: Improve our LGTM stack (Loki, Grafana, Tempo, Mimir) into a foundation our engineers can trust.
- Improve our IaC platform: Eliminate toil at the source and make our infrastructure truly self-service for engineering teams.
- Lead in incidents: Take a leading role in platform-related major incidents, drive blameless post-mortems for the squad, and translate findings into systemic improvements.
- Mentor within the squad: Coach teammates, run RFCs and design reviews inside the team, and help engineers grow into stronger SREs.
- Shape our roadmap: Partner with your squad to define the platform's direction.
What you bring to the table
We're looking for a hands-on, SaaS-minded senior Site Reliability Engineer who treats scalability and reliability as a first-class product concern.
Must-Have Qualifications
- 5+ years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
- Proven track record building and operating high-throughput, highly available systems in production.
- Deep, production-level experience with Kubernetes on any hyperscaler.
- Strong experience with modern observability stacks (e.g. Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, ELK) and a clear point of view on SLIs, SLOs and error budgets.
- Solid software development skills in Go (strongly preferred, since our IaC runs on Pulumi in Go) or Python.
- Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform), GitOps (e.g. ArgoCD), and CI/CD pipeline design.
- Demonstrated ability to lead complex infrastructure initiatives from design to production - including writing RFCs and driving architecture decisions within your team.
- Experience mentoring engineers and raising the technical bar within a team.