SimScale

Senior SRE / Platform Engineer (m/f/d)

Engineering | Full-time | Munich, Germany and Remote

SALARY

Not listed

WORK TYPE

remote

JOB TYPE

full-time

INDUSTRY

general

The Role

We are looking for a Senior SRE / Platform Engineer (m/f/d) to own and improve the cloud infrastructure behind SimScale's browser-based simulation platform. The role spans AWS and EKS, observability, disaster recovery, security and compliance controls, multi-region architecture, elastic GPU/HPC capacity, and internal developer tooling.

SimScale's engineering teams run workloads directly on AWS; you will build the standards, guardrails, and self-service tooling that let them do so safely, raising reliability and security without slowing engineering velocity. You will join a small, tightly knit infrastructure team supporting 50+ engineers across the company. This is a hands-on senior individual contributor role; people management is not required, but there is a genuine path toward tech-lead ownership as the team grows.

Your Opportunity

Evolve our Kubernetes platform: Evaluate and adopt technologies such as Kubernetes Gateway API and service mesh patterns, and coordinate platform evolution across 10+ engineering teams.
Take observability to the next level: Drive organization-wide adoption of OpenTelemetry for distributed tracing and metrics, and help teams define meaningful SLOs.
Shape multi-region architecture and data residency: Support our move from an EU-centered footprint toward a global, multi-cloud architecture that satisfies disaster-recovery and data-residency requirements.
Own cloud cost and efficiency at scale: Keep petabyte-scale infrastructure cost-efficient, secure, and well-instrumented.
Improve tooling: Build self-service AWS account provisioning, guardrails, and AI-assisted automation that help engineering teams manage infrastructure safely and efficiently at scale.

What We Expect from You

5+ years of professional experience in SRE, platform, or infrastructure engineering.
Software development experience: Your background is rooted in software development, and you moved into SRE from there. You write production-quality software in at least one of Python, Go, Rust, or Java.
Strong systems foundation: You understand Linux internals and distributed systems well enough to debug complex production behavior.
Hands-on cloud and infrastructure experience: AWS (or GCP), declarative infrastructure (Terraform), GitOps workflows (ArgoCD), and container orchestration (Kubernetes).
Observability and reliability experience: You have worked with OpenTelemetry, Prometheus, distributed tracing, monitoring, and meaningful SLOs/SLIs.
Production debugging depth: You can investigate complex failures, communicate clearly during incidents, and turn findings into durable improvements.
Security and compliance awareness: You understand how infrastructure decisions affect access control, auditability, disaster recovery, logging, and standards such as SOC 2.
Clear communication: You can explain trade-offs to engineering teams and help others adopt better platform practices without unnecessary friction.

Bonus Points

An open source portfolio or contributions.
Prior technical leadership experience, especially in infrastructure, reliability, or platform engineering.

Location: Remote (within CET ±5h)

What you can expect from us

Join a dedicated, supportive team with unlimited growth opportunities and leadership potential
Make an impact quickly by sharing ideas and contributing to creative, goal-oriented projects
Work in a diverse, inclusive environment with colleagues from over 35 countries
Enjoy flexible working hours and remote work options
