Senior Site Reliability Engineer
Engineering · Full-time · Remote
Salary: Not listed
Work type: Remote
Job type: Full-time
Industry: General
About the role
This position
As a Senior SRE at Remote, you'll work with a high degree of autonomy on complex reliability and platform problems, owning the plan and execution of features and projects within our SRE/Platform domain. You'll contribute to the platform's architecture and reliability strategy, translating ambiguous requirements into robust, maintainable solutions. You'll also raise the technical bar of the engineers around you while collaborating closely with product and security teams in an async-first, fully remote environment.
You'll work AI-natively day to day and build reusable AI workflows that make the whole team faster and more reliable, not just yourself.
What you’ll bring
- Solid professional experience in SRE, DevOps, or Platform Engineering.
- Solid hands-on Kubernetes experience: operating and scaling production clusters, plus container tooling (Docker) and its ecosystem.
- Experience building and managing cloud infrastructure on AWS (or similar).
- Strong infrastructure-as-code practice with Terraform.
- Experience with reliability frameworks: SLOs, SLIs, error budgets, alerting strategies.
- Solid observability background: OpenTelemetry, Grafana/Prometheus or similar.
- Proficiency with CI/CD (GitLab CI, GitHub Actions, or similar) and deployment automation.
- Comfortable with Golang and Bash/scripting; broader programming experience is a plus.
- Practical, embedded use of AI in infra/ops/dev work: agentic workflows with concrete, observable results, not just familiarity with the tools.
- Clear and thoughtful communication, especially in an async-first, global setting.
- Proactive, curious, and comfortable taking ownership of challenges.
- Collaborative and respectful across cultures, time zones, and backgrounds.
Nice to have
- Experience with at least one back-end programming language (Elixir, Node.js, Python, etc.).
- Experience running and configuring Linux systems in a non-cloud environment.
- Security knowledge and capabilities from both a defensive and an offensive standpoint.
What you’ll do
- Lead solution discovery and delivery for reliability and infrastructure problems with real ambiguity, complexity, or scope, working autonomously and coordinating with other contributors where needed.
- Contribute to the platform's architecture, tooling, and roadmap. Influence team priorities and advocate for technical initiatives.
- Help define and operate reliability practices for our platform: SLOs/SLIs, error budgets, alerting, observability. Take responsibility for the team's operational stance, using support/