Senior Software Engineer, Infrastructure Platform
About the Role
As a Senior Software Engineer, Infrastructure Platform, you will be a key member of Afresh’s Infrastructure engineering team. You will build and improve the infrastructure and tooling that helps our service-owning teams ship reliably, operate safely, and move quickly.
On this team, you will:
- Own and deliver infrastructure projects end-to-end, from problem definition and technical design through implementation, rollout, and iteration
- Build and improve platform primitives that make it easier for service teams to deploy, operate, and debug their services
- Improve observability and operational readiness so we can detect issues early, reduce time-to-recovery, and prevent repeat incidents
- Identify and implement cost and performance improvements across our cloud infrastructure and developer tooling
- Work closely with Security to implement practical security controls and protect sensitive data (for example, least-privilege access, secret management, and network controls)
- Participate in our on-call rotation and continuously improve monitoring and alerting to maintain a low page rate
- Stay current on infrastructure best practices and evaluate improvements with a pragmatic, impact-focused mindset
Tech Stack: Our infrastructure is built on Azure (AKS, ACA, Entra, Blob Storage), with Terraform for IaC. Backend services are primarily Python and run on Kubernetes. Observability is Datadog + Sentry, and we use GitHub Actions for CI. Familiarity with Kubernetes and Terraform is the strongest signal for this role; comfort with Python is expected. That said, many of our problems are stack-agnostic platform problems.
Skills + Experience
We recognize strong engineers come from many backgrounds and advance at different paces.
For this Senior role, candidates typically have 5+ years of relevant software engineering experience (or the equivalent).
We’re looking for someone who has repeatedly delivered complex technical work in production environments, can turn ambiguous problems into a plan, and can execute with a high level of ownership and good judgment.
Technical Skills
- Cloud infrastructure — You have operated and maintained mission-critical cloud infrastructure with high uptime. You can design and implement scalable infrastructure (Azure preferred, but AWS/GCP are also fine), including core cloud networking (VPC/VNet design, routing, DNS, load balancing, and connectivity), and you can build improvements that make it easier for service owners to manage their own systems.
- Incident response / disaster recovery — You have led or played a key role in high-severity production incidents. You can troubleshoot complex issues, restore service, and communicate effectively.