Platform Operations Lead
About the role
Location: UK (Remote)
Department: Infrastructure
Reporting to: Head of Infrastructure
ABOUT NEXGEN CLOUD
NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers, from AI researchers to enterprises running the world's most compute-intensive workloads. We deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature.
We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure. We practice what we preach, equipping our people with AI at every level so we can solve harder problems, ship faster, and keep raising the bar for what enterprise GPU infrastructure looks like.
THE ROLE: Platform Operations Lead
This role exists to help NexGen Cloud scale the operational maturity of its cloud infrastructure as demand grows across regions, customers and services.
You'll sit at the intersection of Infrastructure, DevOps, Engineering and Customer Experience, helping reduce operational load on engineering teams through automation, tooling, runbooks and clear support processes. You'll play a key role in improving reliability, observability, incident response and operational readiness across our platform.
This is a hands-on role for someone who enjoys building practical solutions, improving how teams work, and taking ownership of operational outcomes in a fast-moving cloud environment.
WHAT YOU'LL BE DOING
Rather than a long checklist, here's what success in this role looks like:
- Build and improve scalable infrastructure operations processes that support a growing cloud platform
- Enable customer-facing and operational teams with secure automation, diagnostics, tooling and clear workflows
- Reduce repeatable manual work by identifying operational pain points and turning them into automated or self-service solutions
- Support the rollout and readiness of new infrastructure environments, working closely with Infrastructure, DevOps and Engineering teams
- Improve observability, incident response and operational documentation across production environments
- Design and maintain runbooks, escalation paths and ownership models between technical and customer-facing teams
- Evaluate new tools, vendors or approaches that could improve operational efficiency, reliability or scale