Technical Program Manager, Data Centers
About Nebius
Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.
The Role
We are looking for a Technical Program Manager to own the operational readiness and ongoing health of our fleet of data centers, both colocation (COLO) and build-to-suit (BTS) sites. In this role you will be the single point of accountability for ensuring each site runs as expected (SLAs met, maintenance executed on schedule, and audits passed) across a growing portfolio of landlord-operated and purpose-built facilities. You will act as the primary interface between Nebius and our data center landlords and operators, and you will partner closely with the Nebius IT team to translate site-level operations into reliable infrastructure for our customers.
Key job responsibilities
- Own the operational health of Nebius COLO and BTS sites, ensuring each facility runs to expectation across power, cooling, space, connectivity, security, and environmental controls.
- Track, monitor, and enforce SLA compliance across landlords and colocation providers; identify breaches, drive remediation, and hold providers accountable to contractual commitments.
- Manage and coordinate site maintenance schedules — preventive and corrective — including planning and approving maintenance windows, reviewing Methods of Procedure (MOPs), and minimizing risk to live workloads.
- Plan and drive site audits covering compliance, capacity, power/cooling performance, physical security, and safety; track findings to closure.
- Serve as the primary day-to-day interface with data center landlords and operators, managing the operational relationship, escalations, and coordination of on-site activity.
- Partner closely with the Nebius IT team on deployments, capacity planning, incident response, and change management at each site.
- Build reporting mechanisms and dashboards that give leadership clear visibility into site health, SLA performance, maintenance status, and open risk across the portfolio.
- Lead incident coordination and post-incident follow-up, including root cause analysis and corrective action tracking with landlords and internal teams.
- Track and manage contractual operational obligations, deliverables, and timelines across multiple sites and providers simultaneously.
About the team
The Data Center team is responsible for the physical infrastructure that underpins Nebius' AI cloud. We manage the full lifecycle of our COLO and BTS footprint — from bringing new capacity online to keeping live sites running reliably at scale. We work at the intersection of facilities operations, vendor management, and IT infrastructure, and we move fast because our customers' AI workloads depend on the reliability we deliver.
Basic qualifications
- 10+ years of experience in technical program management, data center operations