Data Center Engineer, Resource Efficiency – Compute Supply
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
Anthropic's AI infrastructure operates at massive scale, and extracting maximum compute throughput from every watt is a first-order priority. As a Power & Resource Efficiency Engineer, you'll sit at the intersection of IT and facilities, building the systems, models, and control loops that optimize how we allocate and consume power, cooling, and physical capacity across our TPU/GPU fleet. You'll own the technical strategy for turning raw data center capacity into reliable, efficient compute. That means working across power topology, workload scheduling, and real-time telemetry to push utilization as close to the physical envelope as possible while maintaining our availability commitments.
What You'll Do
- Build models that forecast consumption across electrical and mechanical subsystems, using statistical modeling of cluster utilization, workload profiles, and failure modes to inform capacity planning, energy procurement, and oversubscription targets and risks.
- Design IT/OT interfaces that bridge compute orchestration with facility controls, enabling real-time telemetry across accelerator hardware, power distribution, cooling, and schedulers.
- Build and operate load management systems that use power and cooling topology for power- and thermal-aware workload placement, maximizing throughput while meeting SLOs.
- Partner with data center providers to drive design optimizations and hold them accountable to SLA-grade performance standards, providing technical diligence on partner architectures.
What We're Looking For
- Deep knowledge of data center power distribution and cooling architectures, and how they interact with IT load profiles. Experience with reliability engineering, SLA development, and failure-mode analysis.
- Proficiency in statistical modeling and simulation for infrastructure capacity or power utilization.
- Familiarity with SCADA/BMS/EPMS, telemetry pipelines, and control systems. Experience building software that bridges IT and OT.
- Exposure to accelerator deployments and their power management interfaces strongly preferred.
- Demand response, grid interaction, or behind-the-meter generation experience is a plus.
- Ability to translate between infrastructure engineering, software teams, and external partners.