Nebius

Staff Software Engineer in Hardware Infrastructure Observability

Engineering · Full-time · Amsterdam, Netherlands; Remote - Europe
Salary: Not listed
Work type: Hybrid
Job type: Full-time
Industry: AI

About the role

The team

The Hardware Automation team builds the internal platforms and tooling that power how Nebius operates its data center infrastructure at scale. Our mission is to eliminate manual effort, reduce human error, and give every team in the Hardware Infrastructure department real-time visibility and control over the systems they own. We operate as a product engineering team embedded within hardware infrastructure — meaning we don't just write requirements and hand them off. We own the full stack: from requirements gathering with data center operations and hardware engineering, through design and build, all the way to rollout and ongoing reliability.

The Role

Nebius is looking for a Staff Software Engineer to join the Hardware Infrastructure Observability team. You're welcome to work from our office in Amsterdam. We build and run low-level monitoring for servers and data center engineering systems to ensure reliability at scale. We also design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep the infrastructure healthy.

Key Responsibilities

  • Design and develop services and agents that provide deep visibility into a large server fleet and DC engineering systems
  • Evolve our metrics, aggregation, and alerting pipelines and improve signal quality
  • Build maintenance workflows and automation that keep fleets healthy
  • Investigate incidents hands-on (including on-host debugging) and drive root-cause fixes
  • Collaborate with hardware, networking, and DC operations to improve reliability

We expect you to have

  • 5+ years of professional software engineering experience
  • Excellent knowledge of Python and Go, or readiness to switch to these languages quickly
  • Strong Linux fundamentals
  • Ability to write reliable code and dig into complex problems
  • Working proficiency in English

It will be an added bonus if you have

  • Solid understanding of modern server architecture and its components
  • Experience with Prometheus-compatible metrics, monitoring, and alerting stacks (such as VictoriaMetrics)
  • Good knowledge of computer networks
  • Experience designing, developing, and running high-load distributed systems

We expect Staff Engineers to

  • Manage large-scale projects involving multiple stakeholders
  • Break down complex tasks and guide both their own work and that of more junior colleagues
  • Be experts in specific technologies and write high-quality code that can serve as a reference
  • Assess task priority and focus on high-impact work, avoiding low-value efforts
  • Have strong architectural thinking and contribute