Nebius

Staff Software Engineer in Hardware Infrastructure Observability

Engineering | Full-time | Amsterdam, Netherlands; Remote - Europe

SALARY

Not listed

WORK TYPE

hybrid

JOB TYPE

full-time

About the role

The team

The Hardware Automation team builds the internal platforms and tooling that power how Nebius operates its data center infrastructure at scale. Our mission is to eliminate manual effort, reduce human error, and give every team in the Hardware Infrastructure department real-time visibility and control over the systems they own. We operate as a product engineering team embedded within hardware infrastructure — meaning we don't just write requirements and hand them off. We own the full stack: from requirements gathering with data center operations and hardware engineering, through design and build, all the way to rollout and ongoing reliability.

The Role

Nebius is looking for a Staff Software Engineer to join the Hardware Infrastructure Observability team. You're welcome to work from our office in Amsterdam. We build and run low-level monitoring for servers and data center engineering systems to ensure reliability at scale. We also design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep the infrastructure healthy.

Key Responsibilities

Design and develop services and agents that provide deep visibility into a large server fleet and DC engineering systems
Evolve our metrics, aggregation, and alerting pipelines and improve signal quality
Build maintenance workflows and automation that keep fleets healthy
Investigate incidents hands-on (including on-host debugging) and drive root-cause fixes
Collaborate with hardware, networking, and DC operations to improve reliability

We expect you to have

5+ years of professional software engineering experience
Excellent knowledge of Python and Golang, or readiness to switch to these languages quickly
Strong Linux fundamentals
Ability to write reliable code and dig into complex problems
Working proficiency in English

It will be an added bonus if you have

Solid understanding of modern server architecture and its components
Experience with Prometheus-compatible metrics, monitoring, and alerting stacks (such as VictoriaMetrics)
Good knowledge of computer networks
Experience designing, developing, and running high-load distributed systems

We expect Staff Engineers to

Manage large-scale projects involving multiple stakeholders
Break down complex tasks and guide both their own work and that of more junior colleagues
Be experts in specific technologies and write high-quality code that can serve as a reference
Assess task priority and focus on high-impact work, avoiding low-value efforts
Have strong architectural thinking and contribute