Nebius

ML Infrastructure Engineer

Engineering | Full-time | Amsterdam, Netherlands; Remote - Europe; Remote - United States

SALARY

Not listed

WORK TYPE

remote

JOB TYPE

full-time

About the role

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

We are seeking a highly skilled ML/AI Engineer to lead and support the benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware across deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

Your responsibilities will include:

Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
Perform experiments across diverse GPU system configurations to assess the impact of varying interconnect strategies and system-level optimisations on performance and scalability.
Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
Contribute to internal tooling, frameworks, and best practices.

We expect you to have:

A profound understanding of the theoretical foundations of machine learning.
Deep understanding of the performance aspects of large neural network training and inference (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.).
Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM).
Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries.
Familiarity with containerized environments (e.g., Docker, Kubernetes).
Strong communication and ability to work independently.

Ways to stand out from the crowd:

Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT).
Experience in Python and performance profiling.
