Nebius

ML Infrastructure Engineer

Engineering | Full-time | Amsterdam, Netherlands; Remote - Europe; Remote - United States
Salary: Not listed
Work type: Remote
Job type: Full-time
Industry: AI

About the role

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.


The role

We are seeking a highly skilled ML/AI Engineer to lead and support the benchmarking of GPU platforms for machine learning and AI workloads. You will play a critical role in evaluating the performance of GPU-based hardware across deep learning and AI frameworks, enabling data-driven decisions for platform optimisation and next-generation hardware development.

Your responsibilities will include:

  • Work closely with hardware and development teams to profile and analyse GPU performance at the system and kernel level.
  • Evaluate and compare GPU performance across different platforms, architectures, and software stacks (e.g., CUDA, ROCm).
  • Debug and optimise ML workloads to run efficiently on GPU hardware, identifying and resolving performance bottlenecks.
  • Perform acceptance testing for new GPU clusters, ensuring hardware and software meet performance, stability, and compatibility requirements for AI workloads.
  • Run experiments across diverse GPU system configurations to assess how different interconnect strategies and system-level optimisations affect performance and scalability.
  • Develop tools and dashboards to visualise performance metrics, bottlenecks, and trends.
  • Contribute to internal tooling, frameworks, and best practices.

We expect you to have:

  • A profound understanding of the theoretical foundations of machine learning.
  • Deep understanding of the performance aspects of training and inference for large neural networks (data/tensor/context/expert parallelism, offloading, custom kernels, hardware features, attention optimisations, dynamic batching, etc.).
  • Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM).
  • Good understanding of the GPU stack: CUDA, NCCL, drivers, and relevant libraries.
  • Familiarity with containerized environments (e.g., Docker, Kubernetes).
  • Strong communication skills and the ability to work independently.

Ways to stand out from the crowd:

  • Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT).
  • Experience in Python and performance profiling.