Reddit

Senior Machine Learning Systems Engineer, Ads ML Experience Platform

Engineering, Full-time, Remote - United States
Salary: Not listed
Work type: Remote
Job type: Full-time
Industry: General

About the role

Team Overview

We are building the next generation of ML research tools and agentic AI platforms that power machine learning development across Reddit. Our mission is to accelerate the Ads ML lifecycle – from experimentation and training to deployment, evaluation, and autonomous operations – through scalable platform services, intelligent automation, and developer-centric tooling.

Our team owns critical platform capabilities, including offline ML experimentation systems, production training orchestration frameworks, ML lifecycle automation, and agentic ML frameworks that enable faster model iteration.

We are looking for an experienced engineer with deep expertise in large-scale distributed systems, ML platforms, and emerging agentic architectures to help define and build the foundation for the next generation of our machine learning developer experience (DevX) tooling.

What You’ll Do

  • Design and build large-scale offline ML experimentation platforms that enable reproducible research, model development, evaluation, and promotion workflows.
  • Develop production-grade training orchestration frameworks supporting distributed training, hyperparameter optimization, model evaluation, and automated retraining.
  • Build infrastructure for experiment tracking, metadata management, lineage, artifact versioning, model registries, and reproducibility.
  • Partner with ML engineers and researchers to improve experimentation velocity and operational efficiency.
  • Build automated workflows for model promotion, rollback, compliance validation, and continuous evaluation.
  • Design and build an agentic AI execution platform supporting autonomous and human-in-the-loop workflows, including multi-agent orchestration, memory/context systems, and scalable workflow infrastructure.

What You Bring

  • 5+ years in infrastructure/platform engineering or large-scale distributed systems.
  • 2+ years of hands-on experience building and operating production ML infrastructure, developer SDKs, platform APIs, or self-service AI tooling.
  • Experience building workflow orchestration systems, developer platforms, or large-scale automation frameworks.
  • Experience with distributed data processing systems such as Spark, Flink, Ray, or equivalent technologies.
  • Experience with modern orchestration and workflow technologies such as Kubeflow, Argo, Airflow, or similar frameworks.
  • Experience building offline ML experimentation platforms, model registries, experiment tracking systems, or training orchestration pipelines.