Wekatest

Senior Team Lead - AI Inference

engineering, full-time, U.S. Remote
SALARY: Not listed
WORK TYPE: remote
JOB TYPE: full-time
INDUSTRY: AI

About the role

What You'll Work On

  • Lead & Own: Take end-to-end ownership of AMG's core inference infrastructure — from the NVMe Token Warehouse and GDS data paths to the vLLM/LMCache serving stack — driving technical decisions and delivery outcomes.
  • Technical Direction: Guide a team of engineers through design, implementation, and delivery of high-throughput, low-latency LLM inference systems, setting high standards for code quality, architecture, and reliability.
  • Build at Scale: Stay hands-on across the AMG stack (Python, C++, CUDA, vLLM, NIXL/Dynamo, Kubernetes), contributing directly to production systems while providing technical leadership to the team.
  • Solve Hard Problems: Tackle the real frontier challenges of inference engineering — disaggregated prefill/decode, persistent off-HBM KV caching, RDMA-based transport, and multi-tier GPU memory hierarchies — that define what's possible at scale.
  • Grow People & Teams: Mentor and coach engineers through regular 1:1s, career coaching, and sprint reviews. Foster a culture of ownership, collaboration, and technical excellence within the AMG team.
  • Stay on the Frontier: Track the evolving inference ecosystem, benchmark new tools (SGLang, TRT-LLM, NVIDIA Dynamo), and help the team make timely decisions about when to adopt, build, or pivot.

What We're Looking For

  • Experienced Engineering Leader: 5+ years of professional software engineering, with proven experience leading engineers and owning complex production systems — ideally in AI/ML infrastructure or high-performance computing.
  • Deep AI Inference Background: Hands-on expertise with LLM serving systems — KV cache reuse, disaggregated prefill/decode, continuous batching, and multi-tier GPU memory hierarchies (HBM → NVMe). Strong familiarity with vLLM, LMCache, NIXL/NVIDIA Dynamo, or similar frameworks.
  • Systems Engineering Depth: Strong Python and C++ skills (Rust a plus), with a solid grasp of CUDA, GPU memory management, and high-performance I/O — including GPUDirect Storage (GDS), RDMA, and NVMe data paths.
  • Infrastructure Fluency: Experience deploying and scaling GPU workloads on Kubernetes, plus familiarity with RDMA networking, bare-metal GPU clusters (H100/A100), and high-throughput distributed storage.
  • People Leadership: Demonstrated ability to mentor and develop engineers — running effective 1:1s, supporting career growth, and balancing technical execution with long-term team health.

  • High Bar for Quality: A strong sense of engineering craftsmanship, with a track record of building reliable, high-throughput systems and continuously improving engineering practices.

The WEKA Way

  • We are Accountable: We take full ownership, always, even when things don’t go as planned. We lead with integrity, show up with responsibility, and hold ourselves and each other to the highest standards.