Wekatest

Senior Team Lead - AI Inference

engineering · full-time · U.S. Remote

SALARY

Not listed

WORK TYPE

remote

JOB TYPE

full-time

About the role

What You'll Work On

Lead & Own: Take end-to-end ownership of the core inference infrastructure of AMG (WEKA's Augmented Memory Grid), from the NVMe Token Warehouse and GDS data paths to the vLLM/LMCache serving stack, and drive its technical decisions and delivery outcomes.
Technical Direction: Guide a team of engineers through design, implementation, and delivery of high-throughput, low-latency LLM inference systems, setting high standards for code quality, architecture, and reliability.
Build at Scale: Stay hands-on across the AMG stack (Python, C++, CUDA, vLLM, NIXL/Dynamo, Kubernetes), contributing directly to production systems while providing technical leadership to the team.
Solve Hard Problems: Tackle the frontier challenges that define what's possible in inference at scale: disaggregated prefill/decode, persistent off-HBM KV caching, RDMA-based transport, and multi-tier GPU memory hierarchies.
Grow People & Teams: Mentor engineers through regular 1:1s, career coaching, and sprint reviews. Foster a culture of ownership, collaboration, and technical excellence within the AMG team.
Stay on the Frontier: Track the evolving inference ecosystem, benchmark new tools (SGLang, TRT-LLM, NVIDIA Dynamo), and help the team make timely decisions about when to adopt, build, or pivot.

What We're Looking For

Experienced Engineering Leader: 5+ years of professional software engineering experience, including a proven record of leading engineers and owning complex production systems, ideally in AI/ML infrastructure or high-performance computing.
Deep AI Inference Background: Hands-on expertise with LLM serving techniques, including KV cache reuse, disaggregated prefill/decode, continuous batching, and multi-tier GPU memory hierarchies (HBM → NVMe). Strong familiarity with vLLM, LMCache, NIXL/NVIDIA Dynamo, or similar frameworks.
Systems Engineering Depth: Strong Python and C++ skills (Rust a plus), with a solid grasp of CUDA, GPU memory management, and high-performance I/O — including GPUDirect Storage (GDS), RDMA, and NVMe data paths.
Infrastructure Fluency: Experience deploying and scaling GPU workloads on Kubernetes, with familiarity with RDMA networking, bare-metal GPU clusters (H100/A100), and high-throughput distributed storage.
People Leadership: Demonstrated ability to mentor and develop engineers — running effective 1:1s, supporting career growth, and balancing technical execution with long-term team health.
High Bar for Quality: A strong sense of engineering craftsmanship, with a track record of building reliable, high-throughput systems and continuously improving engineering practices.

The WEKA Way

We are Accountable: We take full ownership, always, even when things don't go as planned. We lead with integrity, show up with responsibility & ownership, and hold ourselves and each other to the highest standards.
