← Back to jobsApply for this position
Cresta
Senior Machine Learning Engineer
engineeringfull-timeUnited States (Remote)
SALARY
Not listed
WORK TYPE
remote
JOB TYPE
full-time
INDUSTRY
ai
✦ AutoApply Let us apply to roles like this on your behalf.
Learn more
About the role
About the role:
Machine Learning Engineers at Cresta work across several high-impact AI initiatives. Final team placement is determined based on experience, strengths, and business needs.
Current focus areas include:
- Agentic Assist: Lead and build next-generation agentic AI systems that augment contact center agents in real time. This track requires strong pre-LLM ML foundations, deep expertise in LLMs and modern prompting techniques, a rapid prototyping mindset, and a proven ability to translate cutting-edge research into scalable, production-grade systems.
- Agent & System Quality: Design evaluation frameworks and improve the reliability, robustness, and performance of LLM-powered agents. This includes diagnosing and mitigating failure modes such as hallucinations, retrieval errors, tool misuse, context drift, prompt brittleness, and multi-step reasoning breakdowns, while defining measurable quality metrics (e.g., accuracy, faithfulness, task completion, latency, and cost) for complex, non-deterministic systems.
- Insights: Architect and scale LLM and retrieval-augmented generation pipelines that ground models in enterprise data. This track focuses on building high-performance ML systems that process complex data, extract structured insights, and deliver real-time, actionable intelligence at scale.
Responsibilities:
- Lead the design and development of Cresta’s next-generation AI Agents and Agentic Assist systems, defining system architecture and core modeling approaches.
- Architect intelligent, multi-step agent workflows that combine real-time guidance, knowledge retrieval, reasoning, summarization, and automated actions into cohesive production systems.
- Design, deploy, and optimize LLM-powered systems, including Retrieval-Augmented Generation (RAG) pipelines, multi-agent orchestration, and domain-adapted models.
- Improve reasoning, planning, and tool-use capabilities in real-world AI applications.
- Develop evaluation strategies for complex, non-deterministic systems, including offline benchmarking, online experimentation, and LLM-as-a-judge methodologies.
- Diagnose and mitigate real-world failure modes such as hallucinations, retrieval errors, tool misuse, prompt brittleness, and multi-step reasoning breakdowns.
- Define and measure quality metrics (e.g., accuracy, faithfulness, task completion, latency, cost, robustness) to improve system reliability and performance.
- Optimize
✦ Let us apply for you
We find roles like this and apply on your behalf. Cover letter written for each one. Plans from $14.99/mo. Cancel anytime.
Join waitlist