Mercury
Senior Software Engineer - Mercury Command
Engineering | Full-time | San Francisco, CA, New York, NY, Portland, OR, or Remote within Canada or United States
SALARY
Not listed
WORK TYPE
remote
JOB TYPE
full-time
INDUSTRY
fintech
About the role
What you'll do
Ship new capabilities users love:
- Design and ship new Command skills, the domain-specific instruction sets that teach the model how to handle workflows like sending money, managing invoices, and understanding cash flow
- Design and build agentic workflows in Command, defining the architecture for how multi-step agent interactions should work as we extend what the product can do on a customer's behalf
- Work with backend teams to define tool schemas for new capabilities, shaping the data contracts between Mercury's business logic and the model
- Own new capabilities end to end, from the system prompt to the frontend component that renders the response
Own the LLM layer:
- Maintain and evolve Command's prompt architecture: the system prompt, skill loading system, session context, and the policy and compliance layers underneath
- Tune model behavior: reasoning effort, prompt caching strategy, fallback chains, and the streaming patterns that make the product feel fast
- Stay current with how models are evolving and bring that knowledge back to how Command is built
Build quality in:
- Write and expand Command's eval harness, adding cases that cover new capabilities and scoring rubrics that detect regressions before users do
- Partner with product and compliance teams to define what "working correctly" means for each new capability, then build the tests that prove it
- Own the reliability and quality of what you ship, from initial design through post-launch monitoring
The ideal candidate
- Has 7 or more years of software engineering experience, with deep technical expertise building and scaling LLM-powered applications in production
- Has gone beyond shipping a first version: you have scaled an LLM-powered product, dealt with the reliability and performance problems that come with real usage, and made it better over time
- Has experience designing agentic systems and has opinions about how to architect multi-step workflows that are reliable, explainable, and safe to run on behalf of real users
- Has built eval infrastructure and can write cases that actually measure whether the product works, not just whether the model outputs something plausible
- Understands the real tradeoffs in LLM deployments: latency, cost, compliance, and what breaks in production that doesn't show up in demos
- Has opinions about what makes an AI product trustworthy, not just impressive, and can build toward that bar
- Is comfortable with TypeScript and willing to learn Haskell for