Upstart

Principal Site Reliability Engineer

Engineering | Full-time | United States | Remote

SALARY

Not listed

WORK TYPE

remote

JOB TYPE

full-time

INDUSTRY

fintech

About the role

About Upstart

At Upstart, we’re united by a mission that matters: to radically reduce the cost and complexity of borrowing for all Americans. Every day, we bring creativity, experimentation, and advanced AI to reshape access to credit, helping millions move forward financially with clarity and confidence.

As the leading AI lending marketplace, we partner with banks and credit unions to expand access to affordable credit through technology that’s both radically intelligent and deeply human. Our platform runs over one million predictions per borrower using more than 1,800 signals, powering smarter, fairer decisions for millions of customers. But the numbers only hint at the impact. Every idea, every voice, and every contribution moves us closer to a world where credit never stands between people and their financial progress.

The Team

Upstart’s Site Reliability Engineering (SRE) team owns the reliability, resiliency, and observability of Upstart’s production systems. We build automation, tooling, and frameworks to ensure our infrastructure is healthy, scalable, and able to support a seamless experience for both engineers and customers. Our scope includes defining Upstart’s technology operations risk strategy, implementing disaster recovery planning, and setting company-wide reliability standards.

As a Principal Engineer on the SRE team at Upstart, you will serve as a thought leader and SRE evangelist, driving adoption of best practices, mentoring engineers across the organization, and influencing both technical and business decisions. Your impact will extend beyond SRE into cross-functional collaboration with Product Engineering, DevEx, Development Productivity (Quality), DevOps, Data Engineering, and Machine Learning teams to elevate operational excellence across the company.

How you’ll make an impact

Lead the definition, advocacy, and adoption of SRE principles across engineering teams
Partner with leadership to shape long-term reliability, resiliency, and observability strategies
Champion distributed tracing, real user monitoring (RUM), and key performance metrics such as Largest Contentful Paint (LCP) to improve system visibility and user experience
Build and scale self-healing systems to minimize manual intervention and reduce downtime
Drive enterprise-wide improvements to incident response processes, including those related to Machine Learning systems
Collaborate closely with Development Productivity and Quality teams to improve engineering velocity without sacrificing reliability
Influence technical and operational roadmaps through data-driven insights and hands-on technical contributions
Own and deliver cross-functional initiatives from concept through execution, applying program management skills to align stakeholders and achieve results

Minimum Qualifications

Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field
