Sayari

Senior Data Engineer

Engineering, full-time, Remote - US

SALARY

$140k – $160k/yr

WORK TYPE

remote

JOB TYPE

full-time

About the role

POSITION DESCRIPTION

As a Senior Data Engineer at Sayari, you will be the engine behind the world’s most comprehensive commercial world model. You will join a high-autonomy team responsible for building and scaling the complex orchestration systems that transform billions of primary-source records into actionable intelligence. This is a role for a "builder" who respects the complexity of large-scale ETL and graph databases and is "PhD-curious" about the future of AI-native data products and modern orchestration.

JOB RESPONSIBILITIES

Design, build, and maintain scalable data pipelines using Python, Spark, and Airflow to support our core data acquisition and entity resolution engines.
Collaborate cross-functionally with AI/ML and Product teams to implement new features and AI-native products.
Proactively identify and resolve bottlenecks in our complex ETL processes, bringing a fresh perspective to refine and optimize our existing codebase.
Contribute to a robust engineering culture through rigorous code reviews, unit testing, and clear communication of design decisions.
Own the end-to-end delivery of roadmap tasks within two-week sprints, ensuring work meets high standards for quality, documentation, and performance.
Participate in roadmap planning and story refinement, eventually taking ownership of major epics that drive our long-term product defensibility.

SKILLS & EXPERIENCE

Required

5 or more years of production data engineering experience, with clear ownership of systems you built and operated end to end
Strong Python skills, plus meaningful experience in a JVM language (Scala preferred) or a willingness to ramp up quickly
Hands-on Snowflake experience, or equivalent depth in BigQuery or Redshift with a demonstrated ability to transfer those skills
Experience deploying and operating AI or ML applications in production, including output validation, monitoring, and cost management at scale
Orchestration experience with Apache Airflow or a comparable workflow tool
Track record of operating production systems reliably, with comfort navigating failure, monitoring, and recovery

Preferred

Experience with Spark on Dataproc Serverless or other serverless Spark environments
Familiarity with Kubernetes for deployment
Experience with data quality tooling such as Deequ, Great Expectations, or equivalent
GCP experience (BigQuery, Dataproc, Cloud Storage)
Experience leading or contributing to a data warehouse migration
Background in team mergers or migrating a team onto a new operating process
