Sayari
Senior Data Engineer
engineering | full-time | Remote - US
SALARY
$140k – $160k/yr
WORK TYPE
remote
JOB TYPE
full-time
INDUSTRY
ai
About the role
POSITION DESCRIPTION
As a Senior Data Engineer at Sayari, you will be the engine behind the world’s most comprehensive commercial world model. You will join a high-autonomy team responsible for building and scaling the complex orchestration systems that transform billions of primary-source records into actionable intelligence. This is a role for a "builder" who respects the complexity of large-scale ETL and graph databases and is "PhD-curious" about the future of AI-native data products and modern orchestration.
JOB RESPONSIBILITIES
- Design, build, and maintain scalable data pipelines using Python, Spark, and Airflow to support our core data acquisition and entity resolution engines.
- Collaborate cross-functionally with AI/ML and Product teams to implement new features and AI-native products.
- Proactively identify and resolve bottlenecks in our complex ETL processes, bringing a fresh perspective to refine and optimize our existing codebase.
- Contribute to a robust engineering culture through rigorous code reviews, unit testing, and clear communication of design decisions.
- Own the end-to-end delivery of roadmap tasks within two-week sprints, ensuring work meets high standards for quality, documentation, and performance.
- Participate in roadmap planning and story refinement, eventually taking ownership of major epics that drive our long-term product defensibility.
SKILLS & EXPERIENCE
Required
- 5 or more years of production data engineering experience, with clear ownership of systems you built and operated end to end
- Strong Python, with meaningful experience in a JVM language (Scala preferred) or willingness to ramp quickly
- Hands-on Snowflake experience, or equivalent depth in BigQuery or Redshift with a demonstrated ability to transfer those skills to Snowflake
- Experience deploying and operating AI or ML applications in production, including output validation, monitoring, and cost management at scale
- Orchestration experience with Apache Airflow or a comparable workflow tool
- Track record of operating production systems reliably, with comfort navigating failure, monitoring, and recovery
Preferred
- Experience with Spark on Dataproc Serverless or other serverless Spark environments
- Familiarity with Kubernetes for deployment
- Experience with data quality tooling such as deequ, Great Expectations, or equivalent
- GCP experience (BigQuery, Dataproc, Cloud Storage)
- Experience leading or contributing to a data warehouse migration
- Background in team mergers or migrating a team onto a new operating process