Senior Autonomy Data Engineer
About the role
About the Company
At Torc, we have always believed that autonomous vehicle technology will transform how we travel, move freight, and do business. A leader in autonomous driving since 2007, Torc has spent over a decade commercializing our solutions with experienced partners. A part of the Daimler family, we are focused solely on developing software for automated trucks to transform how the world moves freight. Join us and catapult your career with the company that helped pioneer autonomous technology, and the first AV software company with the vision to partner directly with a truck manufacturer.
Meet The Team
Torc is hiring a Senior Autonomy Data Engineer to design, build and operate the data infrastructure that powers our autonomy program. You will build the pipelines, storage systems, and tooling that turn raw vehicle sensor logs into the curated, structured datasets that our perception, planning and simulation engineers depend on.
This is a high-ownership role on a lean team. Moving large-scale sensor data reliably from vehicles operating in demanding environments and making it quickly available for model training is a difficult, high-impact problem. You will work directly with ML engineers, autonomy developers and platform engineers to close this data loop.
What You'll Do
Data Lake and Ingestion Pipeline
- Own the design and organization of the program's data lake, including schema definitions, partitioning strategy and metadata indexing.
- Design and maintain end-to-end pipelines that reliably ingest high-bandwidth sensor logs from vehicles into cloud storage and tolerate ad hoc and intermittent connectivity.
- Develop data validation and integrity checks that detect corrupted data, missing sensors, and inconsistent calibration before downstream systems process the data.
- Implement retention, tiering and lifecycle policies for data to balance storage costs with development value.
Dataset Curation and Labeling Infrastructure
- Build tooling that queries raw logs to produce curated training and evaluation datasets.
- Build automation to run cost-effective pseudo-labeling workflows at the scale of data ingest.
- Implement data quality and model performance metrics that are used to direct labeling effort toward the highest-value examples.
Autonomy Data Visualization
- Deploy and maintain data visualization tooling to support log review, annotation QA, and autonomy debugging workflows.
- Build integrations between the visualization tooling and the data lake so engineers can navigate from a dataset entry or model failure directly to the origin log data.
- Work with autonomy engineers to define and surface custom visualization panels and implement metrics for analyzing unstructured operating environments.
- Build dashboards that give autonomy engineers visibility into data coverage by terrain type, operating environment and geographic region.
Cross-functional Collaboration
- Establish and document data contracts between the data services and model training consumers.
- Partner with perception, planning and embedded engineers across the data lifecycle: from shaping the logging schemas and collection triggers to defining the dataset interfaces that supply model training and evaluation.
- Define data engineering standards, best practices, and tooling choices for an innovative and fast-paced team.
- Contribute to the data roadmap and provide input to technical leadership on investment priorities.
- Mentor junior engineers and raise the team's capabilities in data infrastructure scalability and performance.