Tech Holding
ML / AI Data Engineer (Contract)
Engineering · Full-time · India, Remote
SALARY
Not specified
WORK TYPE
Remote
JOB TYPE
Full-time
INDUSTRY
AI
About the role
About us:
Working at Tech Holding isn't just a job; it's an opportunity to be part of something bigger. We are a full-service consulting firm founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have held senior positions across a wide range of companies, from emerging startups to large Fortune 50 firms, and we have distilled that combined experience into a unique approach grounded in the principles of deep expertise, integrity, transparency, and dependability.
Key Responsibilities
- Design, deploy, and scale large-scale ML and data processing pipelines across cloud infrastructure.
- Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata).
- Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference.
- Develop high-throughput backend systems for video ingestion from desktop and mobile platforms.
- Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation.
- Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability.
- Translate ML and multimodal research into scalable, production-grade cloud architectures.
- Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers.
- Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows.
Requirements
- 5+ years of experience in data engineering, ML pipelines, or distributed systems.
- Strong experience building scalable data pipelines for large datasets (video/audio preferred).
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Experience working with GPU-based environments and distributed computing.
- Strong programming skills in Python, Scala, or similar languages.
- Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar).
- Understanding of ML workflows, training pipelines, and inference systems.
- Experience designing fault-tolerant, high-availability systems.
- Strong knowledge of data storage systems (data lakes, object storage, distributed file systems).
- Ability to handle high-throughput, large-scale data ingestion and processing.
Good to Have
- Experience with multimodal AI (video, audio, NLP) systems.
- Familiarity with annotation tools and data labeling workflows.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Knowledge of cost optimization strategies for large-scale cloud workloads.