Techholding

ML / AI Data Engineer (Contract)

Engineering · Full-time · India, Remote
Salary: Not specified
Work type: Remote
Job type: Full-time
Industry: AI

About the role

About us:

Working at Tech Holding isn't just a job; it's an opportunity to be part of something bigger. We are a full-service consulting firm founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members bring industry experience from senior positions at a wide variety of companies, from emerging startups to large Fortune 50 firms, and we have combined those experiences into a unique approach grounded in the principles of deep expertise, integrity, transparency, and dependability.

Key Responsibilities

  • Design, deploy, and scale large-scale ML and data processing pipelines across cloud infrastructure.
  • Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata).
  • Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference.
  • Develop high-throughput backend systems for video ingestion from desktop and mobile platforms.
  • Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation.
  • Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability.
  • Translate ML and multimodal research into scalable, production-grade cloud architectures.
  • Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers.
  • Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows.

Requirements

  • 5+ years of experience in data engineering, ML pipelines, or distributed systems.
  • Strong experience building scalable data pipelines for large datasets (video/audio preferred).
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP).
  • Experience working with GPU-based environments and distributed computing.
  • Strong programming skills in Python, Scala, or similar languages.
  • Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar).
  • Understanding of ML workflows, training pipelines, and inference systems.
  • Experience designing fault-tolerant, high-availability systems.
  • Strong knowledge of data storage systems (data lakes, object storage, distributed file systems).
  • Ability to handle high-throughput, large-scale data ingestion and processing.

Good to Have

  • Experience with multimodal AI (video, audio, NLP) systems.
  • Familiarity with annotation tools and data labeling workflows.
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Knowledge of cost optimization strategies for large-scale cloud workloads.