Tech Holding
ML / AI Data Engineer (Contract)
Engineering · Full-time · India, Remote
SALARY
Not specified
WORK TYPE
Remote
JOB TYPE
Full-time
INDUSTRY
AI
About the role
About us:
Working at Tech Holding isn't just a job; it's an opportunity to be part of something bigger. We are a full-service consulting firm founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have held senior positions across a wide range of companies, from emerging startups to large Fortune 50 firms, and we have distilled that combined experience into a unique approach grounded in the principles of deep expertise, integrity, transparency, and dependability.
Key Responsibilities
- Design, deploy, and scale large-scale ML and data processing pipelines across cloud infrastructure.
- Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata).
- Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference.
- Develop high-throughput backend systems for video ingestion from desktop and mobile platforms.
- Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation.
- Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability.
- Translate ML and multimodal research into scalable, production-grade cloud architectures.
- Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers.
- Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows.
Requirements
- 5+ years of experience in data engineering, ML pipelines, or distributed systems.
- Strong experience building scalable data pipelines for large datasets (video/audio preferred).
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Experience working with GPU-based environments and distributed computing.
- Strong programming skills in Python, Scala, or similar languages.
- Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar).
- Understanding of ML workflows, training pipelines, and inference systems.
- Experience designing fault-tolerant, high-availability systems.
- Strong knowledge of data storage systems (data lakes, object storage, distributed file systems).
- Ability to handle high-throughput, large-scale data ingestion and processing.
Good to Have
- Experience with multimodal AI (video, audio, NLP) systems.
- Familiarity with annotation tools and data labeling workflows.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Knowledge of cost optimization strategies for large-scale cloud workloads.