AI/ML Research Engineer, LLM Post-Training & Evaluation
About the role
Scope of the Role:
Innodata is expanding its team of technical experts in LLM training, post-training, and evaluation systems. As an AI/ML Research Engineer, LLM Post-Training & Evaluation, you will build and optimize the technical foundations that power model improvement for foundation model builders and leading labs.
This role is ideal for someone who has hands-on experience fine-tuning and evaluating large language models (and ideally multimodal models), and who can bridge research and engineering in real-world customer environments. You will work closely with Language Data Scientists, Applied Research Scientists, data engineers, and client technical stakeholders to design and implement robust training/evaluation pipelines using both human-in-the-loop and AI-augmented methods.
The ideal candidate brings a strong computer science / machine learning engineering background, experience with modern LLM post-training workflows, and the ability to engage credibly with technical counterparts at leading AI organizations.
What You’ll Own:
In this role, you will design and implement the pipelines and tooling that connect data, evaluation, and post-training. You will help customers and internal teams move from evaluation findings to measurable model improvements.
Your work may include building fine-tuning workflows (e.g., supervised fine-tuning and preference-based optimization), integrating evaluation harnesses into model development loops, improving experiment reliability and throughput, and supporting advanced evaluation scenarios such as long-context, cross-modal, and dynamic multi-turn interactions.
You will also contribute to Innodata’s internal R&D efforts, including benchmark datasets, evaluation frameworks, and reusable infrastructure for model assessment and post-training experimentation.
Additional responsibilities include (but are not limited to):
- Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery
- Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking
- Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses
- Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows
- Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring
- Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift
- Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems
- Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions
- Contribute to internal research and platform development, including benchmark frameworks, evaluation tooling, and post-training workflow improvements