Innodatainc

Events & Community Growth Intern

data, full-time, Remote - Washington

SALARY

Not listed

WORK TYPE

remote

JOB TYPE

full-time

INDUSTRY


About the role

Scope of the Role:

We are looking for a curious and driven Data Engineering Intern to join our Data & AI team. You will primarily focus on building and maintaining robust data pipelines and infrastructure, while also contributing to applied AI projects involving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.

This is a hands-on role. You will work alongside senior engineers and data scientists and contribute to production-grade systems.

The role is roughly 65% Data Engineering and 35% Data Science / Applied AI.

What You’ll Own:

Data Engineering

Design, build, and maintain scalable ETL/ELT data pipelines using tools like Apache Airflow, dbt, or Spark
Work with structured and unstructured data from various sources — APIs, databases, event streams
Write optimized SQL queries and data transformation logic for analytical and ML use cases
Maintain and improve data quality, schema management, and pipeline monitoring
Collaborate on data warehouse and data lake architecture (e.g., Snowflake, BigQuery, Delta Lake)
Document data flows, lineage, and schema definitions

Data Science & Applied AI

Build and evaluate RAG pipelines — chunking, embedding, indexing, and retrieval
Work with vector databases (e.g., Pinecone, Weaviate, pgvector) for semantic search
Integrate LLM APIs (OpenAI, Anthropic, open-source models) into data products or internal tools
Help with prompt engineering, evaluation frameworks, and fine-tuning experiments
Support exploratory data analysis and feature engineering for ML workflows

You’ll Thrive in This Role If You Have:

Current enrollment in a degree program in Computer Science, Data Science, Engineering, or a related field
Solid foundation in Python — comfortable writing clean, modular, production-quality code
Hands-on experience with SQL (query optimization, CTEs, window functions)
Familiarity with at least one cloud platform — AWS, GCP, or Azure
Understanding of data pipeline concepts: batch vs streaming, orchestration, idempotency
Strong analytical mindset with attention to data quality and correctness
Experience with workflow orchestrators: Apache Airflow, Prefect, or Dagster
Exposure to dbt for data transformation and testing

The expected hourly rate for this position is $20/hour.
