Innodata Inc.
Events & Community Growth Intern
Data · Full-time · Remote - Washington
Salary: Not listed
Work type: Remote
Job type: Full-time
Industry: AI
About the role
Scope of the Role:
We are looking for a curious and driven Data Engineering Intern to join our Data & AI team. You will primarily focus on building and maintaining robust data pipelines and infrastructure, while also contributing to applied AI projects involving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.
This is a hands-on role. You will work alongside senior engineers and data scientists and contribute to production-grade systems.
The role is roughly 65% Data Engineering and 35% Data Science / Applied AI.
What You’ll Own:
Data Engineering
- Design, build, and maintain scalable ETL/ELT data pipelines using tools like Apache Airflow, dbt, or Spark
- Work with structured and unstructured data from various sources — APIs, databases, event streams
- Write optimized SQL queries and data transformation logic for analytical and ML use cases
- Maintain and improve data quality, schema management, and pipeline monitoring
- Collaborate on data warehouse and data lake architecture (e.g., Snowflake, BigQuery, Delta Lake)
- Document data flows, lineage, and schema definitions
Data Science & Applied AI
- Build and evaluate RAG pipelines — chunking, embedding, indexing, and retrieval
- Work with vector databases (e.g., Pinecone, Weaviate, pgvector) for semantic search
- Integrate LLM APIs (OpenAI, Anthropic, open-source models) into data products or internal tools
- Help with prompt engineering, evaluation frameworks, and fine-tuning experiments
- Support exploratory data analysis and feature engineering for ML workflows
You’ll Thrive in This Role If You Have:
- Current enrollment in a degree program in Computer Science, Data Science, Engineering, or a related field
- Solid foundation in Python — comfortable writing clean, modular, production-quality code
- Hands-on experience with SQL (query optimization, CTEs, window functions)
- Familiarity with at least one cloud platform — AWS, GCP, or Azure
- Understanding of data pipeline concepts: batch vs streaming, orchestration, idempotency
- Strong analytical mindset with attention to data quality and correctness
- Experience with workflow orchestrators: Apache Airflow, Prefect, or Dagster
- Exposure to dbt for data transformation and testing
The expected hourly rate for this position is $20/hour.