Software Engineer - Data Movement Platform
About Reddit
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information.
Reddit has a flexible workforce! If you happen to live close to one of our physical office locations, our doors are open for you to come into the office as often as you'd like. Don't live near one of our offices? No worries: you can apply to work remotely in any country in which we have a physical presence.
About the Role
The Data Movement team is looking to hire a Software Engineer who is excited to solve large-scale data platform and efficiency challenges.
Our community of users generates over 100B events per day, each of which is ingested into a data warehouse that sees 55,000+ daily queries. We use this data to enable both batch and streaming ML and BI workloads at the company. Critical teams such as ads, feed generation, and ML experimentation rely on the Data Platform to generate revenue for Reddit. The Data Movement team within this group is responsible for helping the business process data more efficiently and orchestrate these data workloads.
As a software engineer, you will work with teammates and partner teams to create and improve scalable, fault-tolerant, self-serve systems. You will also:
- Refine and maintain our data infrastructure technologies to support ML and analytics workflows on data collected from hundreds of millions of users.
- Own the Data Movement Platform used to enable batch and stream data processing at Reddit.
- Invest in building new infrastructure for Spark, Flink, and Airflow technologies at Reddit, including contributing to the open source community as needed.
- Build automated solutions to minimize toilsome work for data users at Reddit and provide a declarative, self-service experience for working with data.
- Collaborate with teammates to share on-call responsibilities and support monitoring/alerting to improve the reliability, scalability, latency, and efficiency of Reddit’s Data Platform.
If you have a passion for building and maintaining high-quality code, want to improve how Reddit makes strategic decisions at the company level, and are excited about applying engineering best practices to one of the most powerful corpora of data in the world, then this is the team for you!
Who you might be
- 2+ years of software engineering experience in a production setting writing clean, maintainable, and well-tested code
- Proficient in object-oriented programming languages like