NetSRE
About Nebius
Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.
Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.
Role
Nebius is looking for a Site Reliability Engineer to join the Hardware Infrastructure team.
The Hardware Infrastructure team designs, develops and supports the systems involved in the data center lifecycle:
- Functional and load testing system for servers.
- Monitoring of engineering equipment located in our data centers (power supply, air and water cooling, etc.).
- Monitoring of IT equipment: racks, servers, JBODs, JBOGs, power shelves, network devices, etc.
- Asset tracking.
- Tracking of hardware repair tasks.
- Server production.
In this position, your responsibilities will be to:
- Ensure fault tolerance, scalability and uninterrupted operation of our services.
- Use cutting-edge technology to solve a variety of infrastructure problems.
- Implement and improve CI/CD processes.
We expect you to have:
- Proficiency in Linux systems, with expertise in Python and Bash scripting for automation.
- Demonstrated ability to troubleshoot complex system issues, including hardware, software and networking problems.
- Strong analytical and problem-solving skills, with a focus on optimizing system performance.
- Working proficiency in English.
It would be an added bonus if you had:
- A desire to be involved in backend development.
- Experience designing, developing and running high-load distributed systems.
Benefits & Perks
- Competitive compensation
- Career growth and learning opportunities
- Flexibility and work-life balance
- Collaborative and innovative culture
- Opportunity to work on impactful AI projects
- International environment and talented teams
What's it like to work at Nebius
Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI