Nebius

Senior Support Engineer

Support · Full-time · Remote
Salary: Not listed
Work type: Remote
Job type: Full-time
Industry: AI

The role

We're looking for a senior support engineer who can handle difficult technical issues in modern cloud environments. This is not a traditional support role. The work is hands-on and technical: debugging Linux and Kubernetes issues, investigating problems in cloud infrastructure, and helping customers running AI workloads, distributed systems, and GPU-based environments. You'll work closely with engineering on production issues, help improve internal tools and troubleshooting workflows, and act as an escalation point when problems are unclear or high impact.

The role includes weekend rotation and incident response.

What you'll do

  • Investigate and resolve complex technical issues in customer environments
  • Troubleshoot across Linux, Kubernetes, cloud infrastructure, networking, storage, and GPU-related workloads
  • Support customers running containerized systems, inference workloads, training jobs, or other distributed platforms
  • Act as a senior escalation point for production incidents
  • Reproduce issues, narrow down root causes, and work with engineering on long-term fixes
  • Build or improve internal scripts, troubleshooting tools, and operational documentation
  • Help make support more scalable through better automation, observability, and process improvements
  • Communicate clearly with customers during active investigations and incidents
  • Take part in weekend coverage and urgent issue response

What we're looking for

  • Strong Linux troubleshooting skills
  • Strong Kubernetes and container experience
  • Solid understanding of cloud infrastructure in AWS, GCP, Azure, OpenStack, or similar environments
  • Good networking fundamentals
  • Ability to write scripts or small tools in Python, Bash, Go, or similar
  • Experience working on production issues that require structured debugging and cross-team collaboration
  • Ability to work independently and stay effective when the path to resolution is not obvious
  • Clear written communication, especially when explaining technical issues to customers and internal teams

Especially valuable

  • Experience with GPU-based infrastructure
  • Familiarity with AI/ML or LLM-related workloads
  • Understanding of inference and training pipelines
  • Experience improving observability, tooling, or operational workflows
  • History of building useful internal tools or automating repetitive work
  • Personal or open-source projects that show real technical depth

A strong candidate for this role usually

  • enjoys debugging messy infrastructure problems
  • takes ownership without waiting to be told exactly what to do
  • thinks beyond the immediate ticket
  • works well with engineering
  • looks for ways to reduce repeated operational pain, not just close cases