Lightningai

Infrastructure Engineer (Storage)

Engineering | Full-time | New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States
SALARY: Not listed
WORK TYPE: remote
JOB TYPE: full-time
INDUSTRY: ai

About the role

What We Are Looking For

Lightning AI is seeking a Storage Infrastructure Engineer to join our Infrastructure Engineering team.

In this role, you will focus on building and operating the storage systems that power large-scale AI/ML training, inference, and HPC workloads. You will work at the intersection of software, hardware, and operations—developing automation, improving reliability, and scaling distributed storage systems across our bare-metal infrastructure.

You will help own the data plane of our storage infrastructure, supporting high-throughput, low-latency data access for some of the most demanding AI workloads. You’ll play a key role in managing and evolving our storage stack (including VAST and S3-compatible systems like Ceph), ensuring performance, reliability, and efficiency at scale.

What You'll Do

Storage Systems & Infrastructure

  • Operate and scale distributed storage systems, including VAST and S3-compatible object storage (e.g., Ceph)
  • Improve performance, reliability, and efficiency of storage systems supporting large-scale AI/ML workloads
  • Troubleshoot complex storage and data path issues across hardware and software layers
  • Optimize storage performance to support high-throughput, low-latency AI training and inference workloads

Automation & Tooling

  • Build and maintain automation for provisioning, managing, and monitoring storage infrastructure
  • Develop Python-based tools and workflows to reduce manual operational overhead
  • Improve lifecycle management of storage clusters, from deployment through maintenance and scaling

Systems & Operations

  • Manage and operate Linux-based systems in production, including bare-metal environments
  • Partner with infrastructure and data center teams on hardware bring-up, upgrades, and issue resolution
  • Support capacity planning, utilization tracking, and forecasting for storage systems
  • Leverage monitoring and telemetry to diagnose issues and improve system performance and reliability

Cross-Functional Collaboration

  • Work closely with Infrastructure Engineering, Network Engineering, and Platform teams to integrate storage into the broader platform
  • Contribute to design discussions around new infrastructure deployments and scaling strategies
  • Help define best practices for operating storage systems in high-performance computing environments

What You'll Need

Required Qualifications

  • 5+ years