Lightningai

Infrastructure Engineer (Observability)

Engineering, full-time
Locations: New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States
SALARY: Not listed
WORK TYPE: Remote
JOB TYPE: Full-time
INDUSTRY: AI

About the role

What You’ll Do

Observability Platform & Productization

  • Own and evolve a scalable observability platform spanning metrics, logs, traces, and events
  • Drive the productization of observability capabilities for both internal teams and external customers
  • Design multi-tenant observability systems with scoped access, RBAC, and customer-facing visibility
  • Continuously improve observability systems to keep pace with rapid infrastructure buildouts

Telemetry & Data Pipelines

  • Design and operate telemetry pipelines ingesting data from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish
  • Build systems to correlate signals across infrastructure layers to enable faster debugging and root cause analysis
  • Implement streaming and real-time data pipelines using tools such as Kafka, OpenTelemetry (OTel), Promtail, or similar

Alerting, Reliability & Insights

  • Design and implement noise-resistant alerting systems to improve signal quality and reduce operational load
  • Create dashboards and alerts for InfraOps, Engineering, and Customer Success teams
  • Build automated insights that enable proactive detection, forecasting, and system health visibility at scale

Systems & Infrastructure Engineering

  • Contribute to broader infrastructure engineering projects beyond observability
  • Partner with infrastructure and platform teams to embed observability into core systems and workflows
  • Support large-scale, distributed systems across compute, networking, and storage environments

Cross-Functional Collaboration

  • Work closely with customer-facing teams to deliver external observability experiences
  • Collaborate with engineering, operations, and support teams to improve system transparency and reliability