Lightningai
Infrastructure Engineer (Observability)
engineering | full-time | New York, New York, United States; Remote; San Francisco, California, United States; Seattle, Washington, United States
SALARY: Not listed
WORK TYPE: remote
JOB TYPE: full-time
INDUSTRY: ai
About the role
What You’ll Do
Observability Platform & Productization
- Own and evolve a scalable observability platform spanning metrics, logs, traces, and events
- Drive the productization of observability capabilities for both internal teams and external customers
- Design multi-tenant observability systems with scoped access, RBAC, and customer-facing visibility
- Continuously improve observability systems to keep pace with rapid infrastructure buildouts
Telemetry & Data Pipelines
- Design and operate telemetry pipelines ingesting data from GPUs, CPUs, networking (Ethernet & InfiniBand), containers, APIs, and BMC/Redfish
- Build systems to correlate signals across infrastructure layers to enable faster debugging and root cause analysis
- Implement streaming and real-time data pipelines using tools such as Kafka, OpenTelemetry (OTel), Promtail, or similar
Alerting, Reliability & Insights
- Design and implement noise-resistant alerting systems to improve signal quality and reduce operational load
- Create dashboards and alerts for InfraOps, Engineering, and Customer Success teams
- Build automated insights and enable proactive detection, forecasting, and system health visibility at scale
Systems & Infrastructure Engineering
- Contribute to broader infrastructure engineering projects beyond observability
- Partner with infrastructure and platform teams to embed observability into core systems and workflows
- Support large-scale, distributed systems across compute, networking, and storage environments
Cross-Functional Collaboration
- Work closely with customer-facing teams to deliver external observability experiences
- Collaborate with engineering, operations, and support teams to improve system transparency and reliability