Cribl

Sr Software Engineer, Storage

Engineering | Full-time | Remote - United States

SALARY

Not listed

WORK TYPE

remote

JOB TYPE

full-time


About the role

Why You'll Love This Role

Cribl is seeking a Senior Software Engineer to join our Storage team, where you'll design and build the infrastructure that allows Cribl's storage layer to scale autonomously. Our platform ingests, indexes, and serves petabytes of telemetry data on AWS — and you'll own the systems that make that possible: autoscaling clusters, automated provisioning, self-healing infrastructure, and the operational tooling that keeps it all running without human intervention.

This is a platform engineering role at its core. You won't just operate infrastructure — you'll build the systems that operate themselves. Think: cluster lifecycle management, automated capacity planning, infrastructure-as-code pipelines that provision and scale storage tiers end-to-end, and the observability layer that closes the loop. You'll bring infrastructure discipline and DevOps automation to a distributed storage system that needs to grow by orders of magnitude while staying rock-solid.

If you're the kind of engineer who builds autoscalers instead of manually resizing, writes controllers instead of runbooks, and thinks about cluster topology as a software problem — this is your role.

As An Active Member Of Our Team, You Will...

Design and build autoscaling systems for storage clusters — automated provisioning, scale-up/scale-down policies, cluster rebalancing, and node lifecycle management.
Own the infrastructure-as-code stack (Terraform) that defines and deploys storage infrastructure end-to-end on AWS.
Build self-healing automation: health checks, automated failover, capacity rebalancing, and remediation controllers that resolve issues before they page anyone.
Develop the CI/CD pipelines and deployment tooling for storage services — safe rollouts, canary deployments, automated rollback.
Design and implement observability for the entire storage platform — metrics, dashboards, SLOs, alerting, and capacity forecasting that drive automated scaling decisions.
Own cluster management tooling: provisioning new tenants, managing cluster topology, coordinating upgrades and migrations with zero downtime.
Drive performance and cost optimization across the storage data path: ingest pipelines, compaction, partitioning, and query execution.
Partner with product engineering to define scalability limits, load test new features, and harden the system for production readiness.
Contribute to incident response and lead blameless post-mortems, turning operational surprises into systemic automation.
This position will require stand-by, on-call, or off-hours duties.

If You've Got It - We Want It

Significant experience building platform/infrastructure systems that manage, scale, and operate distributed services autonomously — not just using infrastructure, but building the layer that automates it.
Strong software engineering skills in TypeScript/Node.js, Go, or similar languages — you write controllers, operators, and automation frameworks.
