Alphasenseindia

Cloud Reliability & Recovery Engineer

Engineering | Full-time | Remote - India

Salary: Not listed
Work type: Remote
Job type: Full-time

About the role

Role Overview

We are seeking an experienced Cloud Engineer to design, implement, and continuously improve our Business Continuity Planning (BCP) and Disaster Recovery (DR) capabilities across AWS cloud environments.

This is a hands-on technical role requiring deep AWS expertise, strong scripting skills, and a passion for building highly available, fault-tolerant, and resilient cloud architectures using container orchestration with Kubernetes and infrastructure as code with Terraform. You should have a good understanding of CI/CD pipelines that enable rapid, reliable deployments and minimize downtime, and be adept at implementing DR strategies, including multi-region failover, backup and restore automation, and recovery testing aligned with industry BCP/DR standards. You will collaborate closely with security, infrastructure, and application teams to ensure our systems can withstand and rapidly recover from any disruption.

Reports To: Director of Event Response
Level: Senior Individual Contributor

Key Responsibilities

Cloud Resilience Architecture

Design and implement multi-region, multi-AZ AWS architectures that meet RTO/RPO targets
Engineer active-active and active-passive failover patterns using Route 53, Global Accelerator, and CloudFront
Build automated DR runbooks and playbooks using AWS Systems Manager Automation and Step Functions
Implement chaos engineering practices using AWS Fault Injection Simulator (FIS) to validate resiliency
Architect cross-region replication strategies for S3, DynamoDB Global Tables, RDS, and Aurora Global Database
Review containerized workloads running on Kubernetes, ensuring resilience through self-healing, auto-scaling, and multi-cluster or multi-region deployments

Backup & Recovery Engineering

Administer AWS Backup across all services (EC2, EBS, RDS, EFS, FSx, DynamoDB, Aurora) with policy-based automation
Design immutable backup vaults and cross-account/cross-region backup replication pipelines
Develop and automate data recovery testing procedures, ensuring integrity and meeting defined SLAs
Implement point-in-time recovery (PITR) for databases and storage; validate via regular restore drills
Maintain Business Continuity Plans (BCP) and Disaster Recovery (DR) strategies, including tracking RTO (Recovery Time Objective) and RPO (Recovery Point Objective)

Infrastructure as Code & Automation

Design and manage infrastructure as code using Terraform to ensure consistent, repeatable deployments
Implement CI/CD pipelines for automated testing and deployment of infrastructure changes
Develop automation scripts for routine operational tasks and incident response
Monitor system health and performance, proactively identifying and resolving issues
