Alphasenseindia

Cloud Reliability & Recovery Engineer

engineering, full-time, Remote - India
SALARY
Not listed
WORK TYPE
remote
JOB TYPE
full-time
INDUSTRY
ai

About the role

Role Overview

We are seeking an experienced Cloud Engineer to design, implement, and continuously improve our Business Continuity Planning (BCP) and Disaster Recovery (DR) capabilities across AWS cloud environments.

This is a hands-on technical role requiring deep AWS expertise, strong scripting skills, and a passion for building highly available, fault-tolerant, and resilient cloud architecture using container orchestration with Kubernetes and infrastructure as code with Terraform. You should have a good understanding of CI/CD pipelines that enable rapid, reliable deployments and minimize downtime, and be adept at implementing DR strategies, including multi-region failover, backup and restore automation, and recovery testing aligned with industry BCP/DR standards. You will collaborate closely with security, infrastructure, and application teams to ensure our systems can withstand and rapidly recover from any disruption.

Reports To: Director of Event Response
Level: Senior Individual Contributor

Key Responsibilities

Cloud Resilience Architecture

  • Design and implement multi-region, multi-AZ AWS architectures that meet RTO/RPO targets
  • Engineer active-active and active-passive failover patterns using Route 53, Global Accelerator, and CloudFront
  • Build automated DR runbooks and playbooks using AWS Systems Manager Automation and Step Functions
  • Implement chaos engineering practices using AWS Fault Injection Simulator (FIS) to validate resiliency
  • Architect cross-region replication strategies for S3, DynamoDB Global Tables, RDS, and Aurora Global Database
  • Review containerized workloads running on Kubernetes, ensuring resilience through self-healing, auto-scaling, and multi-cluster or multi-region deployments

Backup & Recovery Engineering

  • Administer AWS Backup across all services (EC2, EBS, RDS, EFS, FSx, DynamoDB, Aurora) with policy-based automation
  • Design immutable backup vaults and cross-account/cross-region backup replication pipelines
  • Develop and automate data recovery testing procedures, ensuring integrity and meeting defined SLAs
  • Implement point-in-time recovery (PITR) for databases and storage; validate via regular restore drills
  • Maintain Business Continuity Plans (BCP) and Disaster Recovery (DR) strategies, including tracking performance against RTO (Recovery Time Objective) and RPO (Recovery Point Objective) targets

Infrastructure as Code & Automation

  • Design and manage infrastructure as code using Terraform to ensure consistent, repeatable deployments
  • Implement CI/CD pipelines for automated testing and deployment of infrastructure changes
  • Develop automation scripts for routine operational tasks and incident response
  • Monitor system health and performance, proactively identifying and resolving issues