Engineering Manager (SRE & Platform) with 8+ years of experience building large-scale, self-healing distributed systems. Proven track record of cutting MTTR by 3x, reducing cloud costs by 91%, and leading reliability initiatives across 1,200+ services. Specialized in Kubernetes, observability, and AI-driven incident management for mission-critical financial platforms serving millions of users across multiple regions.
Technologies: Kubernetes, AWS, Prometheus, Grafana, Terraform, Python, Go, Jaeger, Docker, Ansible, Jenkins.
Domains: Distributed Systems, High Availability, Multi-Region Systems, Performance Engineering.