Varun H V

Professional Summary

Engineering Manager (SRE & Platform) with 8+ years of experience building large-scale, self-healing distributed systems. Proven track record of reducing MTTR 3x, cutting S3 storage costs by 91%, and leading reliability initiatives across 1200+ services. Specialized in Kubernetes, observability, and AI-driven incident management for mission-critical financial platforms serving millions of users across regions.

Technologies: Kubernetes, AWS, Prometheus, Grafana, Terraform, Python, Go, Jaeger, Docker, Ansible, Jenkins.

Domains: Distributed Systems, High Availability, Multi-Region Systems, Performance Engineering.

Professional Experience

Zeta Jan 2026 – Present

Engineering Manager - SRE Bangalore, India

Leading SRE strategy for 1200+ services across 15+ zones, ensuring multi-region availability for mission-critical banking systems serving millions of users.
Managing and mentoring a high-performing team of 8+ SREs, improving operational efficiency while systematically reducing on-call burnout.
Driving AI-powered incident intelligence and observability roadmaps to slash MTTR and enhance system resilience for ZCC and HDFC Bank stakeholders.
Improved incident response efficiency by 3x through AI-driven automation and strategic observability enhancements.
Owning the platform engineering roadmap, production readiness, and organizational reliability standards across multiple business-critical SaaS modules.

Zeta Mar 2024 – Feb 2026

Lead Site Reliability Engineer Bangalore, India

Owned end-to-end design and execution of Compass IQ, an AI-driven incident intelligence platform reducing MTTR from 15 minutes to under 2 minutes across 1,200+ service flows.
Built RootLens, an AI-powered RCA governance system that standardizes post-incident analysis using structured evaluation and automated feedback, improving RCA quality across teams.
Designed Cyclops, an AI-assisted frontend reliability system that detects broken user journeys (login, search, checkout) before they impact users, reducing detection time from 30–60 minutes to near real-time.
Transformed the Olympus World Command Center (OWCC), reducing operational noise from 350+ unhealthy components to near-zero, stabilizing the entire SaaS vertical footprint.
Drove Cloud Cost Optimization initiatives, achieving a 91% reduction in S3 storage costs and implementing Kubecost for cross-team financial governance.
Led organizational escalations and high-stakes production readiness reviews for critical ZCC and tier-1 banking stakeholders.

Zeta Dec 2021 – Mar 2024

Senior Development Operations Engineer Bangalore, India

Engineered and scaled Kubernetes clusters (PCI and Non-PCI) supporting 1200+ service flows across multi-zone deployments.
Automated high-toil on-call workflows using Python and Go, saving 30+ engineering hours per month and improving developer velocity.
Built and scaled petabyte-scale CI/CD pipelines enabling zero-downtime deployments across complex distributed systems.

DataWeave Sep 2019 – Dec 2021

Senior DevOps Engineer (Promoted from DevOps Engineer) Bangalore, India

Managed high-scale data extraction infrastructure across Amazon Web Services (AWS).
Scaled configuration management systems (Ansible) to handle thousands of concurrent nodes.
Implemented comprehensive monitoring and alerting using Prometheus and Grafana.

[24]7.ai Jun 2018 – Sep 2019

Member Technical Staff Bangalore, India

Collaborated on configuration management and continuous integration for customer engagement platforms.
Optimized build artifacts and deployment scripts for high-availability distributed systems.

Aurigo Software Technologies Jun 2017 – Jun 2018

Member Technical Staff Bangalore, India

Designed and maintained CI pipelines for cloud-based construction management software.
Supported infrastructure migrations and ensured build system stability for global engineering teams.
