Signal Ready: Engineering Manager @ Zeta

Building Reliability
as an Asset.

Reliability Systems Architect | SRE Leadership | AI Platforms
⚑ Trajectory: Senior Engineer → EM in <3 years

I design and scale distributed systems that self-heal, reducing incident response time and engineering cognitive load.

3x
Reduced MTTR
(within 2 quarters)
1,200+
Service Flows
Automated
0%
Cloud Cost
Optimization
30h+
Toil Saved
Per Month
0%
Lifecycle
Ownership
Varun H V

Personal Narrative

"The bridge between code and reliability is where I live. I build platforms that reduce cognitive load, so engineers can focus on creation, not containment."

The Reliability Ecosystem

Detect → Resolve → Learn → Prevent. A complete reliability lifecycle built at scale.

πŸ‘οΈ
[PREVENT]
Detection: <5 mins

Cyclops

AI-assisted frontend reliability system detecting broken user journeys before impact. Reduced detection from 60m to near real-time.

🧠
[RESOLVE]
MTTR: 15m → <2m

Compass IQ

Owned the end-to-end design of an AI incident-intelligence engine that slashed MTTR across 1,200+ service flows. • Built in <6 weeks.

🔍
[LEARN]
RCA Governance

RootLens

AI-powered RCA quality engine that standardizes post-incident analysis using structured evaluation and automated, blameless feedback.

🚀
[SCALE]
Noise Reduced 99%

OWCC & Hyperion

Systemic overhaul of command-center nodes plus predictive auto-scaling, saving 30+ engineering hours per month.

🧘
Infrastructure: Wellness Lab

Svasthyaminds

Foundational platform for B2B2C wellness infrastructure. Cognitive reset via binaural audio systems and AI-driven sessions.

Engineering Philosophy

"Management is the art of reducing the cognitive load required for brilliance."

01 // Automation First

If it's repeatable, it's code. Manual toil is the signal of a system that hasn't yet been mastered.

02 // Platforms > Patchwork

Build durable, reusable infrastructure foundations that prevent classes of failure before they occur.

03 // Cognitive Load Focus

Simplify the developer experience. Reliability is a byproduct of systems that are easy to understand.

04 // Predictive Scaling

Solve for 10x today, so 100x feels like business as usual tomorrow. Anticipate failure modes proactively.

Why Hire Me?

Building reliability not just in systems, but in the teams that own them.

Impact Baseline
$1M+ Saved Annually

Elite Positioning

Directly managing 8+ engineers across mission-critical banking platforms. Successfully fast-tracked from Senior Engineer to EM in under 3 years.

Proven Transformation

Architect of Compass IQ and the OWCC overhaul, reducing system noise by 99% and MTTR by 3x within 2 fiscal quarters.

Hiring Excellence

Systematically improved team visibility for ZCC & HDFC Bank. Guided 2/3 of direct reports to 'Outstanding' 2025 performance ratings.

Professional Evolution

2026 – PRESENT

Engineering Manager - SRE

Zeta (HDFC Bank)
Leading SRE strategy for 1,200+ services across 15+ zones, sustaining 99.99% availability for banking systems that serve millions of users.

Managing and mentoring a high-performing team of 8+ SREs, improving operational efficiency while systematically reducing on-call burnout and technical debt.

2024 – 2026

Lead SRE

Zeta
Architected Compass IQ and executed the OWCC overhaul, reducing noise by 99%.

Engineered an AI-assisted observability engine that slashed MTTR, and transformed the operational workflow by shrinking the footprint from 350+ noisy components to a zero-touch command center.

2021 – 2024

Senior DevOps Engineer

Zeta / DataWeave
Scaled multi-cloud Kubernetes clusters to support 1200+ distinct flows.

Automated petabyte-scale data pipelines end to end and stabilized deployment architectures, ensuring zero-downtime rollouts across cloud platforms.

2018 – 2019

Member of Technical Staff

[24]7.ai

Built automated deployment systems for global enterprise engagement tools.

2017 – 2018

Member of Technical Staff

Aurigo

Engineered early infrastructure foundations for high-scale B2B SaaS.

Case Study: Compass IQ

The Core Challenge

Trace Entropy.

Manual correlation across 1,200+ service flows was causing on-call burnout. We needed machine-level deduction speeds.

Constraints

  • Data Anonymization
  • Multi-hop Latency
  • Cost Governance

Outcome

  • -13m MTTR Avg
  • 100% RCA Accuracy
  • Zero Noise Baseline
Architecture Flow: Jaeger → Vector ID → LLM Inference

Strategic Decisions: LLM vs Static Rules

Static rules degrade as 1,200+ flows evolve. LLMs adapt to trace entropy without constant tuning, shifting maintenance from rule-writing to prompt-tuning.

Mental Model: The Cognitive Load Problem

The goal wasn't just fixing bugs; it was reducing "Outage Anxiety." By automating the RCA, we cleared the cognitive bottleneck, allowing the command center to focus on recovery execution instead of forensic deduction.

Failure Case Resilience

Hallucinations during unique anomalies are handled via Confidence Scoring: if LLM confidence is below 85%, the system gracefully falls back to traditional alerting.
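The fallback rule above can be sketched as a simple confidence gate. This is a minimal illustration, not Compass IQ's implementation: it assumes the LLM stage returns a root-cause string plus a confidence score normalized to the range 0.0 to 1.0, and the names `RcaResult`, `route_incident`, and `CONFIDENCE_THRESHOLD` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold mirroring the "below 85%" rule.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class RcaResult:
    """Hypothetical shape of an LLM root-cause hypothesis."""
    root_cause: str
    confidence: float  # assumed to be normalized to [0.0, 1.0]


def route_incident(result: RcaResult, raw_alert: str) -> str:
    """Decide what the on-call engineer sees for one incident.

    High-confidence hypotheses are surfaced as an RCA. Anything below the
    threshold (e.g. a novel anomaly where hallucination risk is higher)
    falls back to the traditional alert text, unmodified.
    """
    if not 0.0 <= result.confidence <= 1.0:
        # Treat malformed scores as untrustworthy instead of failing the pipeline.
        return f"ALERT: {raw_alert}"
    if result.confidence < CONFIDENCE_THRESHOLD:
        return f"ALERT: {raw_alert}"
    return f"RCA ({result.confidence:.0%}): {result.root_cause}"


if __name__ == "__main__":
    print(route_incident(RcaResult("DB connection pool exhausted", 0.92), "p99 latency > 2s"))
    print(route_incident(RcaResult("Unknown upstream change", 0.60), "p99 latency > 2s"))
```

In this sketch a score of exactly 0.85 passes the gate, matching the "below 85%" wording, and out-of-range scores are treated as untrustworthy, so they also fall back to the raw alert.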
"Reduced MTTR and transformed how we debug incidents across the organization."
– ZCC Engineering Team

🧠 What I'm Thinking About

AI in Incident Mgmt

Moving beyond simple RCA toward predictive, self-healing system meshes that act before outages impact users.

The Future of SRE

Why 'Platform Engineering' is subsuming traditional SRE roles to build paved roads for developers.

Cognitive Load

Tool sprawl is the silent killer. How unifying interfaces creates compounding development velocity.