Building Reliability as an Asset.
I design and scale distributed systems that self-heal, reducing incident response time and engineering cognitive load.
(within 2 quarters)
Automated
Optimization
Per Month
Ownership
Personal Narrative
"The bridge between code and reliability is where I live. I build platforms that reduce cognitive load, so engineers can focus on creation, not containment."
The Reliability Ecosystem
Detect → Resolve → Learn → Prevent. A complete reliability lifecycle built at scale.
Cyclops
AI-assisted frontend reliability system that detects broken user journeys before they impact users. Reduced detection time from 60 minutes to near real time.
Compass IQ
Owned end-to-end design of an AI incident intelligence engine that slashes MTTR across 1,200+ service flows. Built in under 6 weeks.
RootLens
AI-powered RCA quality engine that standardizes post-incident analysis using structured evaluation and automated, non-blaming feedback.
OWCC & Hyperion
Systemic transformation of command center nodes + predictive auto-scaling saving 30+ engineering hours per month.
Svasthyaminds
Foundational platform for B2B2C wellness infrastructure. Cognitive reset via binaural audio systems and AI-driven sessions.
Engineering Philosophy
"Management is the art of reducing the cognitive load required for brilliance."
01 // Automation First
If it's repeatable, it's code. Manual toil is the signal of a system that hasn't yet been mastered.
02 // Platforms > Patchwork
Build durable, reusable infrastructure foundations that prevent classes of failure before they occur.
03 // Cognitive Load Focus
Simplify the developer experience. Reliability is a byproduct of systems that are easy to understand.
04 // Predictive Scaling
Solve for 10x today, so 100x feels like business as usual tomorrow. Anticipate failure modes proactively.
Why Hire Me?
Building reliability not just in systems, but in the teams that own them.
Elite Positioning
Directly managing 8+ engineers across mission-critical banking platforms. Successfully fast-tracked from Senior Engineer to EM in under 3 years.
Proven Transformation
Architect of Compass IQ and the OWCC overhaul, reducing system noise by 99% and MTTR by 3x within 2 fiscal quarters.
Hiring Excellence
Systematically improved team visibility for ZCC & HDFC Bank. Guided 2/3 of direct reports to 'Outstanding' 2025 performance ratings.
Professional Evolution
Engineering Manager - SRE
Managing and mentoring a high-performing team of 8+ SREs, improving operational efficiency while systematically reducing on-call burnout and technical debt.
Lead SRE
Engineered an AI-assisted observability engine that slashed MTTR, and transformed the operational workflow by consolidating 350+ noisy components into a zero-touch command center.
Senior DevOps Engineer
Automated end-to-end petabyte-scale data pipelines and stabilized deployment architectures ensuring zero-downtime rollouts across cloud platforms.
Member Tech Staff
Built automated deployment systems for global enterprise engagement tools.
Member Tech Staff
Engineered early infrastructure foundations for high-scale B2B SaaS.
Case Study: Compass IQ
Trace Entropy.
Manual correlation across 1,200+ service flows was causing on-call burnout. We needed machine-level deduction speeds.
Constraints
- Data Anonymization
- Multi-hop Latency
- Cost Governance
Outcome
- 13m lower average MTTR
- 100% RCA Accuracy
- Zero Noise Baseline
Strategic Decisions: LLM vs Static Rules
Static rules degrade over time across 1,200+ evolving flows. LLMs adapt to trace entropy without constant tuning, shifting the maintenance burden from rule-writing to prompt-tuning.
Mental Model: The Cognitive Load Problem
The goal wasn't just fixing bugs; it was reducing "Outage Anxiety." By automating the RCA, we cleared the cognitive bottleneck, allowing the command center to focus on recovery execution instead of forensic deduction.
Failure Case Resilience
Hallucinations during unique anomalies are handled via Confidence Scoring: if LLM confidence is below 85%, the system gracefully falls back to traditional alerting.
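The fallback rule can be illustrated with a minimal sketch. It assumes the model's analysis arrives with a confidence score between 0 and 1; the names `RcaResult`, `route_incident`, and the two route labels are hypothetical placeholders for illustration, not the actual Compass IQ implementation. Only the 85% threshold comes from the description above.

```python
from dataclasses import dataclass

# Threshold from the text: anything below 85% confidence falls back.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class RcaResult:
    """Hypothetical container for an LLM-generated root-cause analysis."""
    summary: str
    confidence: float  # assumed to be normalized to the range 0.0 to 1.0


def route_incident(result: RcaResult, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return which path handles the incident (route labels are illustrative)."""
    if not 0.0 <= result.confidence <= 1.0:
        # A malformed score is treated as untrustworthy, so fall back.
        return "traditional_alerting"
    if result.confidence < threshold:
        # Low confidence: do not surface the LLM's RCA, use classic alerting.
        return "traditional_alerting"
    return "automated_rca"


if __name__ == "__main__":
    print(route_incident(RcaResult("db pool exhaustion", 0.92)))
    print(route_incident(RcaResult("unclear anomaly", 0.40)))
```

Treating an out-of-range score as a fallback case is an added assumption; the point is that the safe path is the default whenever the score cannot be trusted.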
🧠 What I'm Thinking About
AI in Incident Mgmt
Moving beyond simple RCA into predictive, self-healing system meshes before outages impact users.
The Future of SRE
Why 'Platform Engineering' is subsuming traditional SRE roles to build paved roads for developers.
Cognitive Load
Tool sprawl is the silent killer. How unifying interfaces creates compounding development velocity.