External Publications
-
Hunting for Risky Dependencies in the World of Microservices
(SREcon 2022):
How many of your internal-only backends are actually exposed
to the outside world? Probably more than you think. With the
rise of microservices and complex systems, service owners are
less aware of the critical user journeys depending on their
systems. In this talk, you will learn about a simple yet
powerful application of OpenTelemetry to find and fix major
serving outages before they occur. I will also share several
high-risk dependencies within Google Maps that we caught by
using this tool.
-
Hunting for Risky Dependencies (USENIX publication, 2024)
-
Mapping a Better Future with STPA (SREcon 2025):
Want to prevent outages before they happen? Traditional SRE
methods focus on component failures, but a whole class of
outages stems from unexpected system interactions. We found a
solution. On our team, we use System-Theoretic Process
Analysis (STPA) to identify and fix system-level
vulnerabilities before they cause outages. By applying STPA
during the design phase, we've prevented major incidents and
saved countless engineering hours. This talk will show you how
STPA can transform your approach to reliability. We'll share a
real-world example where STPA caught critical design flaws
that traditional methods missed, saving us months of costly
rework. Don't wait for outages to happen. Learn how STPA can
help you build more resilient systems and become a 1000x
engineer.