External Publications

  • Hunting for Risky Dependencies in the World of Microservices (SREcon 2022): How many of your internal-only backends are actually exposed to the outside world? Probably more than you think. With the rise of microservices and complex systems, service owners are less aware of the critical user journeys that depend on their systems. In this talk, you will learn about a simple yet powerful application of OpenTelemetry to find and fix major serving outages before they occur. I will also share several high-risk dependencies within Google Maps that we caught by using this tool.
  • Hunting for Risky Dependencies (USENIX publication, 2024)
  • Mapping a Better Future with STPA (SREcon 2025): Want to prevent outages before they happen? Traditional SRE methods focus on component failures, but a whole class of outages stems from unexpected system interactions. We found a solution. In our team, we use System-Theoretic Process Analysis (STPA) to identify and fix system-level vulnerabilities before they cause outages. By applying STPA during the design phase, we've prevented major incidents and saved countless engineering hours. This talk will show you how STPA can transform your approach to reliability. We'll share a real-world example where STPA caught critical design flaws that traditional methods missed, saving us months of costly rework. Don't wait for outages to happen. Learn how STPA can help you build more resilient systems and become a 1000x engineer.