The InfoQ Podcast

Looking for Root Causes is a False Path: A Conversation with David Blank-Edelman

Synopsis

In this podcast, Michael Stiefel spoke with David Blank-Edelman about the relationship between software architecture and site reliability engineering. Site reliability engineering can give architects vital feedback about how a system actually behaves in production. Architects and designers can then learn from failures to improve their ability to build systems that can evolve, degrade gracefully, and, at the right time, be taken out of production. Reliability, like latency, throughput, and durability, is an emergent property of a system. Hence the search for a single root cause of a failure is counterproductive and is unlikely to produce a reliable system. Often one has to look at what went right, as opposed to what went wrong. Site reliability engineers and architects need to share a curiosity about how a system really works, and fails, in the real world.

Read a transcript of this interview: http://bit.ly/4n63xuC