Access and Feeds

Chaos Engineering: A Proactive Approach to Probe the Limits of Software Systems

By Dick Weisinger

While most companies that design software focus on getting things right, few focus on how users might break what they have created. Yet software businesses increasingly accept that even the best software has holes, and they want to uncover and fix those holes before their systems break down in production or their data and algorithms are hacked.

For example, it is becoming routine for large software companies to offer bug bounties that challenge hackers to break into their systems or find defects in their software.

Netflix is one company pioneering the idea of chaos engineering. The idea is to have talented programmers and engineers devote themselves to finding problems with software rather than spending their time fixing things. It is a proactive approach: deliberately probing and testing the limits of a system to discover what it takes to make it break.

Bruce Wong, a former Netflix engineering manager, wrote that “we are constantly testing our ability to survive ‘once in a blue moon’ failures. In a sign of our commitment to this very philosophy, we want to double down on chaos aka failure-injection. We strive to mirror the failure modes that are possible in our production environment and simulate these under controlled circumstances. Our engineers are expected to write services that can withstand failures and gracefully degrade whenever necessary. By continuing to run these simulations, we are able to evaluate and improve such vulnerabilities in our ecosystem.”
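The combination Wong describes, injecting failures under controlled circumstances and expecting services to degrade gracefully, can be illustrated with a small sketch. This is not Netflix's actual tooling; every name below (`with_fault_injection`, `fetch_personalized_rows`, `home_page`, `FALLBACK_ROWS`, `run_experiment`) is hypothetical. A wrapper makes a simulated downstream call fail at a chosen rate, and the experiment checks that every request still gets an answer, either the full response or a degraded fallback, with no errors reaching the user.

```python
import random
from typing import Callable, Dict, List


class DependencyUnavailable(Exception):
    """Hypothetical error a service raises when a downstream call fails."""


def with_fault_injection(call: Callable[[], List[str]], failure_rate: float,
                         rng: random.Random) -> Callable[[], List[str]]:
    """Wrap a dependency call so it fails with probability `failure_rate`."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")

    def wrapped() -> List[str]:
        # Simulate an outage of the downstream service.
        if rng.random() < failure_rate:
            raise DependencyUnavailable("injected failure")
        return call()

    return wrapped


def fetch_personalized_rows() -> List[str]:
    # Hypothetical stand-in for a downstream recommendation service.
    return ["Because you watched X", "Trending for you"]


# Hypothetical generic content served when personalization is unavailable.
FALLBACK_ROWS = ["Popular now"]


def home_page(get_rows: Callable[[], List[str]]) -> List[str]:
    """Serve personalized rows, degrading to generic rows if the dependency fails."""
    try:
        return get_rows()
    except DependencyUnavailable:
        return list(FALLBACK_ROWS)


def run_experiment(failure_rate: float, requests: int, seed: int = 0) -> Dict[str, int]:
    """Send `requests` requests through the service while injecting failures."""
    rng = random.Random(seed)
    get_rows = with_fault_injection(fetch_personalized_rows, failure_rate, rng)
    ok = degraded = errors = 0
    for _ in range(requests):
        try:
            rows = home_page(get_rows)
        except Exception:
            errors += 1  # an unhandled failure reached the user
            continue
        if rows == FALLBACK_ROWS:
            degraded += 1
        else:
            ok += 1
    return {"ok": ok, "degraded": degraded, "errors": errors}


if __name__ == "__main__":
    print(run_experiment(failure_rate=0.3, requests=1000))
```

With `failure_rate=0.0` every request is served normally; with `failure_rate=1.0` every request falls back to the generic rows. In both cases `errors` should stay at zero, which is the "withstand failures and gracefully degrade" property the experiment is meant to confirm.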
