
Cloud Computing: The Tricky Business of Hot Upgrades

By Dick Weisinger

Jay Heiser, Gartner research vice president, told Ellen Messmer of Network World that “Gartner clients are almost universally disappointed by what they regard as the incompleteness in cloud-computing contracts where they still don’t see the level of specificity related to security they expect… Cloud contracts are incomplete.”
 
One area that particularly worries Heiser is data loss or corruption in cloud services.  Heiser says that even some of the best cloud vendors, like Amazon, Google and Microsoft, have suffered temporary and, in some cases, permanent loss of user data.
 
System software updates are one area where data loss or corruption can happen.  Most businesses, even when dealing with ‘mission-critical’ systems, schedule blocks of time for system maintenance.  During those windows, updates and hardware reconfigurations are performed.
 
Increasingly, online services are global and expected to be up and running at all times.  Heiser points out that because web sites need to be available 24/7, software updates must be applied without shutting down the system.  Scheduled shutdowns are either unacceptable or must be kept to a minimum.  To keep these systems up, operators often perform ‘hot’ or ‘rolling’ updates, in which software is updated while the system keeps running.  Usually the update then has to propagate across all of the servers that support the system.  It is during this rollout period that issues can arise: keeping versions synchronized between servers, and between clients and servers, is a difficult problem.  Inadvertent loss of data is a particular concern.
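To make the rollout window concrete, here is a minimal Python sketch of a rolling update. It is only an illustration, not any vendor's actual process: the `Server` class, the `rolling_update` function, and the server names and version labels are all hypothetical. It shows that until the last server is upgraded, old and new versions serve traffic side by side.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for one machine behind a load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_update(servers, new_version):
    """Upgrade one server at a time.

    Returns, for each step, the sorted list of versions that are
    serving traffic once that step has finished.
    """
    snapshots = []
    for server in servers:
        server.in_rotation = False    # drain: stop routing traffic here
        server.version = new_version  # install the new software
        server.in_rotation = True     # put the server back in service
        live = sorted({s.version for s in servers if s.in_rotation})
        snapshots.append(live)
    return snapshots


fleet = [Server(f"web{i}", "v1") for i in range(1, 4)]
history = rolling_update(fleet, "v2")
for step, live in enumerate(history, start=1):
    print(f"after step {step}: versions serving traffic = {live}")
```

With three servers, the first two steps leave both versions live at once; only the final step brings the fleet back to a single version. That mixed-version interval is where the synchronization problems described above can occur.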
 
Performing hot updates on an online site can be a tricky business.  Heiser said that “Live upgrades of services can lead to widespread data corruption.”  Cloud vendors like Google have fine-tuned their hot roll-out process and there don’t seem to be reports of serious problems, but a study by Carnegie Mellon and Virginia Tech researchers concluded that “current industry trends suggest that upgrade-related downtime is unacceptable for many large-scale distributed systems, such as electrical utilities, assembly-line manufacturing, customer support, e-commerce or online banking. These systems must employ online-upgrade techniques.  During an online upgrade, the system enters states that merge at run time and that may not have been validated in advance… In general, the behavior of a system with mixed versions is not guaranteed to conform to the specification of either version of the software and is hard to validate in advance…  Interactions among multiple versions of the software expose the system to race conditions that can introduce latent errors or data corruption.”
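One way a mixed-version system can silently lose data is sketched below in Python. The scenario is an assumption made up for illustration and is not taken from the study: a hypothetical new version ("v2") stores an extra `currency` field on each record, while a server still running the old version ("v1") rewrites the record using only the fields it knows about. All function names, the record layout, and the in-memory `store` are hypothetical.

```python
import json


def v2_create(store, key, amount, currency):
    """New version: writes a record that includes the new 'currency' field."""
    store[key] = json.dumps({"amount": amount, "currency": currency})


def v1_set_amount(store, key, amount):
    """Old version: rebuilds the record from the only field it knows."""
    record = json.loads(store[key])
    known = {"amount": record["amount"]}  # unknown fields are dropped here
    known["amount"] = amount
    store[key] = json.dumps(known)


def v1_set_amount_tolerant(store, key, amount):
    """Old version written defensively: unknown fields are passed through."""
    record = json.loads(store[key])
    record["amount"] = amount
    store[key] = json.dumps(record)


def demo(update):
    """Create a record on a v2 server, then update it on a v1 server."""
    store = {}
    v2_create(store, "order-1", 100, "EUR")
    update(store, "order-1", 120)
    return json.loads(store["order-1"])


lossy = demo(v1_set_amount)
safe = demo(v1_set_amount_tolerant)
print("naive v1 update:   ", lossy)
print("tolerant v1 update:", safe)
```

In the naive case the `currency` value disappears without any error being raised, which is the kind of latent corruption the researchers warn about. Preserving fields a version does not recognize is one common mitigation, though it does not address every mixed-version hazard, such as the race conditions mentioned in the quote.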