
Cloud Computing: SLI + SLO + $ = SLA

By Dick Weisinger

Cloud Computing is hard. Customers want to be reassured that their data is safe and will be available at any time.

The Service Level Agreement (SLA) that customers and vendors sign usually promises uptime of multiple nines. Beyond the SLA, two other terms matter: the Service Level Objective (SLO) and the Service Level Indicator (SLI).

In most cases there are objectives (SLOs), such as keeping the service up at least 99.9% of the time. Uptime itself is the indicator (SLI), the quantity that is actually measured. The SLA is the contract that states what the consequences are if the SLO isn’t achieved.
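The relationship in the title (SLI + SLO + $ = SLA) can be sketched in a few lines of Python. This is only an illustration: the `Sla` class, the function names, the 10% service credit, and the $1,000 monthly fee are all hypothetical values chosen for the example, not terms from any real contract.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    """A contract: an objective plus a monetary consequence (both values hypothetical)."""
    slo_target: float   # required uptime fraction, e.g. 0.999 for "three nines"
    credit_rate: float  # fraction of the monthly fee credited back when the SLO is missed


def uptime_sli(up_minutes: int, total_minutes: int) -> float:
    """SLI: the measured quantity, here the fraction of minutes the service was up."""
    return up_minutes / total_minutes


def slo_met(sli: float, target: float) -> bool:
    """SLO: a predicate on the SLI."""
    return sli >= target


def sla_credit(sla: Sla, sli: float, monthly_fee: float) -> float:
    """SLA: what the provider owes the customer if the SLO was not met."""
    return 0.0 if slo_met(sli, sla.slo_target) else monthly_fee * sla.credit_rate


MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes
sla = Sla(slo_target=0.999, credit_rate=0.10)

good_month = uptime_sli(MINUTES_IN_30_DAYS - 30, MINUTES_IN_30_DAYS)  # 30 minutes down
bad_month = uptime_sli(MINUTES_IN_30_DAYS - 60, MINUTES_IN_30_DAYS)   # 60 minutes down

print(sla_credit(sla, good_month, monthly_fee=1000.0))  # SLO met, no credit owed
print(sla_credit(sla, bad_month, monthly_fee=1000.0))   # SLO missed, credit owed
```

A 99.9% objective over a 30-day month leaves roughly 43 minutes of allowed downtime, so in this sketch the month with 30 minutes of downtime passes and the month with 60 minutes triggers the credit.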

“Cloud customers want strong, understandable promises (Service Level Objectives, or SLOs) that their applications will run reliably and with adequate performance, but cloud providers don’t want to offer them, because they are technically hard to meet in the face of arbitrary customer behavior and the hidden interactions brought about by statistical multiplexing of shared resources,” wrote Jeffrey Mogul and John Wilkes from Google in an ACM paper.

Quality of Service (QoS) also factors into this equation. QoS relates to the performance of a service. In cloud computing, a major quality factor is latency: how quickly services can be performed and delivered.

Why are SLAs hard to specify? Mogul and Wilkes write that “creating an SLA seems simple: define one or more SLOs as predicates on clearly-defined measurements (Service Level Indicators, or SLIs), then have the business experts and lawyers agree on the consequences, and you have an SLA. Sadly, in our experience, SLOs are insanely hard to specify. Customers want different things, and they typically cannot describe what they want in terms that can be measured and in ways that a provider can feasibly commit to promising.”

Often, even agreeing on the definition of the metric behind the SLA is hard. What counts as uptime, for example, and at what granularity is it measured: by the minute or by the hour?
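To see why granularity matters, here is a minimal Python sketch that scores the same day of monitoring data two ways. It assumes one health sample per minute and one possible rule, that a measurement window counts as "up" only if every sample in it is up. The `uptime` function and the outage pattern are hypothetical, and a real contract could define the rule differently.

```python
from typing import List


def uptime(samples: List[bool], window: int) -> float:
    """Fraction of windows counted as up; a window is up only if all its samples are up."""
    windows = [samples[i:i + window] for i in range(0, len(samples), window)]
    up_windows = sum(all(w) for w in windows)
    return up_windows / len(windows)


# One sample per minute for a 24-hour day, all healthy to start with.
minute_samples = [True] * (24 * 60)

# Five separate one-minute outages, each falling in a different hour.
for hour in (2, 7, 11, 15, 20):
    minute_samples[hour * 60 + 30] = False

per_minute = uptime(minute_samples, window=1)   # 1435 of 1440 minutes up
per_hour = uptime(minute_samples, window=60)    # 19 of 24 hours up

print(f"per-minute uptime: {per_minute:.2%}")
print(f"per-hour uptime:   {per_hour:.2%}")
```

The same five minutes of downtime read as better than 99.6% uptime when measured by the minute and under 80% when measured by the hour, which is why the measurement definition has to be settled before the number in the SLO means anything.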

Mogul and Wilkes conclude that “perhaps the most important lesson we can learn from statistics, however, is humility — that the combination of unpredictable workloads, hard-to-model behavior of complex shared infrastructures, and the infeasibility of collecting all the necessary metrics means that certain kinds of SLOs are beyond our power to deliver, no matter how much we believe we need them.”
