by Bryon D Beilman
Back on May 4th, 2007, I posted a blog entry called "What are the 9's?" that briefly discussed the concept of measuring service availability as a percentage, which is a common industry practice. Most providers shoot for as many 9's as possible (e.g., 99.999% availability) for the service being measured. As I was working on an operational report for one of our customers, I was summarizing some of the year's results and ran across an interesting metric. We took over managing this customer in November of 2010, and they generate revenue directly from the availability of the services they provide with their servers and network.
When I summarized the average availability of all the customer's services, I found the following numbers.
So in 2 years, we brought them from 8 days and 6 hours of downtime per year to 4 hours and 22 minutes of downtime per year. This is measured over 24 different services, so as I was looking at the numbers, I felt pretty good. Could it be better? Definitely, but what was really churning in my mind was that without these measurements, we would have no way to track our successes and failures. We were motivated to improve reliability for selfish reasons: any service outage, big or small, results in a page that requires immediate action, and fixing the root causes results in fewer pages and more sleep.
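For anyone who wants to translate downtime into 9's, here is a minimal Python sketch of the arithmetic. It assumes a 365-day year (8,760 hours) and treats the two downtime figures above as yearly averages across the services. The function names are mine, purely for illustration, and are not taken from any monitoring tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def availability_pct(downtime_hours_per_year: float) -> float:
    """Convert yearly downtime in hours to an availability percentage."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


def allowed_downtime_hours(availability: float) -> float:
    """Convert an availability percentage to yearly downtime in hours."""
    return HOURS_PER_YEAR * (1 - availability / 100.0)


# The two downtime figures quoted in the post
before = availability_pct(8 * 24 + 6)   # 8 days and 6 hours
after = availability_pct(4 + 22 / 60)   # 4 hours and 22 minutes

print(f"{before:.2f}% -> {after:.2f}%")  # prints "97.74% -> 99.95%"
```

Going the other direction, `allowed_downtime_hours(99.99)` comes out to roughly 0.88 hours, or about 53 minutes of downtime per year.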
Next year... one more 9!