Netflix, a provider of online streaming media, made news over the holidays when customers experienced a service outage on Christmas Eve. Imagine taking the wrapping off your new mobile device and deciding to try it out by streaming a movie. If you were in North America, you probably found that the Netflix movie streaming service was down. The outage was caused by issues within Amazon Web Services, which Netflix uses to support movie streaming. Initially, the Amazon support team pursued API errors before learning that the root cause was actually a configuration issue caused by human error. This misstep delayed the restoration of service to Netflix customers. Over the course of that day, the configuration error first manifested as performance degradation and then cascaded into a full service outage for many customers. A more system-wide approach to service assurance might have avoided a situation like this one.

Service Outages

Although outages in your IT environment might not receive attention in the press, they can still have a significant impact on your customers and, in turn, on your business. Most IT organizations are vulnerable to this kind of service disruption. Had the configuration error in the situation described above been detected and remediated immediately, a few Netflix customers might have noticed some degradation in performance, but it is far less likely that a large number of customers would have experienced an outage.

Through 2015, 80% of the outages impacting mission-critical services are expected to be caused by people and process issues, according to Gartner (Top Seven Considerations for Configuration Management for Virtual and Cloud Infrastructure, October 2010). More than half of these outages will be attributed to change, configuration, and other related issues.
Additionally, many of these outages will be further exacerbated by a dependence on [...] Only 22% of organizations surveyed by Gartner, however, have deployed the full complement of fault, performance, and configuration management capabilities necessary to provide a solid foundation for robust monitoring. Over half of these organizations (51.3%) have their network fault management bases covered, but performance and configuration capabilities are expected to lag through 2017, according to a recent Gartner research report (I&O Teams Must Proactively Develop Three Core Network Management Disciplines, December 2012).

Service Assurance

Given that a large proportion of service disruptions originate from configuration- and change-related anomalies, savvy organizations will proactively extend their management capabilities to unify configuration, fault, and performance management. For these organizations, including the ones with virtual and cloud infrastructure, key considerations for effective and efficient service assurance management include:
Getting a complete service assurance picture means having an integrated view of availability, performance, and configuration data. In the stressful environment of a service outage, where problems seem to come from all directions, the ability to automatically calculate business impact is crucial to helping IT operations make the right decisions for the business. Accurate, real-time configuration insights facilitate in-context remediation of availability and performance issues, and can prevent potentially serious outages before they occur.

Meet Service Levels Today

CIOs are being asked to decrease the portion of their annual budgets devoted to the basics of running a data center and to invest more in business innovation. Yet they must also consistently maintain high service levels, which makes eliminating blind spots in service assurance more critical than ever. Solutions like EMC Smarts provide service assurance for critical applications through automated root-cause and business-impact analysis that encompasses fault, performance, and configuration for compute, network, and storage. These critical applications and services vary by organization and may include core processes like accounts receivable and billing, but just as likely, as in the case of Netflix, a customer-facing service (on-demand streaming video).

Though the product name came into being in a different era, Smarts has evolved to span physical, virtual, and cloud deployments. The Smarts name rightly evokes the idea that higher intelligence is needed to work across these different environments. It is an intelligence for service assurance that many cloud and service providers, as well as enterprises, probably want and need.
