By David M. Lynch
February 9, 2010 09:45 PM EST
Virtualization Magazine on Ulitzer
As organizations ramp up their adoption of server virtualization, more and more are experiencing bottlenecks that slow them down, as well as the need for additional and unplanned funding, usually created by virtual sprawl.
But while sprawl is the direct cause of unplanned expenses, it is not the real problem. Rather it's a symptom of something else. While attacking it directly may provide some short-term relief, like salve on an uncomfortable rash, it will come right back if the root causes are not eliminated. This article identifies the true source of virtual sprawl and demonstrates how it can be eradicated by treating the cause rather than the symptom.
We will start by looking at the types of sprawl and the symptoms one can expect to see in a sprawled environment. From there we will look at the underlying causes of sprawl and finish up with ways of dealing with these.
Types of Virtual Sprawl
Virtual sprawl is a generic term, mostly used to describe the concept of having more virtual machines in your environment than are needed. But sprawl actually goes further than this. It's not about the number of VMs that have been created; it's more about their state.
1. Underutilized or Unused VMs
VMs have variable lifecycles - some last for years, while others only for minutes. Take a standard lab or development environment, for example. VMs are typically created as needed but, in a dynamic environment, they are not always decommissioned when they should be. They continue to "live on" in the environment, consuming valuable resources without serving any real purpose.
2. Offline VMs
Virtualization allows VMs to easily change their states - moving from offline to online to a suspended state for indeterminate amounts of time. Offline VMs may be in that state for a reason, or they may be offline simply because someone turned them off but never got around to actually decommissioning them.
Offline VMs incur much the same costs as running ones; you still have to pay license fees as well as storage costs, and these costs can add up. One of our customers found that they had approximately $50,000 of disk and license costs tied up in 42 VMs that had been offline for more than 90 days.
Unused VMs, whether running or offline, reserve and use resources that could be reused. Removing them lowers the overall datacenter costs and frees up funding and resources that can be redeployed, allowing better resource utilization.
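A check for long-offline VMs can be sketched in a few lines. This is a minimal illustration, not a vendor API: the `VmRecord` fields, the sample names and the cost figures are all assumptions, and the 90-day threshold simply mirrors the customer example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class VmRecord:
    name: str
    powered_on: bool
    last_state_change: date   # hypothetical field: date of the last power-state change
    monthly_cost: float       # hypothetical field: disk plus license cost per month

def stale_offline_vms(records, today, max_offline_days=90):
    """Return VMs that have been powered off for more than max_offline_days."""
    cutoff = today - timedelta(days=max_offline_days)
    return [r for r in records if not r.powered_on and r.last_state_change < cutoff]

# Sample data is invented for illustration only.
inventory = [
    VmRecord("build-01", False, date(2009, 9, 1), 120.0),
    VmRecord("web-02", True, date(2009, 6, 1), 200.0),
    VmRecord("test-07", False, date(2010, 1, 20), 80.0),
]
stale = stale_offline_vms(inventory, today=date(2010, 2, 9))
print([vm.name for vm in stale])
print(sum(vm.monthly_cost for vm in stale))
```

Summing the cost of the flagged VMs gives a rough figure for what could be recovered by decommissioning them.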
3. Orphan or Unauthorized VMs
This class of sprawl relates to VMs that haven't been commissioned properly. We use the term "orphan" to describe a VM that doesn't have a parent. Best practice both for standardization and effective use suggests that VMs be provisioned from a template rather than created ad-hoc. Orphans are VMs that were not created in this way. Some may be necessary (especially in a lab environment), but if the number is large, you can expect to run into configuration and troubleshooting issues.
In the same manner, unauthorized VMs are also a type of sprawl, either because the VM is not supported or it's found running inappropriately. This could be a development VM on production hardware, or a low-priority application VM consuming resources on high-availability hardware.
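The two checks in this category can be expressed as simple rules over an inventory record. The sketch below assumes each VM carries a template name (empty for ad-hoc builds) and two labels, `purpose` and `host_tier`; these field names and values are hypothetical and are not taken from any management product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vm:
    name: str
    template: Optional[str]   # None when the VM was created ad hoc
    purpose: str              # assumed label, e.g. "production" or "development"
    host_tier: str            # assumed label for the hardware the VM runs on

def classify(vm):
    """Return the sprawl flags that apply to a single VM."""
    flags = []
    if vm.template is None:
        flags.append("orphan")        # not provisioned from a template
    if vm.purpose != "production" and vm.host_tier == "production":
        flags.append("misplaced")     # e.g. a development VM on production hardware
    return flags

print(classify(Vm("dev-3", None, "development", "production")))
print(classify(Vm("crm", "win2003-std", "production", "production")))
```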
4. Out of Inventory VMs (Invisible Inventory)
The decommissioning of virtual machines in a VMware environment is a two-step process: the VM must be removed from the VirtualCenter or vCenter inventory before it can be deleted from the storage array.
Should anything prevent the second step from occurring (the administrator is pulled away to deal with an urgent situation before the process is complete, for example), the VM image is no longer seen by the management software and consequently is not counted or managed (it becomes invisible). But it is still present on disk, which means that it is still consuming valuable resources. If the VM image contains protected data, it may also represent a compliance issue, as it is not being monitored or managed by any datacenter system.
We have found this class of VM present even in highly controlled environments. One of our finance customers recognized this as an issue early on in its virtualization initiative and put in place "cast iron" processes to ensure that the "remove from inventory" and "delete from disk" commands were never separated. Even with these controls in place, we found that 3% of its VMs were in this state.
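Conceptually, finding invisible inventory is a set difference: everything found on the datastores minus everything the management system has registered. The sketch below assumes you already have both lists as path strings; how they are collected is out of scope, and the paths shown are invented.

```python
def invisible_vms(datastore_images, inventory_paths):
    """Return VM images present on disk but absent from the management inventory."""
    return sorted(set(datastore_images) - set(inventory_paths))

# Illustrative paths only; .vmx is the VMware VM configuration file extension.
on_disk = ["ds1/crm/crm.vmx", "ds1/old-test/old-test.vmx", "ds2/web/web.vmx"]
registered = ["ds1/crm/crm.vmx", "ds2/web/web.vmx"]
print(invisible_vms(on_disk, registered))
```

Running a comparison like this on a schedule, rather than relying on process discipline alone, is one way to catch the cases where the two decommissioning steps become separated.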
5. Resource Sprawl (Over-Provisioning)
In some cases, the quantity of VMs may be optimal, but the resources that they are using are not. For example, one of our customers had established a standard storage reservation of 32 GB for every VM.
Selecting a large disk reservation ensured that they would never have an outage caused by lack of storage, saved time when commissioning new virtual machines, and provided a standard for accounting purposes. But not every application or operating system needs this amount of space (in fact, the majority of the VMs in this environment used only 4 to 6 GB on a regular basis), so much of the reserved storage was wasted.
This may have been fine when the environment was small, but as the environment grew, the customer found itself consuming more and more storage when only a fraction of it was being used on a regular basis.
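The arithmetic behind this kind of over-provisioning is easy to work through. In the sketch below, the 32 GB reservation and the 6 GB upper bound on typical usage come from the example above; the count of 500 VMs is purely an assumption chosen to show how the gap scales.

```python
def reserved_vs_used(vm_count, reserved_gb_per_vm, used_gb_per_vm):
    """Return (total reserved, total used, idle) storage in GB."""
    reserved = vm_count * reserved_gb_per_vm
    used = vm_count * used_gb_per_vm
    return reserved, used, reserved - used

# 500 VMs is an assumed figure, not one reported by the customer.
reserved, used, idle = reserved_vs_used(500, 32, 6)
print(reserved, used, idle)
print(f"{idle / reserved:.0%} of reserved storage is idle")
```

Because the idle share stays constant while the absolute amount grows with every new VM, the waste is barely noticeable in a small environment and substantial in a large one.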
6. Excess Snapshots
Snapshots are a useful tool but can create significant problems if not controlled. Multiple snapshots do waste disk resources, but the bigger issue is their impact on flexibility. Snapshots are meant to be used for testing application installations or patches, providing a means of reverting to the original if problems occur. They are not meant to serve as backups or to be kept long term, and they can create issues with vMotion: in most cases, snapshots have to be merged or removed before a VM can be moved using vMotion. Too many snapshots left lying around can create real problems.
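A simple snapshot policy check might flag VMs by snapshot count and age. The thresholds below (three snapshots, seven days) are assumptions for illustration rather than recommended values, and the input is a plain mapping of VM names to snapshot creation dates.

```python
from datetime import date

def snapshot_warnings(snapshots_by_vm, today, max_count=3, max_age_days=7):
    """Map each offending VM name to the reasons its snapshots need attention."""
    warnings = {}
    for vm, created in snapshots_by_vm.items():
        reasons = []
        if len(created) > max_count:
            reasons.append("too many")
        if any((today - d).days > max_age_days for d in created):
            reasons.append("too old")
        if reasons:
            warnings[vm] = reasons
    return warnings

# Invented sample data.
snapshots = {"db-01": [date(2009, 12, 1)], "app-02": [date(2010, 2, 8)]}
print(snapshot_warnings(snapshots, today=date(2010, 2, 9)))
```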
How Do You Know If You Have Sprawl?
There are a few primary signals that indicate a sprawled environment. The first and most obvious is that you will consume resources (hosts, memory and storage) faster than you should, usually resulting in unplanned spending just to continue your growth.
Other indicators include extended troubleshooting times, difficulty in inventorying your VM environment, inconsistent VM configurations, and bottlenecks in provisioning. Bottom line: sprawl will create inefficient operations, making it difficult to maintain or accelerate your adoption of virtualization.
What Causes It?
According to the customers we surveyed, the real cause of virtual sprawl lies in a fundamental weakness in the virtualization management infrastructure. You can see these weaknesses by comparing the state of the virtualization management systems with those used to manage the physical datacenter. When you do this, you uncover three main deficiencies:
Lack of Insight / Reporting
The visibility and insight available in multi-VirtualCenter/vCenter VMware environments leave a lot to be desired. In fact, most administrative teams use spreadsheets to augment the information these systems provide and spend a lot of time stitching together reports and troubleshooting information.
Lack of Automation
There is very little automation provided in the VirtualCenter/vCenter management systems, so the day-to-day management of virtual server environments is a predominantly manual activity. This not only takes time but also presents challenges in terms of consistency.
Lack of Administration Time
Because of this lack of insight and automation, it's easy for virtual administrators to get immersed in the day-to-day tasks of keeping everything running and, by necessity, forgo the more proactive tasks of optimizing the environment.
The Sprawl Causal Loop
These three factors work together in a causal loop to create sprawl; while all three elements create sprawl, the first two issues - lack of insight and lack of automation - directly exacerbate the third issue, the lack of administration time.
Lack of insight/reporting in the environment means that administrators have to spend more time troubleshooting, or simply understanding the environment, reducing the amount of time they have available for proactive optimization, resulting in sprawl.
Lack of automation in the environment means that administrators have to spend time manually managing the environment, further reducing the amount of time they have for proactive optimization and increasing the potential for sprawl.
Increased sprawl in turn increases the need for management (simply by increasing the overall number of VMs in a given environment), as well as increasing overall troubleshooting time (more servers with which to deal, lack of standards and more shortcuts taken), further reducing the amount of time for proactive optimization and further increasing the potential for sprawl.

It's this feedback loop that makes sprawl both inevitable and difficult to manage. While environments are small, there may be time available to still do everything. But as environments grow, something has to give; firefighting takes over, and sprawl results.
This is not a problem that can be solved by attacking the symptom (sprawl). You may gain some temporary relief, but as the underlying cause hasn't changed, the symptom will simply continue to emerge.
Resolving Sprawl
To resolve this issue, you can either continually add staff to ensure that your administrators always have time to optimize the environment (which doesn't make financial or operational sense), or attack the cause of the problem and improve the insight and automation within your virtual server environments.
1. Improving Insight and Reporting
Getting a clear picture of what is happening within the virtual environment is essential, both for effective management and for effective troubleshooting. Unfortunately, neither VirtualCenter nor vCenter provides adequate insight and reporting capability here, and this problem is compounded if multiple VirtualCenters are used in the environment, as they don't talk to each other.
Some level of centralized or federated inventory management system is required that not only self-discovers VMs within the environment and includes effective reporting capabilities, but also traps business-level information (project, cost center, business owner, charge codes, expiry date, etc.), creating a single repository of VM information that places information at the fingertips of the administrators tasked with managing the environment.
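One way to picture such a repository is a single record type that joins what discovery reports with the business-level fields listed above. The sketch below is a hypothetical data model, not the schema of any product; the source names, VM names and field values are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class InventoryEntry:
    vm_name: str
    source: str                        # which VirtualCenter/vCenter reported the VM
    project: Optional[str] = None
    cost_center: Optional[str] = None
    business_owner: Optional[str] = None
    expiry_date: Optional[date] = None

def build_inventory(discovered, business_info):
    """Merge discovery results {source: [vm names]} with {vm name: business fields}."""
    entries = []
    for source, names in discovered.items():
        for name in names:
            entries.append(InventoryEntry(name, source, **business_info.get(name, {})))
    return entries

def missing_owner(entries):
    """List VMs that no business owner has claimed."""
    return [e.vm_name for e in entries if e.business_owner is None]

discovered = {"vc-east": ["crm", "lab-9"], "vc-west": ["web"]}
business_info = {
    "crm": {"business_owner": "sales", "cost_center": "CC-100"},
    "web": {"business_owner": "marketing"},
}
entries = build_inventory(discovered, business_info)
print(missing_owner(entries))
```

With every VM in one list regardless of which management server reported it, questions such as "which VMs have no owner?" become a one-line query instead of a spreadsheet exercise.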
Customers have informed us that once a system like this is in place, administrative workload can drop by 10 to 20 percent, depending on how much report generation they were doing previously.
2. Improving Virtualization Automation
There is an assumption that automation systems are complex and time-consuming to install and configure and, indeed, some are. Fortunately, however, there are also out-of-the-box automation systems available today that make automation available without the need for a "forklift."
There are a lot of areas where automation can provide consistency and standardization while at the same time offloading the administrative team. These include provisioning systems, real-time discovery, automated decommissioning workflow, sprawl and out-of-process detection, as well as approval workflows and automatic alerting.
The key here is to automate the "grunt work" that your administrative team is doing, while at the same time providing automated alerting that allows your administrative team to "manage by exception" rather than by "effort." Equally important is to select automation that can drop in without extensive training and setup, because if you are already in a sprawled condition, your administrative team simply does not have the available time to spend on extensive setups. (To see how Computacenter did this, download the case study at: http://www.embotics.com/case-study-computacenter.)
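"Manage by exception" can be illustrated as a set of rules that run over the whole inventory and report only the VMs that break one. The two rules below (an expiry date in the past, and no business owner) are assumed examples, and the dictionary keys are hypothetical.

```python
from datetime import date

def expired(vm, today):
    """True when the VM has an expiry date that has already passed."""
    return vm.get("expiry_date") is not None and vm["expiry_date"] < today

def unowned(vm, today):
    """True when no business owner is recorded for the VM."""
    return not vm.get("business_owner")

RULES = {"expired": expired, "no owner": unowned}

def exceptions(vms, today):
    """Return (vm name, rule name) pairs only for VMs that break a rule."""
    return [(vm["name"], label)
            for vm in vms
            for label, rule in RULES.items()
            if rule(vm, today)]

# Invented sample data.
vms = [
    {"name": "crm", "business_owner": "sales", "expiry_date": date(2011, 1, 1)},
    {"name": "poc-4", "business_owner": "", "expiry_date": date(2010, 1, 31)},
]
print(exceptions(vms, today=date(2010, 2, 9)))
```

Compliant VMs produce no output at all, so administrators only see the items that need a decision.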
3. Freeing up Administration Time
Attacking the lack of information and lack of automation will allow your administrative team to manage by exception and free up their time to proactively manage and optimize the environment.
Centralizing all information about VMs in the environment, including the organizational and business-level information, allows administrators to save time troubleshooting, while providing better reporting to facilitate better levels of service and allow better decision making and capacity planning.
A centralized inventory of record combined with automatic sprawl and out-of-process identification allows administrators to quickly and easily identify sources of issues and deal with them before they impact customers.
Bottom Line
While the costs relating to virtual sprawl can be significant and can eat up your entire software license budget, consume more and more administrator time and eventually require the purchase of more physical servers and disks than you need, virtual sprawl itself is really only a symptom of a much bigger issue.
The primary cause of sprawl is the combination of a lack of insight and a lack of automation eating away at your administrators' time, forcing them into firefighting mode and away from operational optimization. And the situation gets worse as the environment grows.
Attacking the symptom may bring some immediate relief, but the underlying causes will ensure that sprawl returns, along with its associated costs and risks. The real solution is to attack the cause as well as the symptom, and get the immediate relief combined with the assurance that sprawl will not return.
At the same time you free up your administrative team for more value-add activities, reduce your ongoing support costs and improve the overall consistency of your environment.
Copyright © 2010 Ulitzer, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
David M. Lynch is vice president of marketing for Embotics. He is a well-rounded 30-year veteran of the high-tech marketplace with extensive P&L and international expertise in service, hardware and software products. David holds degrees in nautical science, computer technology and an MBA in strategic marketing.
Ulitzer content is offered under Creative Commons "Attribution Non-Commercial No Derivatives" License.
For any reuse or distribution, you must make clear to others the license terms of this work.
The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get written permission from Ulitzer, Inc., the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.