

Virtual Sprawl Is Not the Real Problem

The real solution is to attack the cause as well as the symptom

Virtualization Magazine on Ulitzer

As organizations ramp up their adoption of server virtualization, more and more of them are running into bottlenecks that slow them down, along with a need for additional, unplanned funding. Both are usually caused by virtual sprawl.

But while sprawl is the direct cause of unplanned expenses, it is not the real problem. Rather, it is a symptom of something else. Attacking it directly may provide some short-term relief, like salve on an uncomfortable rash, but it will come right back if the root causes are not eliminated. This article identifies the true source of virtual sprawl and shows how it can be eradicated by treating the cause rather than the symptom.

We will start by looking at the types of sprawl and the symptoms you can expect to see in a sprawled environment. From there we will look at the underlying causes of sprawl and finish with ways of dealing with them.

Types of Virtual Sprawl
Virtual sprawl is a generic term, mostly used to describe the concept of having more virtual machines in your environment than are needed. But sprawl actually goes further than this. It's not about the number of VMs that have been created; it's more about their state.

1. Underutilized or Unused VMs
VMs have variable lifecycles: some last for years, others only for minutes. Take a standard lab or development environment, for example. VMs are typically created as needed and, in a dynamic environment, are not always decommissioned when they should be. They continue to "live on" in the environment, consuming valuable resources without serving any real purpose.
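One way to surface underutilized VMs is to compare each VM's recent average CPU usage against a threshold. The following is a minimal sketch under stated assumptions. The `VmUsage` record is hypothetical and holds daily average CPU percentages. The 5 percent threshold and 30-day window are illustrative values, not recommendations from this article.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VmUsage:
    """Hypothetical per-VM record: name plus daily average CPU utilization (%)."""
    name: str
    daily_cpu_pct: list[float]


def find_idle_vms(vms: list[VmUsage], threshold_pct: float = 5.0,
                  window_days: int = 30) -> list[str]:
    """Return names of VMs whose mean CPU over the last `window_days` samples
    is below `threshold_pct` (both values are illustrative assumptions).

    VMs with no samples are skipped, since there is nothing to judge them on.
    """
    idle = []
    for vm in vms:
        recent = vm.daily_cpu_pct[-window_days:]
        if recent and mean(recent) < threshold_pct:
            idle.append(vm.name)
    return idle


if __name__ == "__main__":
    fleet = [
        VmUsage("build-01", [40.0, 55.0, 38.0]),
        VmUsage("test-old", [0.5, 0.2, 0.4]),
    ]
    print(find_idle_vms(fleet))  # ['test-old']
```

CPU alone is a blunt signal. A VM flagged this way is a candidate for review, not for automatic deletion.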

2. Offline VMs
Virtualization allows VMs to easily change their states - moving from offline to online to a suspended state for indeterminate amounts of time. Offline VMs may be in that state for a reason, or they may be offline simply because someone turned them off but never got around to actually decommissioning them.

Offline VMs incur much the same costs as running ones; you still have to pay license fees as well as storage costs, and these costs can add up. One of our customers found that they had approximately $50,000 of disk and license costs tied up in 42 VMs that had been offline for more than 90 days.

Unused VMs, whether running or offline, hold resources that could be put to work elsewhere. Removing them lowers overall datacenter costs and frees up funding and resources for redeployment, improving overall resource utilization.
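A report like the one behind the customer figure above can be sketched by filtering for VMs that have been powered off longer than a cutoff and totaling what they still cost. For scale, that figure works out to roughly $50,000 / 42 ≈ $1,190 per VM. The `OfflineVm` record and its cost fields are hypothetical. The 90-day default mirrors the example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class OfflineVm:
    """Hypothetical record for a powered-off VM and what it still costs to keep."""
    name: str
    powered_off_on: date
    storage_cost: float  # cost of the disk it occupies (currency units)
    license_cost: float  # license fees still attributed to it


def stale_offline_vms(vms: list[OfflineVm], today: date,
                      max_days: int = 90) -> tuple[list[str], float]:
    """Return (names, total carrying cost) for VMs offline MORE than max_days."""
    cutoff = today - timedelta(days=max_days)
    stale = [vm for vm in vms if vm.powered_off_on < cutoff]
    total = sum(vm.storage_cost + vm.license_cost for vm in stale)
    return [vm.name for vm in stale], total


if __name__ == "__main__":
    today = date(2010, 6, 1)
    vms = [
        OfflineVm("legacy-crm", date(2010, 1, 15), 600.0, 700.0),
        OfflineVm("patch-test", date(2010, 5, 20), 300.0, 0.0),
    ]
    print(stale_offline_vms(vms, today))  # (['legacy-crm'], 1300.0)
```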

3. Orphan or Unauthorized VMs
This class of sprawl covers VMs that were not commissioned properly. We use the term "orphan" to describe a VM that has no parent. Best practice, both for standardization and for effective use, is to provision VMs from a template rather than create them ad hoc; orphans are VMs that were not created this way. Some may be necessary (especially in a lab environment), but if there are many of them, you can expect to run into configuration and troubleshooting issues.
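Detecting orphans amounts to checking each VM's provenance against the set of approved templates. This sketch assumes a hypothetical inventory in which each VM records the template it was deployed from, or `None` if it was built by hand. In practice that field would come from your own provisioning records. The sketch also treats VMs built from a template that is no longer approved as orphans. That is a design choice, not part of the article's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmRecord:
    """Hypothetical inventory entry; source_template is None for ad hoc builds."""
    name: str
    source_template: Optional[str]


def find_orphans(vms: list[VmRecord], approved_templates: set[str]) -> list[str]:
    """Return VMs that were not deployed from an approved template.

    A VM counts as an orphan if it has no recorded template at all, or if the
    template it came from is not on the approved list (e.g. a retired image).
    """
    return [
        vm.name
        for vm in vms
        if vm.source_template is None or vm.source_template not in approved_templates
    ]


if __name__ == "__main__":
    inventory = [
        VmRecord("web-01", "win2008-std"),
        VmRecord("scratch", None),
        VmRecord("db-old", "win2003-legacy"),
    ]
    print(find_orphans(inventory, {"win2008-std", "rhel5-base"}))
    # ['scratch', 'db-old']
```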

Unauthorized VMs are also a type of sprawl, either because the VM is not supported or because it is running somewhere it should not be. Examples include a development VM on production hardware, or a low-priority application VM consuming resources on high-availability hardware.
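Misplaced VMs can be flagged by comparing a VM's declared environment or priority against the tier of the cluster it runs on. This is a sketch only. The tier names and the `ALLOWED_PLACEMENT` policy table are hypothetical, and each organization would define its own.

```python
from dataclasses import dataclass

# Hypothetical placement policy: which VM environments may run on which cluster tier.
ALLOWED_PLACEMENT: dict[str, set[str]] = {
    "production": {"ha"},
    "development": {"general", "lab"},
    "low-priority": {"general"},
}


@dataclass
class Placement:
    vm: str
    environment: str   # e.g. "production", "development"
    cluster_tier: str  # e.g. "ha", "general", "lab"


def find_misplaced(placements: list[Placement]) -> list[str]:
    """Return VMs running on a tier their environment is not allowed to use.

    Unknown environments are treated as unauthorized, since nobody has
    approved a placement for them.
    """
    return [
        p.vm
        for p in placements
        if p.cluster_tier not in ALLOWED_PLACEMENT.get(p.environment, set())
    ]


if __name__ == "__main__":
    print(find_misplaced([
        Placement("erp-db", "production", "ha"),
        Placement("dev-42", "development", "ha"),
    ]))  # ['dev-42']
```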

4. Out of Inventory VMs (Invisible Inventory)
The decommissioning of virtual machines in a VMware environment is a two-step process: the VM must be removed from the VirtualCenter or vCenter inventory before it can be deleted from the storage array.

If anything prevents the second step from happening (for example, the administrator is pulled away to deal with an urgent situation before the process is complete), the VM image is no longer seen by the management software and consequently is not counted or managed; it becomes invisible. But it is still present on disk. As a result, it not only consumes valuable resources but, if the image contains protected data, may also represent a compliance issue, because it is not being monitored or managed by any datacenter system.

We have found this class of VM present even in highly controlled environments. One of our finance customers recognized this as an issue early on in its virtualization initiative and put in place "cast iron" processes to ensure that the "remove from inventory" and "delete from disk" commands were never separated. Even with these controls in place, we found that 3% of its VMs were in this state.
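Finding invisible VMs is essentially a set difference. You list the VM configuration (`.vmx`) files present on the datastores and subtract those referenced by the management inventory. This sketch assumes both lists have already been exported as datastore paths. The export step is not shown, and the sample paths are made up. With several VirtualCenter/vCenter servers, the inventory side must be the union of all of them. Otherwise, VMs registered on another server would show up as false positives.

```python
def find_unregistered(on_disk: set[str], inventories: list[set[str]]) -> list[str]:
    """Return configuration-file paths that exist on storage but appear in no
    management inventory, i.e. candidate "invisible" VMs.

    `inventories` holds one set of registered paths per management server
    (an assumed export format). The result is sorted for stable reporting.
    """
    registered: set[str] = set().union(*inventories)
    return sorted(on_disk - registered)


if __name__ == "__main__":
    on_disk = {
        "[ds01] web-01/web-01.vmx",
        "[ds01] old-test/old-test.vmx",
        "[ds02] db-01/db-01.vmx",
    }
    vc_a = {"[ds01] web-01/web-01.vmx"}
    vc_b = {"[ds02] db-01/db-01.vmx"}
    print(find_unregistered(on_disk, [vc_a, vc_b]))
    # ['[ds01] old-test/old-test.vmx']
```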

5. Resource Sprawl (Over-Provisioning)
In some cases, the quantity of VMs may be optimal, but the resources that they are using are not. For example, one of our customers had established a standard storage reservation of 32 GB for every VM.

Selecting a large disk reservation ensured that they would never have an outage caused by lack of storage, saved time when commissioning new virtual machines, and provided a standard for accounting purposes. But not every application or operating system needs this much space. In fact, the majority of the VMs in this environment used only 4 to 6 GB on a regular basis, so most of each reservation was wasted.

This may have been fine when the environment was small, but as the environment grew, the customer found itself consuming more and more storage when only a fraction of it was being used on a regular basis.
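The scale of the waste is easy to estimate. With a 32 GB reservation and typical usage of 4 to 6 GB, roughly 26 to 28 GB per VM (about 81 to 88 percent of each reservation) sits idle. The sketch below suggests a right-sized reservation from observed peak usage plus headroom. The 25 percent headroom, the 1 GB floor and the rounding up to whole GB are illustrative assumptions.

```python
import math


def right_size_gb(peak_used_gb: float, headroom: float = 0.25,
                  minimum_gb: int = 1) -> int:
    """Suggest a disk reservation: observed peak usage plus a headroom
    fraction, rounded up to a whole GB and never below minimum_gb."""
    return max(minimum_gb, math.ceil(peak_used_gb * (1 + headroom)))


def reclaimable_gb(reserved_gb: dict[str, int], peak_used_gb: dict[str, float],
                   headroom: float = 0.25) -> int:
    """Total GB that could be released if every VM were right-sized.

    Every VM in reserved_gb must have a peak value. VMs already at or below
    their suggested size contribute nothing rather than a negative amount.
    """
    total = 0
    for vm, reserved in reserved_gb.items():
        suggested = right_size_gb(peak_used_gb[vm], headroom)
        total += max(0, reserved - suggested)
    return total


if __name__ == "__main__":
    reserved = {"app-1": 32, "app-2": 32, "db-1": 32}
    peaks = {"app-1": 4.0, "app-2": 6.0, "db-1": 30.0}
    print(reclaimable_gb(reserved, peaks))  # 51
```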

6. Excess Snapshots
Snapshots are a useful tool but can create significant problems if not controlled. Multiple snapshots do waste disk space, but the bigger issue is their impact on flexibility. Snapshots are meant for testing application installations or patches, providing a means of reverting to the original if problems occur. They are not meant for backup or other long-term use, and they can create issues with vMotion: in most cases, snapshots have to be merged or removed before a VM can be moved using vMotion. If you have too many of them around, they can create real problems.
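A simple snapshot policy check flags VMs that have too many snapshots or any snapshot older than a chosen age. This is a sketch only. The `Snapshot` record is hypothetical, and the three-snapshot limit and seven-day age limit are illustrative policy values, not VMware defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """Hypothetical snapshot record."""
    vm: str
    name: str
    created: datetime


def snapshot_violations(snapshots: list[Snapshot], now: datetime,
                        max_count: int = 3,
                        max_age: timedelta = timedelta(days=7)) -> dict[str, list[str]]:
    """Group snapshots by VM and report policy violations per VM.

    Returns {vm: [reason, ...]} only for VMs that break the policy.
    """
    by_vm: dict[str, list[Snapshot]] = {}
    for snap in snapshots:
        by_vm.setdefault(snap.vm, []).append(snap)

    violations: dict[str, list[str]] = {}
    for vm, snaps in by_vm.items():
        reasons = []
        if len(snaps) > max_count:
            reasons.append(f"{len(snaps)} snapshots (limit {max_count})")
        stale = [s.name for s in snaps if now - s.created > max_age]
        if stale:
            reasons.append("older than limit: " + ", ".join(stale))
        if reasons:
            violations[vm] = reasons
    return violations


if __name__ == "__main__":
    now = datetime(2010, 6, 1, 12, 0)
    snaps = [
        Snapshot("web-01", "pre-patch", now - timedelta(days=2)),
        Snapshot("db-01", "before-upgrade", now - timedelta(days=45)),
    ]
    print(snapshot_violations(snaps, now))
    # {'db-01': ['older than limit: before-upgrade']}
```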

How Do You Know If You Have Sprawl?
There are a few primary signals that indicate a sprawled environment. The first and most obvious is that you consume resources (hosts, memory and storage) faster than you should in order to sustain your growth, usually resulting in unplanned spending.

Other indicators include extended troubleshooting times, difficulty in inventorying your VM environment, inconsistent VM configurations, and bottlenecks in provisioning. Bottom line: sprawl will create inefficient operations, making it difficult to maintain or accelerate your adoption of virtualization.

What Causes It?
According to the customers we surveyed, the real cause of virtual sprawl lies in a fundamental weakness in the virtualization management infrastructure. You can see these weaknesses by comparing the state of the virtualization management systems with those used to manage the physical datacenter. When you do this, you uncover three main deficiencies:

Lack of Insight / Reporting
Visibility and insight into multi-VirtualCenter/vCenter VMware environments leave a lot to be desired. In fact, most administrative teams use spreadsheets to augment the information these systems provide and spend a lot of time stitching together reports and troubleshooting information.

Lack of Automation
The VirtualCenter/vCenter management systems provide very little automation, so the day-to-day management of virtual server environments is predominantly a manual activity. That not only takes time but also makes consistency hard to achieve.

Lack of Administration Time
Because of this lack of insight and automation, it's easy for virtual administrators to become immersed in the day-to-day tasks of keeping everything running and, by necessity, to forgo the more proactive work of optimizing the environment.

The Sprawl Causal Loop
These three factors work together in a causal loop to create sprawl. While all three contribute directly, the first two (lack of insight and lack of automation) also exacerbate the third, the lack of administration time.

Lack of insight/reporting in the environment means that administrators have to spend more time troubleshooting, or simply understanding the environment, reducing the amount of time they have available for proactive optimization, resulting in sprawl.

Lack of automation in the environment means that administrators have to spend time manually managing the environment, further reducing the amount of time they have for proactive optimization and increasing the potential for sprawl.

Increased sprawl in turn increases the need for management (simply by increasing the overall number of VMs in a given environment) and increases overall troubleshooting time (more servers to deal with, a lack of standards and more shortcuts taken), further reducing the time available for proactive optimization and further increasing the potential for sprawl.

It's this feedback loop that makes sprawl both inevitable and difficult to manage. While environments are small, there may still be time to do everything. But as environments grow, something has to give: firefighting takes over, and sprawl results.
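The loop can be illustrated with a deliberately simple toy model. It assumes a fixed pool of administrator hours per month and a fixed amount of routine work per VM. It also assumes that cleanup happens only with whatever time is left over, and that VMs not cleaned up add to the routine workload the following month. All parameter values are invented for illustration. They are not measurements from the customers mentioned in this article.

```python
def simulate_sprawl(months: int, admin_hours: float = 160.0,
                    hours_per_vm: float = 0.5, new_vms_per_month: int = 20,
                    unneeded_fraction: float = 0.2,
                    cleanup_hours_per_vm: float = 1.0) -> list[float]:
    """Toy model of the sprawl loop (all defaults are made-up assumptions).

    Returns the number of unneeded VMs left at the end of each month.
    """
    needed, sprawl = 0.0, 0.0
    history = []
    for _ in range(months):
        # Routine work scales with every VM, needed or not.
        routine = (needed + sprawl) * hours_per_vm
        spare = max(0.0, admin_hours - routine)
        # Proactive cleanup only happens with leftover time.
        cleaned = min(sprawl, spare / cleanup_hours_per_vm)
        sprawl -= cleaned
        # New demand arrives; some of it will outlive its purpose.
        needed += new_vms_per_month * (1 - unneeded_fraction)
        sprawl += new_vms_per_month * unneeded_fraction
        history.append(sprawl)
    return history


if __name__ == "__main__":
    print(simulate_sprawl(24)[18:])  # [4.0, 4.0, 8.0, 12.0, 16.0, 20.0]
```

In this model, sprawl stays flat while spare time exists and starts compounding in the month routine work consumes all available hours. Lowering `hours_per_vm`, a rough stand-in for better insight and automation, postpones that point without adding staff.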

This is not a problem that can be solved by attacking the symptom (sprawl). You may gain some temporary relief, but as the underlying cause hasn't changed, the symptom will simply continue to emerge.

Resolving Sprawl
To resolve this issue, you can either continually add staff to ensure that your administrators always have time to optimize the environment (which doesn't make financial or operational sense), or attack the cause of the problem and improve the insight and automation within your virtual server environments.

1. Improving Insight and Reporting
Getting a clear picture of what is happening within the virtual environment is essential for both effective management and effective troubleshooting. Unfortunately, neither VirtualCenter nor vCenter provides adequate insight and reporting here, and the problem is compounded if multiple VirtualCenters are used in the environment, as they don't talk to each other.

Some form of centralized or federated inventory management system is required. It should discover VMs in the environment automatically and include effective reporting, and it should also capture business-level information (project, cost center, business owner, charge codes, expiry date, etc.). The result is a single repository of VM information that puts everything at the fingertips of the administrators tasked with managing the environment.
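A minimal version of such a repository merges the VM lists from several management servers into one record per VM and attaches the business fields named above. This is a sketch under stated assumptions. The record layout, the VM identifier (assumed unique across all servers) and the sample values are hypothetical. A real system would pull the VM lists through the management API rather than from literals.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Business-level fields this hypothetical repository accepts.
BUSINESS_FIELDS = {"project", "cost_center", "business_owner", "charge_code", "expiry"}


@dataclass
class VmEntry:
    """One row of a hypothetical federated inventory."""
    vm_id: str             # assumed unique across all management servers
    name: str
    source: str            # which VirtualCenter/vCenter reported it
    project: Optional[str] = None
    cost_center: Optional[str] = None
    business_owner: Optional[str] = None
    charge_code: Optional[str] = None
    expiry: Optional[date] = None


def merge_inventories(discovered: dict[str, list[tuple[str, str]]],
                      business: dict[str, dict]) -> dict[str, VmEntry]:
    """Combine discovered VMs from several servers with business metadata.

    discovered maps server name -> [(vm_id, vm_name), ...].
    business maps vm_id -> {"project": ..., "business_owner": ..., ...}.
    A VM reported by more than one server keeps the first source seen;
    metadata for VMs that were not discovered is ignored here.
    """
    inventory: dict[str, VmEntry] = {}
    for server, vms in discovered.items():
        for vm_id, name in vms:
            inventory.setdefault(vm_id, VmEntry(vm_id, name, server))
    for vm_id, attrs in business.items():
        if vm_id not in inventory:
            continue
        for key, value in attrs.items():
            if key not in BUSINESS_FIELDS:
                raise ValueError(f"unknown business field: {key}")
            setattr(inventory[vm_id], key, value)
    return inventory


def missing_owner(inventory: dict[str, VmEntry]) -> list[str]:
    """VMs nobody has claimed: a natural first report from such a repository."""
    return sorted(e.name for e in inventory.values() if e.business_owner is None)


if __name__ == "__main__":
    inv = merge_inventories(
        {"vc-east": [("id1", "web-01"), ("id2", "test-9")],
         "vc-west": [("id3", "db-01")]},
        {"id1": {"project": "shop", "business_owner": "alice"},
         "id3": {"business_owner": "bob", "expiry": date(2011, 1, 1)}},
    )
    print(missing_owner(inv))  # ['test-9']
```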

Customers have told us that once a system like this is in place, administrative workload can drop by 10 to 20 percent, depending on how much report generation they were doing previously.

2. Improving Virtualization Automation
There is a common assumption that automation systems are complex and time-consuming to install and configure, and indeed some are. Fortunately, there are also out-of-the-box automation systems available today that make automation possible without the need for a "forklift."

There are a lot of areas where automation can provide consistency and standardization while at the same time offloading the administrative team. These include provisioning systems, real-time discovery, automated decommissioning workflow, sprawl and out-of-process detection, as well as approval workflows and automatic alerting.
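As one example of the decommissioning and approval workflows mentioned above, the sketch below selects VMs whose expiry date has passed and requires approval before acting on them. It then performs the two decommissioning steps described earlier for VMware environments as a single unit. Any VM left half-done is reported, so it cannot silently become invisible. The `remove_from_inventory` and `delete_from_disk` callables are hypothetical stand-ins for whatever your management tooling provides. They are not real VMware API names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Candidate:
    """Hypothetical VM record carrying an expiry date and an approval flag."""
    name: str
    expiry: date
    approved: bool  # set by a (hypothetical) approval workflow


def decommission_expired(candidates: list[Candidate], today: date,
                         remove_from_inventory: Callable[[str], None],
                         delete_from_disk: Callable[[str], None]) -> dict[str, list[str]]:
    """Decommission expired, approved VMs without silently splitting the two steps.

    Returns lists 'done', 'awaiting_approval' and 'needs_attention'; the last
    holds VMs removed from inventory but not deleted from disk, i.e. exactly
    the invisible state this workflow is meant to prevent. If step 1 raises,
    the exception propagates and the VM stays visible in the inventory.
    """
    report: dict[str, list[str]] = {"done": [], "awaiting_approval": [], "needs_attention": []}
    for c in candidates:
        if c.expiry > today:
            continue  # not expired yet; a VM expiring today is processed
        if not c.approved:
            report["awaiting_approval"].append(c.name)
            continue
        remove_from_inventory(c.name)
        try:
            delete_from_disk(c.name)
        except Exception:
            # Step 2 failed: surface it instead of letting the VM vanish from view.
            report["needs_attention"].append(c.name)
        else:
            report["done"].append(c.name)
    return report
```

Passing the two steps in as callables keeps the workflow logic testable without a live management server. The same function can be driven by stubs in tests and by real tooling in production.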

The key here is to automate the "grunt work" your administrative team is doing, while at the same time providing automated alerting that allows the team to "manage by exception" rather than by "effort." Equally important is to select automation that can drop in without extensive training and setup, because if you are already in a sprawled condition, your administrative team simply does not have time to spend on extensive setups. (To see how Computacenter did this, download the case study at: http://www.embotics.com/case-study-computacenter.)
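Managing by exception can be as simple as running every check on a schedule and alerting only when a check returns something. In this sketch, each check is a hypothetical zero-argument function that returns a list of offending VM names, for example a detector for one of the sprawl types described earlier in this article. The alert format is an assumption.

```python
from typing import Callable

# A check is any function returning the names of VMs that need attention.
Check = Callable[[], list[str]]


def run_checks(checks: dict[str, Check]) -> dict[str, list[str]]:
    """Run every check and keep only those that found something.

    A check that raises is reported as an exception in its own right, so a
    broken detector is not mistaken for a clean environment.
    """
    exceptions: dict[str, list[str]] = {}
    for name, check in checks.items():
        try:
            findings = check()
        except Exception as err:
            exceptions[name] = [f"check failed: {err}"]
            continue
        if findings:
            exceptions[name] = sorted(findings)
    return exceptions


def format_alert(exceptions: dict[str, list[str]]) -> str:
    """Render a short alert; an empty string means there is nothing to send."""
    lines = [f"{name}: {', '.join(items)}" for name, items in sorted(exceptions.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    checks = {
        "offline > 90 days": lambda: ["legacy-crm"],
        "orphans": lambda: [],
    }
    print(format_alert(run_checks(checks)))
    # offline > 90 days: legacy-crm
```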

3. Freeing up Administration Time
Attacking the lack of information and lack of automation will allow your administrative team to manage by exception and free up their time to proactively manage and optimize the environment.

Centralizing all information about VMs in the environment, including the organizational and business-level information, allows administrators to save time troubleshooting, while providing better reporting to facilitate better levels of service and allow better decision making and capacity planning.

A centralized inventory of record combined with automatic sprawl and out-of-process identification allows administrators to quickly and easily identify sources of issues and deal with them before they impact customers.

Bottom Line
The costs of virtual sprawl can be significant. Sprawl can eat up your entire software license budget, consume more and more administrator time, and eventually force you to buy more physical servers and disks than you need. Yet virtual sprawl itself is really only a symptom of a much bigger issue.

The primary cause of sprawl is the combination of a lack of insight and a lack of automation eating away at your administrators' time, forcing them into firefighting mode and away from operational optimization. And the situation gets worse as the environment grows.

Attacking the symptom may bring some immediate relief, but the underlying causes will ensure that sprawl returns, along with its associated costs and risks. The real solution is to attack the cause as well as the symptom, gaining immediate relief along with the assurance that sprawl will not return.

At the same time, you free up your administrative team for higher-value activities, reduce your ongoing support costs, and improve the overall consistency of your environment.


More Stories By David M. Lynch

David M. Lynch is vice president of marketing for Embotics. He is a well-rounded 30-year veteran of the high-tech marketplace with extensive P&L and international expertise in service, hardware and software products. David holds degrees in nautical science, computer technology and an MBA in strategic marketing.
