Five Practices To Optimize Vmware Capacity

Managing Capacity in VMware® Environments
Part II – Five Practices that will Optimize your VMware Capacity and Result in a Standing Ovation
May 2009

© 2009 Systar, Inc. http://www.systar.com/solutions/virtualization_management


Executive Summary

Nearly every large enterprise IT shop has virtualized some portion of its infrastructure. With hardware cost savings of nearly 30% annually, IT executives are applauding the initial success. But can they do better? While new virtualization platforms proved stable and consolidation of physical resources was achieved during the initial deployments, the reality is that VMware infrastructure remains bloated. Systar has spoken with hundreds of IT executives, virtualization architects and capacity managers at leading enterprises, all of whom admitted to achieving post-virtualization capacity utilization rates of only 10–20%, while their objective was to safely reach 50–60%. Reaching the >50% objective cannot be accomplished with default VMware settings; it requires a new understanding of virtualized capacity.

In this paper, Systar explores five practices for optimizing the utilization of virtualized capacity without increasing risk to service quality. The practices are:

  • Managing at the cluster level
  • Sizing objects correctly
  • Placing workloads carefully
  • Optimizing DRS
  • Minimizing impact of HA

As companies prepare to expand their virtualized environments, Systar sees an opportunity for IT organizations to improve their understanding of virtualized capacity, reduce new hardware spending significantly, and meet their utilization objectives safely. By applying the practices discussed in this paper, Systar sees the initial round of applause transforming into a standing ovation as virtualization expands across the enterprise while safely meeting the >50% objective.


Contents

  • Executive Summary
  • Improve Quality and Contain Costs
  • The Many Dimensions of VMware Capacity
  • Managing at the Cluster Level
  • Sizing Objects Correctly
  • Placing Workloads
  • Optimizing DRS
  • Minimizing Impact of High Availability
  • Summary
  • Glossary


Improve Quality and Contain Costs

The vast majority of today’s VMware infrastructure capacity is bloated. Where physical capacity was over-provisioned on average by a factor of 10, consolidation to virtualized environments has reduced the footprint of IT infrastructure but has not optimized computing capacity. On average, virtualized capacity is over-provisioned by a factor of 4, including peak headroom, representing millions of dollars in over-spending as corporate IT budgets continue to tighten.

A proven way to combat over-provisioning and reduce unnecessary virtualization expenditures is through improved capacity management practices. According to a Forrester report titled The Capacity Planning Software Market [1], “As data centers struggle with server consolidation and server virtualization, capacity planning becomes the key to maintaining or improving service quality while containing costs”.

In theory, optimizing the utilization of IT infrastructure capacity via consolidation to virtualized environments is easy, but in practice it can be difficult without the right information. To maximize the utilization of virtualized infrastructure, the proper visibility, control, processes and expertise are needed. Yet each path to optimizing virtual capacity is met with its own challenge. For example, IT organizations must:

  • Maximize the aggregate utilization of virtualized resources, but respect the organization’s tolerance for risk
  • Take advantage of the advanced capabilities of the virtualized infrastructure, but be able to quantify its available effective capacity, workload patterns, migration tendency and patterns, and HA capability
  • Maintain the flexibility to respond to unexpected workload variances with virtual machine migrations, but strive to minimize migrations through careful placement
  • Take advantage of cost-effective failover capabilities, but minimize the capacity dedicated to supporting failovers
  • Establish processes to automate the efforts mentioned above, ensuring their burden does not outweigh their benefit as the virtualized environment scales

The Many Dimensions of VMware Capacity

Virtualization changes many IT functions, but it changes capacity management more than most. Where capacity in the physical world often focused on a single machine hosting a single application, VMware clusters – made up of ESX Hosts and Virtual Machines – are the new “computer” and capacity must be managed accordingly. VMware’s CTO, Stephen Herrod, recently pointed to this notion when he commented that “virtualization is the mainframe for the 21st century” [2].

[1] The Capacity Planning Software Market: Sustaining Application Performance, by Evelyn Hubbert and Jean-Pierre Garbani with Thomas Mendel, Ph.D.
[2] VMware’s vSphere Introduction, Conference Call (2009)

Understanding the capacity of VMware environments, starting at the cluster level, is a multidimensional exercise. Cluster capacity is first determined by the size and number of ESX Hosts belonging to the cluster. Each ESX Host will support a number of VMs that have sizing properties like Reservations and calculated Entitlements. Each Reservation should account for the typical workload, including some headroom for demand spikes. Next, we need to understand potential resource contention on each ESX Host, whose resources are shared among all VMs, the hypervisor and whitespace [3]. Then, as contention for resources builds, load balancing and policy enforcement performed by VMware’s Distributed Resource Scheduler (DRS) can migrate VMs from one host to another within the cluster. Finally, and of equal importance, cluster capacity needs to account for the headroom requirements of High Availability (if enabled).

Although VMware capacity is complex in nature, optimizing capacity utilization in these environments does not have to be difficult. In fact, Systar has successfully led and enabled many organizations through optimizing their virtualized capacity. Based on many years of experience in optimizing physical and virtualized infrastructure, Systar recommends increasing your understanding and adoption of five proven practices:

  • Managing at the cluster level
  • Sizing objects correctly
  • Placing workloads carefully
  • Optimizing DRS
  • Minimizing impact of HA

Let’s explore some of the multidimensional aspects of cluster capacity while discussing practices that can be applied to improve their management.

Managing at the Cluster Level

During the initial rollout of VMware, more concern is typically placed on sizing VMs correctly and placing workloads carefully, but the approach taken to achieve this is often rudimentary. We have spoken with many organizations that ignore proper VM sizing and workload placement methods in favor of placing common Operating Systems on the same machine and using a core-to-VM ratio rule. As deployments expand across the enterprise and DRS and HA enabled clusters enter into the equation, managing capacity of the cluster becomes paramount. This sentiment is echoed in Gartner, Inc.’s Data Center Conference Survey [4], which states “In the long term, Gartner believes that capacity-planning tools and processes will have to shift their orientation to focusing less on a single VM or physical server to assist with the sizing of resource pools and clusters.”

[3] Whitespace: capacity on a host that cannot be utilized due to alignment of resource requirements, or resources that cannot be used since they are too small to support a whole VM.
[4] Data Center Conference Survey: Addressing the Operational Challenges of Virtual Server Management, by Cameron Haight (February 2008)

Balancing workload among hosts is an important aspect of managing cluster capacity. If the load of an individual host is too high, a DRS enabled cluster will try to rebalance that load. Although this automated load balancing is available, it does create overhead in the environment. Simply relying on DRS to balance all workloads, all of the time, will lead to unnecessary migration overhead in the system and is not considered a best practice. By monitoring migrations and VM tracks, you can begin to determine if excess balancing is occurring or if VMs are wandering around like lost souls. With careful balancing of loads upfront, you can preempt capacity loss as a result of excess migrations.

Next, we will look at Entitlements [5]. The general measure of the capacity and health of a cluster is the ability of the cluster to deliver the entitled resources to all VMs. If VM Entitlements total more than the total capacity of the cluster, the cluster is undersized or improperly balanced. Within the cluster itself, a good measure of a host’s ability to provide expected capacity is the measure of its total Entitlements vs. total capacity. Unlike Reservations [6], the cluster and its hosts will not identify a violation when Entitlements exceed capacity. By understanding the sum of all VM Entitlements on a host and within a cluster, VMware architects and administrators will have a clear picture of the resources being made available to meet demands on their capacity.

Another practice to consider is the calculation of target headroom [7] within a cluster plus its high availability (HA) failover capacity. The sum of these elements can be used to establish an “effective capacity” of the cluster. The image below describes the effective capacity of the cluster in terms of the percentage of CPU and memory available.

Figure 1. Systar’s OmniVision VMware Capacity Cluster Reports automatically calculate effective capacity of clusters by percentage, normalized MIPS, and number of CPUs. Daily, weekly and monthly report perspectives are available.
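As a rough illustration of the effective capacity idea, the sketch below subtracts demand spike headroom, whitespace and HA space from raw capacity, all expressed as percentages. The function name and the 20% spike headroom are our own assumptions for the example; the 5% whitespace and 15% HA figures are the ones quoted in the target headroom footnote. It is a simplified reading of that footnote, not the calculation OmniVision performs.

```python
def effective_capacity_pct(spike_headroom_pct, whitespace_pct, ha_pct):
    """Return the share of raw cluster capacity left for workloads, in percent.

    All three inputs are percentages of raw cluster capacity.
    """
    reserved = spike_headroom_pct + whitespace_pct + ha_pct
    if not 0 <= reserved <= 100:
        raise ValueError("reserved share must be between 0 and 100 percent")
    return 100.0 - reserved


# Hypothetical inputs: 20% spike headroom is an assumed risk tolerance;
# 5% whitespace (large hosts) and 15% HA space follow the footnote.
print(effective_capacity_pct(20.0, 5.0, 15.0))  # 60.0
```

The same function can be applied separately to CPU and memory, since the two resources may call for different headroom.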

[5] Entitlements are the computed result of configurations, reservations, limits, and shares used to establish the resource allocation given to each VM for its operation. The Entitlement will always fall between the Reservation and the Limit, based upon its Share.
[6] A Reservation is the amount of vCPU and memory (in absolute units) that a VM is guaranteed should it need it.
[7] Target headroom = demand spike headroom (which depends on workload profiles and risk tolerance) + whitespace (5% for large hosts, 10% for smaller ones); then add HA space of 15% per host in an 8-host cluster.

Effective capacity is only one measure of the cluster that should be considered. Another recommended element of this practice is to measure the available capacity of the cluster not only in terms of resources like CPU and memory, but also in terms of VMs that can be added. It is important to understand that raw computing resource sizes do not take into account whitespace (resources that cannot be used since they are too small to support a whole VM).

When calculating the number of VMs that can be added to a cluster, it is important to first define differing VM template sizes (e.g., small, medium and large). The definition can start with a simple average VM reservation for each resource per template size. For a more accurate picture of where VMs should be placed within a cluster, you can expand the calculation to consider maximum and minimum VM sizes and workload type, such as sustained or peaky. The diagram below provides a calculated assessment of how many medium-sized VMs can be added to each cluster on the basis of CPU and memory requirements for that VM template.

Figure 2. Systar’s OmniVision VMware Capacity Cluster Reports have calculated that on the basis of CPU requirements 23 additional medium-sized VMs can be added to cluster “clsioub19”. The same cluster shows available memory for 53 new VMs. When determining the number of new VMs that can be hosted, it is recommended to use the lesser of the Effective CPU and memory metrics as a guide.
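The “lesser of CPU and memory” rule can be sketched as follows. The function name, the free-capacity figures and the medium template size (1000 MHz, 2048 MB) are hypothetical values chosen so that the result matches the 23 and 53 shown in Figure 2; they are not taken from the report itself. The free capacity passed in is assumed to be what remains of the effective capacity, after headroom and HA space have been set aside.

```python
def additional_vms(free_cpu_mhz, free_mem_mb, template_cpu_mhz, template_mem_mb):
    """Return (fit_by_cpu, fit_by_memory, usable_count) for one VM template.

    Only whole VMs count, so each division is rounded down; the remainder
    is whitespace that cannot host another VM of this template size.
    """
    by_cpu = int(free_cpu_mhz // template_cpu_mhz)
    by_mem = int(free_mem_mb // template_mem_mb)
    return by_cpu, by_mem, min(by_cpu, by_mem)


# Hypothetical free effective capacity and medium template size.
print(additional_vms(23500, 110000, 1000, 2048))  # (23, 53, 23)
```

Here CPU is the binding constraint, so 23 is the figure to plan against even though memory alone would allow 53.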

Sizing Objects Correctly

The next practice we will explore is the importance of sizing objects correctly. The key to successful VMware capacity optimization is to set the Reservation property for each VM. The Reservation determines the amount of resources that a VM can receive before it begins competing with other VMs for the remaining shared resources. The Reservation also determines the size of the VMKernel swap file for the VM’s memory and impacts HA and DRS calculations.

Reservations are used by VMware Admission Control to prevent placing too many VMs on a host based on resources. A VM can only be powered on if there are adequate unreserved resources available on the host to satisfy that VM’s Reservation requirement. If all VMs have the default Reservation setting of zero, then there is no effective Admission Control and VMs can be loaded on hosts until overhead space finally runs out. This approach totally defeats Admission Control and makes it very hard for both the VMKernel and DRS to manage resources optimally.

When HA is enabled, Reservations are used to calculate the amount of space (slots) needed to meet the Failover Level Policy in effect. HA will calculate the maximum of all Reservations and then, based on the Failover Level Policy in effect, set aside space to provide the designated number of VMs with sufficient capacity to operate if trouble strikes. Without Reservations in place, HA must use an input parameter that defaults to 256MHz/256MB slot sizes, which is not optimal.

Reservations are also used as one of the triggers for DRS VM migrations at Periodic Invocation time (along with load balancing and other mandatory moves). If the sum of the Reservations on a host exceeds its capacity, then a VM is selected for migration to correct the situation.

Now that we have established the importance of Reservations when managing capacity, we will provide some guidance on selecting the correct size. From our experience in assisting large organizations with their VMware environments, a common practice is to set the Reservation in the range of the 45th to 60th percentile of the VM’s historical resource utilization. Beyond this setting, allow the VM to compete with other VMs based on Shares and Entitlements during periods of greater load.

[Figure 3 panels: VM Workload Profiles, with examples labeled Reservation too low, Reservation low side, Reservation too high, and Reservation high side.]

Figure 3. Systar’s OmniVision Workload Profile Reports display minimum (light blue), average (blue) and peak (orange) resource usage hour-by-hour for CPU, I/O and Memory. The measures are calculated from data collected every 15 seconds. Profiles are available in daily, multi-day, weekly and monthly views. Average and maximum usage profiles can be used to accurately assign VMware Reservations.
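The 45th to 60th percentile guidance can be illustrated with a short sketch. It assumes you already have a list of historical utilization samples for one VM and one resource. The nearest-rank percentile method, the function names and the sample values are our own choices for the example; other percentile definitions will give slightly different numbers.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 0..100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_reservation(samples, low_pct=45, high_pct=60):
    """Return the (low, high) bounds of a candidate Reservation range."""
    return percentile(samples, low_pct), percentile(samples, high_pct)


# Hypothetical CPU usage samples in MHz for a single VM.
cpu_mhz = [200, 250, 300, 320, 350, 400, 450, 500, 900, 1500]
print(suggest_reservation(cpu_mhz))  # (350, 400)
```

Note how the two spikes (900 and 1500 MHz) barely move the suggested range; they are left to be served through Shares and Entitlements rather than through the Reservation.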


Once you have established the Reservations and Shares (Entitlements are established indirectly based on total capacity, number of VMs, Limits and Shares) you need to track them against the VM’s resource utilization over time. Usage should stay below the Reservation 50-60% of the time and below the Entitlement 90% of the time or more. Where utilization is not matched well to these settings, you can improve service delivery by providing more resources or reclaim capacity by assigning fewer resources to the VM. Once the reservations have been set, there is an opportunity to set Shares that represent the VM’s business priority. Shares will determine resource Entitlement which the VMKernel will try to provide when needed (e.g., handling peak workload periods).
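A minimal sketch of this tracking step follows, assuming the same kind of utilization samples used to size the Reservation. The function names and the wording of the findings are hypothetical, and “below” is treated as “at or below” for simplicity. The thresholds (50-60% for the Reservation, 90% for the Entitlement) are the ones given above.

```python
def share_at_or_below(samples, threshold):
    """Fraction of samples that do not exceed the threshold."""
    return sum(1 for s in samples if s <= threshold) / len(samples)


def check_vm_settings(samples, reservation, entitlement):
    """Compare observed usage with the Reservation and Entitlement targets."""
    below_res = share_at_or_below(samples, reservation)
    below_ent = share_at_or_below(samples, entitlement)
    findings = []
    if below_res < 0.50:
        findings.append("reservation may be too low")
    elif below_res > 0.60:
        findings.append("reservation may be too high")
    if below_ent < 0.90:
        findings.append("entitlement may be too low")
    return below_res, below_ent, findings


# Hypothetical CPU usage samples in MHz for a single VM.
cpu_mhz = [200, 250, 300, 320, 350, 400, 450, 500, 900, 1500]
print(check_vm_settings(cpu_mhz, reservation=400, entitlement=1000))
# (0.6, 0.9, [])
```

An empty findings list means the settings match the observed usage; otherwise the VM is a candidate for more resources or for reclaiming capacity.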

Placing Workloads

The third practice to optimize your VMware capacity is placing workloads effectively. Capacity is determined to some extent by how well a workload behaves with other workloads in the shared resource environment. If a number of workloads peak at similar times within an ESX Host, the result may be degraded service or restricted capacity for other workloads needing access to the remaining resources. Multiple peaking workloads can be more troublesome if their behavior pattern is unpredictable, making them more challenging to manage.

Peaky workloads like user-generated transactions must be studied more carefully to see which match up best for resource sharing. On the other hand, sustained workloads such as batch can be safely stacked, resulting in high capacity utilization, because their peak resource requirements are well known. It would make sense, for example, to place transaction workloads that peak during working hours with batch processes that run throughout the night to provide a balanced use of the resources available.

This type of safe-stacking requires a keen understanding of workload behaviors, including average and peak usage, over a period of time. Most experts would request monitoring of the workload behaviors for a minimum of one month. Of course, placement accuracy will increase as additional workload data points are gathered over an extended period of time.

A good rule is to produce a stacked chart of all workloads targeted for a resource (e.g., CPU, memory) belonging to the ESX Host. Flatter curves or sets of bars over time indicate a better workload fit. A highly variable curve or set of bars means that the peaks may be coinciding. Coinciding peaks increase the risk of resource contention at peak times and of wasted resources at other times, and may require separating those workloads onto different hosts or clusters.
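One simple way to put a number on “flatter is better” is the peak-to-average ratio of the stacked profile, sketched below. This metric, the function names and the four-period profiles are our own illustrative assumptions; a real assessment would use hourly data gathered over a month or more, as described above.

```python
def stacked_profile(profiles):
    """Sum usage per time period across workloads of equal-length profiles."""
    lengths = {len(p) for p in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same time periods")
    return [sum(period) for period in zip(*profiles.values())]


def peak_to_average(stacked):
    """Ratio of the highest period to the mean; closer to 1.0 is flatter."""
    return max(stacked) / (sum(stacked) / len(stacked))


# Hypothetical CPU usage (%) over four periods: night, morning, afternoon, evening.
transactions = [10, 80, 80, 10]
batch = [70, 10, 10, 70]
more_transactions = [10, 70, 70, 10]

good_fit = stacked_profile({"transactions": transactions, "batch": batch})
poor_fit = stacked_profile({"transactions": transactions, "more": more_transactions})

print(good_fit, round(peak_to_average(good_fit), 2))  # [80, 90, 90, 80] 1.06
print(poor_fit, round(peak_to_average(poor_fit), 2))  # [20, 150, 150, 20] 1.76
```

Both pairings have the same average load, but the second peaks at 150% of a host’s CPU in the daytime periods and leaves it mostly idle at night.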


Figure 4. Systar’s OmniVision Workload Profile Reports utilize pie and stacked bar charts to represent the combined behavior of multiple VMs running on a host. The stacked bar chart in the top-right corner shows a period between 12pm and 2pm where multiple workloads are peaking at the same time and contention for resources may be affecting quality of service as response times slow. Additionally, the chart shows that up to 6 CPUs are under-utilized 22 hours of the day.

Another benefit of careful workload placement is to minimize DRS migrations. Although DRS can automatically place a new VM and load balance existing ones, it only considers overall resource utilization of the hosts and not workload profiles. Therefore, if a VM is migrated to a host where workloads peak shortly after the move, DRS will trigger another migration. For example, the figure below shows a stacked bar chart of VM Memory Resource use on an ESX host. Each color represents memory use of a different VM. As you can see, memory use declines on the Host from 2am to 1pm and then peaks. DRS could easily migrate a memory intensive VM to this Host at 11am, but would then have to migrate it once again at 2pm if memory contention reaches an unacceptable level. When workloads like this are not placed effectively, it can result in continually wandering VMs.

Figure 5. Systar’s OmniVision shows a stacked bar of VM memory usage for an ESX Host over a one-day period.

When considering placement of critical applications, we suggest placing the workload manually in order to ensure the best fit with minimal migrations. Although the dynamic nature of VMware workload migrations theoretically provides greater availability for critical VMs, best practices recommend minimal shifting of these workloads. You may even want to set the VM to manual migration for very critical workloads. DRS migrations via vMotion are very efficient but still require some overhead. DRS migrations should not be approached haphazardly (e.g., stacking VMs at random and letting DRS determine how to best balance the workloads). Peaky workloads that are not placed effectively may result in excessive migrations, causing what has come to be known as “vMotion sickness”. In general, setting anti-affinity rules within VMware is not recommended, but can be very helpful for workloads that are variable and may peak simultaneously. Matching workload patterns is one of many considerations when stacking VMs on a host. Affinity rules, geographical constraints, organizational alignment, compliance issues and other factors will play into best practice guidelines for where workloads are permitted to be placed.

Optimizing DRS

As a reminder from Part I of this white paper series, Distributed Resource Scheduler (DRS) provides a watchful eye over VMs in clustered environments. With an intention to provide each VM its required resources, DRS observes resource utilization on each host within a cluster. When unsatisfactory conditions are observed within one host, DRS assesses other hosts within the cluster where conditions may be more attractive. If DRS finds a suitable location, it then facilitates a VM move known as a VM migration.

Our fourth practice points to the need to optimize DRS. Not all application environments need or are suited to its workload balancing features. For some sets of applications (VMs) it makes sense not to use DRS. For example, horizontally scaled applications like web servers are already load balanced. In other instances, the cluster may be hosting hundreds of tier 2, non-critical business applications that demonstrate roughly the same resource consumption. This environment may be best suited to setting DRS to “Auto” and the default level to “Aggressive” for all VMs. Auto settings automate both the initial placement of a VM inside the cluster and the execution of migration recommendations. Aggressive migration thresholds will trigger movements that promise even a slight improvement in the cluster’s load balance.

Where sets of workloads take on the opposite profile of the example above, becoming less homogeneous and more critical in nature, you will want to consider less aggressive migration thresholds and either partially-automated or manual placement and migration settings.

The vCenter screen shot below shows a DRS enabled cluster with 4 ESX Hosts, of which only 2 are active. These systems are hosting 36 VMs and show 190 migrations. In this instance, workloads are clearly not balanced properly and excessive migrations are occurring.

Figure 6. VMware’s vCenter shows 4 hosts, of which 2 are active. In this case, 190 Migrations were observed. (Source: VMware)
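To put the figure in perspective, 190 migrations across 36 VMs is a little over five moves per VM. A rough way to spot the VMs responsible is to count migrations per VM from an exported event list, as sketched below. The tuple layout (VM, source host, destination host), the VM and host names, and the threshold are all hypothetical; this is not a vCenter API call.

```python
from collections import Counter


def wandering_vms(migration_events, max_moves):
    """Return {vm: move_count} for VMs that migrated more than max_moves times.

    Each event is a (vm_name, source_host, destination_host) tuple.
    """
    counts = Counter(vm for vm, _src, _dst in migration_events)
    return {vm: n for vm, n in counts.items() if n > max_moves}


# Average for the cluster in Figure 6.
print(round(190 / 36, 1))  # 5.3

# Hypothetical event list exported from the management tool of your choice.
events = [
    ("vm-web01", "esx01", "esx02"),
    ("vm-web01", "esx02", "esx01"),
    ("vm-web01", "esx01", "esx03"),
    ("vm-db01", "esx02", "esx03"),
]
print(wandering_vms(events, max_moves=2))  # {'vm-web01': 3}
```

VMs that show up repeatedly are the first candidates for manual placement or a review of their workload profile against their neighbors.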

As we discussed above in the Placing Workloads section, accomplishing safe-stacking requires a keen understanding of workload behaviors, including average and peak usage, over a period of time. The more variable the workload, the harder it is to determine proper placement for new VMs. If DRS migrations are occurring frequently, the workload profiles of new VMs will be difficult to match to existing VMs on a host. Based on our extensive experience in profiling workloads and recommending VM placement, the goal of the optimizing DRS practice is to minimize migrations and use the capability as a last resort.

Minimizing Impact of High Availability

Our final practice is centered on high availability (HA) within DRS-enabled clusters. HA is a very cost-effective capability built into the DRS-enabled cluster. However, there is a tradeoff and its cost is twofold:

  • HA’s strict Admission Control is very conservative and wastes a great deal of capacity
  • It is difficult to understand whether an application can be restarted in a given cluster state

The current method of calculating VMware’s HA failover capacity is complicated (and too lengthy to share here). However, in short, HA’s strict Admission Control uses the maximum Reservation size in the cluster as a slot size for all calculations. In fact, many users report seeing a message that there are “Insufficient resources to satisfy configured failover level for HA”, when attempting to configure their HA environment. Many sites we talk to limit resource utilization far below what they might need to restart the critical VMs on a host. And many of these same sites do not set their HA restart priority. We recommend setting and optimizing the restart priority around two points: minimizing capacity loss, and ensuring critical VMs restart immediately while low priority VMs restart when possible.


To minimize capacity loss from poor HA configurations, we recommend the following approaches:

  • Turn off strict HA Admission Control; VMware admits this is a very conservative approach (i.e., it wastes a lot of capacity).
  • Set the restart priority of all VMs very carefully, usually according to the policies defined in your DR plan.
  • For each host, sum the Reservations of its High Priority VMs and take the maximum of those sums across all hosts. Subtract that maximum, plus an additional 5% for overhead, from the aggregate capacity. From the remainder, subtract your required headroom, which is based on the peakiness of the workloads and your risk tolerance. Manage the cluster to the remaining “effective capacity”.
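The third approach can be sketched as a short calculation. The function and parameter names, the host names and the sample figures are hypothetical, and two interpretations are assumptions on our part: the 5% overhead and the required headroom are both taken as percentages of aggregate capacity.

```python
def ha_effective_capacity(aggregate, high_priority_reservations_by_host,
                          headroom_pct, overhead_pct=5.0):
    """Effective capacity after setting aside failover space, overhead and headroom.

    `aggregate` and the reservations share one unit (e.g. MHz or MB).
    Failover space is the largest per-host sum of High Priority Reservations.
    """
    failover = max(
        (sum(r) for r in high_priority_reservations_by_host.values()),
        default=0,
    )
    overhead = aggregate * overhead_pct / 100
    headroom = aggregate * headroom_pct / 100
    return aggregate - failover - overhead - headroom


# Hypothetical 80,000 MHz cluster with three hosts and 20% required headroom.
reservations = {
    "esx01": [2000, 1500],
    "esx02": [3000, 1000, 500],
    "esx03": [],
}
print(ha_effective_capacity(80000, reservations, headroom_pct=20.0))  # 55500.0
```

In this example esx02 carries the largest High Priority load (4,500 MHz), so that is the space kept free to restart its critical VMs elsewhere.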

VMware is planning to change its HA approach in ESX 4, and once we have sufficient experience with that release, this section of the paper will be revised.

Summary

As your VMware environment continues to expand and the pressure to reduce costs continues to increase, applying the five practices recommended in this paper will provide greater control over new virtualization spending and improve the quality of services delivered. Systar is confident that by following these practices, your organization will be able to safely maximize the utilization of your VMware capacity above the 50% mark. Applying virtualization-aware management solutions, processes, and best practices is key to achieving results that deliver standing ovations.

Optimizing capacity utilization is not only a great practice, but can result in substantial savings. According to a recent IDC report [8], “an optimally managed or ‘advanced virtualization’ infrastructure (described as an infrastructure that includes penetration of virtualized servers of more than 25%, storage virtualization, and the use of systems management tools) can deliver a total [cost] reduction of up to 52% per user per year”. Your standing ovation awaits.

[8] Business Value of Virtualization: Realizing the Benefit of Integrated Solutions, July 2008

Glossary

The concepts in this paper apply to most virtual environments; however, we use VMware VI 3.x to illustrate our points.

  • Admission Control – if on, will not allow a new VM to be powered on if the host does not have enough unreserved resources available for the VM’s specified Reservation.
  • Workload – the basic unit of work, a Virtual Machine (VM).
  • Capacity – the aggregate capacity of a host or cluster.
  • Cluster – a group of hosts managed as an aggregate computing resource; in this paper, a VMware DRS-enabled cluster.
  • Configured size – the number of vCPUs and the number of MB of memory that represent the size of the physical machine the VM is presented with.
  • Entitlement – the computed result of configurations, reservations, limits, and shares used to establish the resource allocation given to each VM for its operation. The Entitlement will always fall between the Reservation and the Limit, based upon its Share.
  • Effective capacity – the amount of capacity that can be used given workload mix, high availability requirements and white space.
  • Host – a server that supports multiple workloads and the ESX server.
  • Limit – this property serves as a hard cap on resource allocation for a VM. If Limit is not specified, then the configured size is the Limit.
  • Reservation – the amount of CPU MHz and MB of memory (in absolute units) that a VM is guaranteed should it need it.
  • Shared resources – CPU and memory resources that are actively managed by the hypervisor.
  • Shares – relative units that determine a VM’s priority among sibling VMs, used to determine resource allocation under contention.
  • White space – capacity on a host that cannot be utilized due to alignment of resource requirements, or resources that cannot be used since they are too small to support a whole VM.


About Systar

Systar is a leading worldwide provider of performance management software. Systar’s OmniVision product suite enables customers to achieve the optimal alignment between IT resources and business requirements in both distributed and virtualized server environments. Systar’s proven capacity management solutions deliver the full benefits of virtualization by enabling customers to gain visibility into these complex environments, tune for optimal capacity, and move business-critical applications into production with full confidence.

United States: 8618 Westwood Center Dr. Suite 240, Vienna, VA 22182. Tel. +1 703-556-8400, Fax +1 703-556-8430, [email protected]

France: 171 bureaux de la Colline, 92213 Saint-Cloud Cedex. Tel. +33 (0) 1 49 11 45 00, Fax +33 (0) 1 49 11 45 45, [email protected]

Germany: Mergenthallerallee 79-81, D-65760 Eschborn. Tel. +49 211 598 8520, [email protected]

Spain: Centro de Negocios Eisenhower, C/ Cañada Real de las Merinas, 17, Edificio 5 - 1º D, 28042 Madrid. Tel. +34 91 747 88 64, Fax +34 91 747 54 35, [email protected]

United Kingdom: Ground Floor Left, 3 Dyer’s Buildings, London EC1N 2JT. Tel. +44 2072 692 799, Fax +44 2072 429 400, [email protected]

Systar, BusinessBridge, OmniVision, BusinessVision, ServiceVision, WideVision and Systar’s logo are registered trademarks of Systar. All other brand names, product names and trademarks are the property of their respective owners. Copyright 2009.
