Monthly Archives: February 2015

VCAP DCD Study – Home Lab Design Part 10

Section 4 – Implementation Planning

Objective 4.1 – Create and Execute a Validation Plan

Knowledge
Recall standard functional test areas for design and operational verification.

Covered this in earlier sections but a recap!

Functional Requirements – The official definition for a functional requirement specifies what the system should do: “A requirement specifies a function that a system or component must be able to perform.” Functional requirements specify specific behavior or functions, for example: “Display the heart rate, blood pressure and temperature of a patient connected to the patient monitor.”

Typical functional requirements are:

  • Business Rules
  • Transaction corrections, adjustments, cancellations
  • Administrative functions
  • Authentication
  • Authorization – functions a user is delegated to perform
  • Audit Tracking
  • External Interfaces
  • Certification Requirements
  • Reporting Requirements
  • Historical Data
  • Legal or Regulatory Requirements

Non-Functional Requirements – The official definition for a non-functional requirement specifies how the system should behave: “A non-functional requirement is a statement of how a system must behave; it is a constraint upon the system’s behavior.”

Non-functional requirements specify all the remaining requirements not covered by the functional requirements. They specify criteria that judge the operation of a system, rather than specific behaviors, for example: “Display of the patient’s vital signs must respond to a change in the patient’s status within 2 seconds.”

Typical non-functional requirements are:

  • Performance – Response Time, Throughput, Utilization, Static Volumetric
  • Scalability
  • Capacity
  • Availability
  • Reliability
  • Recoverability
  • Maintainability
  • Serviceability
  • Security
  • Regulatory
  • Manageability
  • Environmental
  • Data Integrity
  • Usability
  • Interoperability

Non-functional requirements specify the system’s ‘quality characteristics’ or ‘quality attributes’. Potentially many different stakeholders have an interest in getting the non-functional requirements right. This is because for many large systems the people buying the system are completely different from those who are going to use it (customers and users).

 

Differentiate between operational testing and design verification.

Good operational testing examples can be found here:

https://communities.vmware.com/docs/DOC-11418

From Brownbag notes…

Operational Testing is testing pieces of the virtual infrastructure in general

Design Verification means implementing a business goal or requirement and verifying its accuracy with the business: that the design item(s) perform as expected and, if so, are accepted by the business (i.e. meeting a compliance requirement); this may or may not be outside of standard implementation criteria.

 

Skills and Abilities

From an existing template, choose the appropriate test areas.

Example of a test template here:

http://www.vmware.com/files/pdf/partners/09Q1_VM_Test_Plan.doc

Test vSphere features (e.g. vMotion, HA, DRS) under representative workloads to see how the applications perform.

Identify expected results

Document the results from the test plans and compare them to the current state analysis done at the start of the project.

Demonstrate an ability to track results in an organized fashion

Use health check scripts and rvtools and document and present the results.

Compare validation plan metrics to demonstrate traceability to business objectives

Compare the results to the business objectives and requirements for validation.
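
A minimal sketch of one way to track results in an organised, traceable fashion: each test case carries the requirement ID it validates, so the recorded results can be mapped back to the business objectives. The requirement IDs, test names and results below are made-up examples, not part of any official template.

```python
# Hypothetical validation results log with traceability back to requirements.
import csv

results = [
    # (requirement id, test area, expected result, actual result, passed)
    ("R001", "HA failover of pilot VM", "VM restarts on a surviving host", "Restarted within 2 minutes", True),
    ("R002", "vMotion under load", "No dropped sessions during migration", "1 ping lost, no session drops", True),
    ("R003", "Datastore latency under peak load", "Average latency below 20 ms", "Average latency 12 ms", True),
]

with open("validation_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Requirement", "Test area", "Expected", "Actual", "Passed"])
    writer.writerows(results)  # one row per executed test case
```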

 

Objective 4.2 – Create an Implementation Plan

Skills and Abilities

Based on key phases of enterprise vSphere 5.x implementations, map customer development needs to a standard implementation plan template.

VMware provide a plan and design kit to partners. Basically they are saying that although it is a useful tool, we shouldn't stick to it to the letter: take into account your own business requirements and make sure the design fits them.

Evaluate customer implementation requirements and provide a customized implementation plan.

Not really sure what to say here, but create an implementation plan that meets the customer's needs.

Incorporate customer objectives into a phased implementation schedule.

Phased implementation focus areas:

  • Early ROI workloads
  • Low risk/high visibility

[Image: roi1]

Match customer skills and abilities to implementation resource requirements.

The key roles for the team are listed below.

  • Relationship Manager – Act as primary interface between application owners and infrastructure groups.
  • IT Analyst – Identify impacted operational areas and recommend changes.
  • IT Infrastructure Architect – Translate requirements into architectural designs.
  • IT Infrastructure Engineer – Provide specific technical design for virtualized solutions.

The size of the team will vary depending on the scope and size of deployments, but it can be as small as three people or larger where multiple people are acting in each role. These positions should be viewed as relatively senior positions for highly regarded and skilled employees. Suitable candidates can often be found in the current organization (for example, in relationship management, IT infrastructure architecture, or server engineering groups). Once the team is in place, the team members play a central role in the deployment of projects in a virtualized environment.

Identify and correct implementation plan gaps.

Basically, provide the finer detail of the implementation plans, e.g. configuring vSwitch security settings.

Objective 4.3 – Create an Installation Guide

Knowledge

Identify standard resources required to construct an installation guide.

Use the official VMware documentation to construct the installation guides; also refer to the VMware community.
Skills and Abilities
Consider multiple product installation dependencies to create a validated configuration.

Ensure the installation guide follows a logical flow so that components are installed in the correct order.
Recognize opportunities to utilize automated procedures to optimize installation.

Auto Deploy springs to mind, but there's nothing stopping you from using good old Linux-based kickstart to do a scripted installation.
Create installation documentation specific to the design.

Create a step-by-step installation doc and use screenshots to assist the engineer installing the components.

 

VCAP DCD Study – Home Lab Design Part 9

Objective 3.6 – Determine Data Center Management Options for a vSphere 5.x Physical Design

Knowledge

1. Differentiate and describe client access options.

  • vSphere Client
  • vSphere Web Client
  • vCLI
  • PowerCLI
  • DCUI
  • vMA

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service:
o Determine the most appropriate datacenter management options for the design.

Management tools will depend on the skills of the operational staff running the infrastructure and will usually be decided on this basis.

o Implement the service based on the required infrastructure qualities.

Not much to say about this, but management tools should be implemented following AMPRS!

3. Analyze cluster availability requirements for HA and FT.

No-brainer really. HA should always be enabled, although I have come across a situation where we couldn't enable it due to Cisco contact centre software not supporting HA and vMotion, but I would say this is a real exception.

FT will have specific use cases depending on requirements; the current vCPU limit restricts its usefulness, but as mentioned earlier this will soon be a thing of the past. FT VMs cannot use snapshots, DRS or Storage vMotion.

Analyze cluster performance requirements for DRS and vMotion.

Be aware of VM hardware versions: virtual machines running on hardware version 8 can't run on prior versions of ESX/ESXi, and such virtual machines can be moved using VMware vMotion only to other ESXi 5.0 hosts. Take CPU compatibility into account; try to keep the hardware exactly the same, and if that's not possible then enable EVC on the cluster.

Analyze cluster storage performance requirements for SDRS and Storage vMotion.

Storage vMotion can perform up to four simultaneous disk copies per Storage vMotion operation. Storage vMotion will involve each datastore in no more than one disk copy at any one time, however. This means, for example, that moving four VMDK files from datastore A to datastore B will happen serially, but moving four VMDK files from datastores A, B, C, and D to datastores E, F, G, and H will happen in parallel.

For performance-critical Storage vMotion operations involving virtual machines with multiple VMDK files, you can use anti-affinity rules to spread the VMDK files across multiple datastores, thus ensuring simultaneous disk copies.
During a Storage vMotion operation, the benefits of moving to a faster datastore will be seen only when the migration has completed. However, the impact of moving to a slower datastore will gradually be felt as the migration progresses.

Storage vMotion will often have significantly better performance on VAAI-capable storage arrays.

VMware Storage vMotion performance depends strongly on the available storage infrastructure bandwidth between the ESXi host where the virtual machine is running and both the source and destination datastores.

During a Storage vMotion operation the virtual disk to be moved is being read from the source data store and written to the destination data store. At the same time the virtual machine continues to read from and write to the source data store while also writing to the destination data store. This additional traffic takes place on storage that might also have other I/O loads (from other virtual machines on the same ESXi host or from other hosts) that can further reduce the available bandwidth.
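
The serial-versus-parallel behaviour described above (up to four simultaneous disk copies per operation, with each datastore involved in no more than one copy at a time) can be illustrated with a small scheduling sketch. The datastore names are just examples.

```python
# Rough model of the Storage vMotion copy-scheduling rule quoted above.
def copy_waves(moves, max_parallel=4):
    """Group (source, destination) disk copies into waves so that a datastore
    is never involved in more than one copy at a time."""
    pending = list(moves)
    waves = []
    while pending:
        busy, wave, remaining = set(), [], []
        for src, dst in pending:
            if len(wave) < max_parallel and src not in busy and dst not in busy:
                wave.append((src, dst))
                busy.update([src, dst])
            else:
                remaining.append((src, dst))
        waves.append(wave)
        pending = remaining
    return waves

# Four VMDKs from datastore A to B: four waves, i.e. the copies run serially.
print(copy_waves([("A", "B")] * 4))
# Four VMDKs spread across A..D going to E..H: a single parallel wave.
print(copy_waves([("A", "E"), ("B", "F"), ("C", "G"), ("D", "H")]))
```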

 

Determine the appropriate vCenter Server design and sizing requirements:
o vCenter Server Linked Mode

Using vCenter Server in Linked Mode – You can join multiple vCenter Server systems using vCenter Linked Mode to allow them to share information. When a server is connected to other vCenter Server systems using Linked Mode, you can connect to that vCenter Server system and view and manage the inventories of the linked vCenter Server systems. Linked Mode uses Microsoft Active Directory Application Mode (ADAM) to store and synchronize data across multiple vCenter Server systems. ADAM is installed as part of vCenter Server installation. Each ADAM instance stores data from the vCenter Server systems in the group, including information about roles and licenses. This information is replicated across all of the ADAM instances in the connected group to keep them in sync.

When vCenter Server systems are connected in Linked Mode, you can perform the following actions:

  • Log in simultaneously to vCenter Server systems for which you have valid credentials.
  • Search the inventories of the vCenter Server systems in the group.
  • View the inventories of the vCenter Server systems in the group in a single inventory view. So if you have multiple vCenter instances to manage different sites, for site recovery or just different locations, then vCenter Linked Mode will help with managing all the different sites from one place.

o vCenter Server Virtual Appliance

  • vCenter Linked Mode is not supported
  • vCenter Heartbeat is not supported
  • Some VMware/Third Party Plugins might not support vCSA. Check with your desired plugin vendors if they support the vCenter Appliance.
  • Installing Update Manager on the vCenter Appliance is not supported, but you can still set it up on a separate Windows VM.
  • If using the embedded database you will be limited to 100 hosts and 3000 VMs, but you always can utilize an Oracle Database to be able to scale to the vCenter Maximums of 1000 hosts and 10,000 VMs.
  • MS SQL Database is currently not supported by the vCenter Server Appliance; you can either use the built-in vPostgres (supports up to 100 hosts and 3,000 VMs) or use an Oracle database to scale to 1,000 hosts and 10,000 VMs. If you are planning to go beyond 100 hosts and 3,000 VMs and an Oracle database is not an option or your cup of tea, then you will have to stick with the Windows version of vCenter for now.
  • It does not support the Security Support Provider Interface (SSPI), which is part of SSO and is a Microsoft Windows API used to perform authentication against NTLM or Kerberos.
  • VMware View Composer cannot be installed on the vCenter Appliance, but it is no longer required to be on the same machine as vCenter; it can be installed on a different machine, in which case it will support the vCSA.

 
o vCenter Server Heartbeat

vCenter Server Heartbeat is a Windows based service specifically designed to provide high availability protection for vCenter Server configurations without requiring any specialized hardware.

vCenter Server Heartbeat provides the following protection levels:

  • Server Protection – vCenter Server Heartbeat provides continuous availability to end users through a hardware failure scenario or operating system crash. Additionally, vCenter Server Heartbeat protects the network identity of the production server, ensuring users are provided with a replica server, including server name and IP address shares, on the failure of the production server.
  • Network Protection – vCenter Server Heartbeat proactively monitors the network by polling up to three nodes to ensure that the active server is visible on the network.
  • Application Protection – vCenter Server Heartbeat maintains the application environment, ensuring that applications and services stay alive on the network.
  • Performance Protection – vCenter Server Heartbeat proactively monitors system performance attributes to ensure that the system administrator is notified of problems and can take pre-emptive action to prevent an outage.
  • Data Protection – vCenter Server Heartbeat intercepts all data written by users and applications, and maintains a copy of this data on the passive server that can be used in the event of a failure.

vCenter Server Heartbeat provides all five protection levels continuously, ensuring all facets of the user environment are maintained at all times, and that the network (Principal (Public) network) continues to operate through as many failure scenarios as possible.

vCenter Server Heartbeat software is installed on a Primary server and a Secondary server. These names refer to the physical hardware (identity) of the servers. The Secondary server has the same domain name, same file and data structure, same network address, and can run all the same applications and services as the Primary server. vCenter Server Heartbeat uses two servers with identical names and IP addresses.

One is an active server that is visible on the Principal (Public) network and the other is a passive server that is hidden from the network but remains as a ready standby server.

Only one server name and IP address can be visible on the Principal (Public) network at any given time.

Determine appropriate access control settings, create roles and assign users to roles.

Covered on objective 2.7

Based on the logical design, identify and implement asset and configuration management technologies.

I would say that VMware are filling this space with vCAC, now referred to as vRealize Automation; it's a huge subject, way beyond the scope of my study notes. Other products are VMware Go, VMware Service Manager and VMware Configuration Manager.

Determine appropriate host and virtual machine deployment options.

Auto Deploy is more suited to larger environments that require a more agile method of host deployment. Full install methods include boot from SAN, boot from iSCSI and scripted installs using PowerCLI or a Linux kickstart-style script; use Image Builder to customise ESXi images.

Virtual machines can be created from templates, via P2V or V2V conversion, or you can PXE boot the VM.

Based on the logical design, identify and implement release management technologies, such as Update Manager.

Taken from the Update Manager performance and best practice document

VMware vCenter™ Update Manager (also known as VUM) provides a patch management framework for VMware vSphere®. IT administrators can use it to patch and upgrade:

  • VMware ESX and VMware ESXi™ hosts
  • VMware Tools and virtual hardware for virtual machines
  • Virtual appliances.

… …

Update Manager Server Host Deployment – There are three Update Manager server host deployment models (a quick sizing sketch follows the performance tips below):

  • Model 1 – vCenter Server and the Update Manager server share both a host and a database instance.
  • Model 2 –  Recommended for data centers with more than 300 virtual machines or 30 ESX/ESXi hosts. In this model, the vCenter server and the Update Manager server still share a host, but use separate database instances.
  • Model 3 – Recommended for data centers with more than 1,000 virtual machines or 100 ESX/ESXi hosts. In this model, the vCenter server and the Update Manager server run on different hosts, each with its own database instance.

… …

Performance Tips

  • Separate the Update Manager database from the vCenter database when there are 300+ virtual machines or 30+ hosts.
  • Separate both the Update Manager server and the Update Manager database from the vCenter Server system and the vCenter Server database when there are 1000+ virtual machines or 100+ hosts.
  • Make sure the Update Manager server host has at least 2GB of RAM to cache frequently used patch files in memory.
  • Allocate separate physical disks for the Update Manager patch store and the Update Manager database.
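
The deployment-model thresholds quoted above reduce to a simple rule of thumb. Here is a quick sketch of that logic; the cut-off numbers (300/1,000 VMs, 30/100 hosts) come from the quoted document, while the function itself is just an illustration.

```python
# Pick an Update Manager deployment model from inventory size (illustrative only).
def vum_deployment_model(num_vms: int, num_hosts: int) -> str:
    if num_vms > 1000 or num_hosts > 100:
        return "Model 3: separate hosts, separate database instances"
    if num_vms > 300 or num_hosts > 30:
        return "Model 2: shared host, separate database instances"
    return "Model 1: shared host and shared database instance"

print(vum_deployment_model(num_vms=450, num_hosts=40))    # Model 2
print(vum_deployment_model(num_vms=1500, num_hosts=120))  # Model 3
```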

Based on the logical design identify and implement event, incident and problem management technologies.

 Borrowed from BrownBag notes.

Traditionally, approaches to each have been reactive; being proactive allows for efficiency, agility and reliability.

Need automation tools and intelligent analytics.

Tools – VMware Service Manager; vCenter Orchestrator

http://www.vmware.com/files/pdf/services/VMware-Proactive-Incident-Whitepaper.pdf

Based on the logical design, identify and implement logging, monitoring and reporting technologies.

The most widely used ‘system’ is alarms within vCenter. Be aware that if vCenter fails then you have no alerting, so also use SNMP.

Events – record of user or system actions in vCenter

Alarms – notifications activated in response to events

Monitoring – can be done using SNMP traps; the SNMP agent is embedded in ‘hostd’

Logging – best to set up a logging server; a product called “Syslog Collector” can be used

Install it from the vCenter Server media and point the hosts to the log server

VCAP DCD Study – Home Lab Design Part 8

Objective 3.5 – Determine Virtual Machine Configuration for a vSphere 5.x Physical Design
Knowledge

 

1. Describe the applicability of using an RDM or a virtual disk for a given VM.

RDMs

Only use RDMs when necessary, e.g. Microsoft clustering, SAN agents that require direct access, and for migrations; there is very little performance difference between an RDM and VMFS.

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service: Determine the most appropriate virtual machine configuration for the design.

o Implement the service based on the required infrastructure qualities.

  • Always start with only 1 vCPU
  • Enable TPS
  • Always install VMware Tools
  • Only allocate RAM needed
  • Align virtual disks
  • Remove Floppy and any unneeded I/O devices or VM Hardware
  • Paravirtual SCSI for Data disks (not OS); typically use for > 2000 IOPS
  • VMXNET3 Ethernet Adapters
  • If redirecting VM swap files, do so on Shared Storage for better vMotion performance

3. Based on an existing logical design, determine appropriate virtual disk type and placement.

 

  • Thick Provision Lazy Zeroed Creates a virtual disk in a default thick format. Space required for the virtual disk is allocated when the virtual disk is created. Data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time on first write from the virtual machine. Using the default flat virtual disk format does not zero out or eliminate the possibility of recovering deleted files or restoring old data that might be present on this allocated space. You cannot convert a flat disk to a thin disk.
  • Thick Provision Eager Zeroed A type of thick virtual disk that supports clustering features such as Fault Tolerance. Space required for the virtual disk is allocated at creation time. In contrast to the flat format, the data remaining on the physical device is zeroed out when the virtual disk is created. It might take much longer to create disks in this format than to create other types of disks.
  • Thin Provision Use this format to save storage space. For the thin disk, you provision as much datastore space as the disk would require based on the value that you enter for the disk size. However, the thin disk starts small and at first, uses only as much datastore space as the disk needs for its initial operations. NOTE If a virtual disk supports clustering solutions such as Fault Tolerance, do not make the disk thin. If the thin disk needs more space later, it can grow to its maximum capacity and occupy the entire datastore space provisioned to it. Also, you can manually convert the thin disk into a thick disk.

 

4. Size VMs appropriately according to application requirements, incorporating VMware best practices.

BrownBag notes again!

  • Start with 1 vCPU and only allocate the RAM required by ISVs (Independent Software Vendors) for a given application
  • For storage, you can get this from the current state analysis, then add enough for growth (patches/updates), vswp, logging and other ‘overhead’: (avg size of VMs * # VMs on datastore) + 20%, then round up the final number (see the sizing sketch below)
  • Size VM resources in accordance with NUMA boundaries. So, if you have 4 cores, assign vCPUs in multiples of 4; 6 cores = multiples of 6, etc.
  • If you overallocate RAM, more RAM overhead is used per VM, thus wasting RAM; for larger environments that is more applicable
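
As a quick illustration of the datastore sizing rule of thumb above, a minimal sketch: average VM size times the number of VMs, plus 20% headroom for vswp, snapshots, logs and other overhead, rounded up. The VM count and average size used here are example figures only.

```python
# Datastore sizing rule of thumb from the notes above (example inputs).
import math

def datastore_size_gb(avg_vm_size_gb: float, vms_per_datastore: int,
                      headroom: float = 0.20) -> int:
    raw = avg_vm_size_gb * vms_per_datastore
    return math.ceil(raw * (1 + headroom))   # add 20% and round up

print(datastore_size_gb(avg_vm_size_gb=60, vms_per_datastore=15))  # 1080 GB
```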

5. Determine appropriate reservations, shares, and limits.

Shares, Reservations, and Limits:

  • Deploy VMs with default settings unless there is a clear reason to do otherwise
  • Use sparingly, if at all!
  • Are there apps that need resources even during contention? Then use reservations
  • This adds complexity and administration overhead.

6. Based on an existing logical design, determine virtual hardware options.

From the performance best practice doc.

Allocate to each virtual machine only as much virtual hardware as that virtual machine requires.

Provisioning a virtual machine with more resources than it requires can, in some cases, reduce the performance of that virtual machine as well as other virtual machines sharing the same host.

Disconnect or disable any physical hardware devices that you will not be using. These might include devices such as:

  • COM ports
  • LPT ports
  • USB controllers
  • Floppy drives
  • Optical drives (that is, CD or DVD drives)
  • Network interfaces
  • Storage controllers

Disabling hardware devices (typically done in the BIOS) can free interrupt resources. Additionally, some devices, such as USB controllers, operate on a polling scheme that consumes extra CPU resources. Lastly, some PCI devices reserve blocks of memory, making that memory unavailable to ESXi.

Unused or unnecessary virtual hardware devices can impact performance and should be disabled. For example, Windows guest operating systems poll optical drives (that is, CD or DVD drives) quite frequently. When virtual machines are configured to use a physical drive, and multiple guest operating systems simultaneously try to access that drive, performance could suffer. This can be reduced by configuring the virtual machines to use ISO images instead of physical drives, and can be avoided entirely by disabling optical drives in virtual machines when the devices are not needed.

ESXi 5.5 introduces virtual hardware version 10. By creating virtual machines using this hardware version, or upgrading existing virtual machines to this version, a number of additional capabilities become available. This hardware version is not compatible with versions of ESXi prior to 5.5, however, and thus if a cluster of ESXi hosts will contain some hosts running pre-5.5 versions of ESXi, the virtual machines running on hardware version 10 will be constrained to run only on the ESXi 5.5 hosts. This could limit vMotion choices for Distributed Resource Scheduling (DRS) or Distributed Power Management (DPM)

7. Design a vApp catalog of appropriate VM offerings (e.g., templates, OVFs, vCO).

Useful for packaging applications that have dependencies, can be converted to OVF and exported.

8. Describe implications of and apply appropriate use cases for vApps.

Simplified deployment of an application for developers, can be re-packaged and converted to OVF at each stage of the SDLC.

9. Decide on the suitability of using FT or 3rd party clustering products based on application requirements.

Currently limited to 1 vCPU, but… vSphere 6.0 was announced this week, so support for up to 4 vCPUs is here!!! Awesome! We'll be seeing a lot more use cases…

From Performance best practice doc.

FT virtual machines that receive large amounts of network traffic or perform lots of disk reads can create significant bandwidth on the NIC specified for the logging traffic. This is true of machines that routinely do these things, as well as machines doing them only intermittently, such as during a backup operation. To avoid saturating the network link used for logging traffic, limit the number of FT virtual machines on each host or limit the disk read bandwidth and network receive bandwidth of those virtual machines.

Make sure the FT logging traffic is carried by at least a Gigabit-rated NIC (which should in turn be connected to at least Gigabit-rated network infrastructure).

NOTE: Turning on FT for a powered-on virtual machine will also automatically “Enable FT” for that virtual machine.

Avoid placing more than four FT-enabled virtual machines on a single host. In addition to reducing the possibility of saturating the network link used for logging traffic, this also limits the number of simultaneous live-migrations needed to create new secondary virtual machines in the event of a host failure.

If the secondary virtual machine lags too far behind the primary (which usually happens when the primary virtual machine is CPU bound and the secondary virtual machine is not getting enough CPU cycles), the hypervisor might slow the primary to allow the secondary to catch up. The following recommendations help avoid this situation:

  • Make sure the hosts on which the primary and secondary virtual machines run are relatively closely matched, with similar CPU make, model, and frequency.
  • Make sure that power management scheme settings (both in the BIOS and in ESXi) that cause CPU frequency scaling are consistent between the hosts on which the primary and secondary virtual machines run.
  • Enable CPU reservations for the primary virtual machine (which will be duplicated for the secondary virtual machine) to ensure that the secondary gets CPU cycles when it requires them.

 

10. Determine and implement an anti-virus solution

Basically referring to vShield Endpoint; there are many AV products and choosing one will come down to the requirements.

 

VCAP DCD Study – Home Lab Design Part 7

Objective 3.4 – Determine Appropriate Compute Resources for a vSphere 5.x Physical Design

Knowledge

1. Describe best practices with respect to CPU family choices.

Best practice is to stick to identical hardware across clusters, if this is not possible then EVC mode can be enabled but remember vCenter will create a baseline which may limit some features if the CPU being added to the cluster is newer than the existing CPUs, see VMware KB 1003212 for more info.

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service:

  • Determine the most appropriate compute technologies for the design.
  • Implement the service based on the required infrastructure qualities.

 

  • AMD-Vi (IOMMU) or Intel VT-d CPUs for direct I/O compatibility
  • Be careful when using CPU affinity on systems with hyper-threading. Because the two logical processors share most of the processor resources, pinning vCPUs, whether from different virtual machines or from a single SMP virtual machine, to both logical processors on one core (CPUs 0 and 1, for example) could cause poor performance.

General BIOS Settings

  • Make sure you are running the latest version of the BIOS available for your system.
  • Make sure the BIOS is set to enable all populated processor sockets and to enable all cores in each socket.
  • Enable “Turbo Boost” in the BIOS if your processors support it.
  • Make sure hyper-threading is enabled in the BIOS for processors that support it.
  • Some NUMA-capable systems provide an option in the BIOS to disable NUMA by enabling node interleaving. In most cases you will get the best performance by disabling node interleaving (in other words, leaving NUMA enabled).
  • Make sure any hardware-assisted virtualization features (VT-x, AMD-V, EPT, RVI, and so on) are enabled in the BIOS.
  • Disable from within the BIOS any devices you won’t be using. This might include, for example, unneeded serial, USB, or network ports.
  • Cache prefetching mechanisms (sometimes called DPL Prefetch, Hardware Prefetcher, L2 Streaming Prefetch, or Adjacent Cache Line Prefetch) usually help performance, especially when memory access patterns are regular. When running applications that access memory randomly, however, disabling these mechanisms might result in improved performance.
  • If the BIOS allows the memory scrubbing rate to be configured, we recommend leaving it at the manufacturer’s default setting

3. Explain the impact of a technical design on the choice of server density:

  • Scale Up
  • Scale Out
  • Auto Deploy

Scale Up

A few large servers; a bigger impact if there is a failure; less management, cooling, power and heat overhead.

Scale Out

Smaller servers; fewer VMs are impacted when there is a failure; scaling is more agile.

Auto Deploy

Suitable for large enterprises; requires extra infrastructure and can be more complex to manage. There is a dependency on vCenter, so it needs extra consideration when designing vCenter availability.

Blade vs Server

Blades

  • less space; fewer I/O slots; fewer RAM slots; non-scalable cost; more heating/cooling cost; vendor lock-in; simpler cabling; shared chassis (SPOF?); expertise
  • Rack = more I/O & RAM slots; take up more space.

 

4. Determine a consolidation ratio based upon capacity analysis data.

Again, some of the notes are borrowed from the BrownBag DCD PDF; they do a nice job of making the notes concise.

  • Cores per CPU: the number of cores per host must match or exceed the number of vCPUs of the largest VM
  • Depending on load, you can typically run 4 to 6 VMs per core on a quad-core socket processor
  • During current state analysis, determine the total CPU and RAM required, then divide that out into the number of hosts required to meet the CPU/RAM requirement (worked through in the sketch below).
    This will also be based on budget, as well as what the compute ‘density’ requirement is (i.e. a scale-up or scale-out approach); if redundancy is one of the main requirements, then a scale-out approach would be better.
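
A back-of-the-envelope version of that host-count calculation, assuming 80% CPU and 90% memory utilisation ceilings (echoing the utilisation targets quoted later in these notes) and one spare host for HA; the workload and host figures are example inputs, not recommendations.

```python
# Rough host count from current-state CPU/RAM demand (example inputs and assumed ceilings).
import math

def hosts_required(total_cpu_mhz, total_ram_gb,
                   host_cpu_mhz, host_ram_gb,
                   cpu_target=0.80, ram_target=0.90, ha_spare_hosts=1):
    cpu_hosts = math.ceil(total_cpu_mhz / (host_cpu_mhz * cpu_target))
    ram_hosts = math.ceil(total_ram_gb / (host_ram_gb * ram_target))
    return max(cpu_hosts, ram_hosts) + ha_spare_hosts

# e.g. 200 VMs averaging 500 MHz and 4 GB each, on dual 8-core 2.6 GHz hosts with 256 GB RAM:
print(hosts_required(total_cpu_mhz=200 * 500, total_ram_gb=200 * 4,
                     host_cpu_mhz=2 * 8 * 2600, host_ram_gb=256))   # 5 hosts
```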

 

5. Calculate the number of nodes in an HA cluster based upon host failure count and resource guarantees.

Taken from the vSphere Availability PDF.

The following recommendations are best practices for vSphere HA admission control.

  • Select the Percentage of Cluster Resources Reserved admission control policy. This policy offers the most flexibility in terms of host and virtual machine sizing. When configuring this policy, choose a percentage for CPU and memory that reflects the number of host failures you want to support. For example, if you want vSphere HA to set aside resources for two host failures and have ten hosts of equal capacity in the cluster, then specify 20% (2/10).
  • Ensure that you size all cluster hosts equally. For the Host Failures Cluster Tolerates policy, an
    unbalanced cluster results in excess capacity being reserved to handle failures because vSphere HA reserves capacity for the largest hosts. For the Percentage of Cluster Resources Policy, an unbalanced cluster requires that you specify larger percentages than would otherwise be necessary to reserve enough capacity for the anticipated number of host failures.
  • If you plan to use the Host Failures Cluster Tolerates policy, try to keep virtual machine sizing requirements similar across all configured virtual machines. This policy uses slot sizes to calculate the amount of capacity needed to reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. When you mix virtual machines of different CPU and memory requirements, the slot size calculation defaults to the largest possible, which limits consolidation.
  • If you plan to use the Specify Failover Hosts policy, decide how many host failures to support and then specify this number of hosts as failover hosts. If the cluster is unbalanced, the designated failover hosts should be at least the same size as the non-failover hosts in your cluster. This ensures that there is adequate capacity in case of failure.

Example: Admission Control Using the Percentage of Cluster Resources Reserved Policy – The way that Current Failover Capacity is calculated and used with this admission control policy is shown with an example. Make the following assumptions about a cluster:

The cluster is comprised of three hosts, each with a different amount of available CPU and memory resources. The first host (H1) has 9GHz of available CPU resources and 9GB of available memory, while Host 2 (H2) has 9GHz and 6GB and Host 3 (H3) has 6GHz and 6GB.
There are five powered-on virtual machines in the cluster with differing CPU and memory requirements. VM1 needs 2GHz of CPU resources and 1GB of memory, while VM2 needs 2GHz and 1GB, VM3 needs 1GHz and 2GB, VM4 needs 1GHz and 1GB, and VM5 needs 1GHz and 1GB.
The Configured Failover Capacity is set to 25%.

[Image: admission control example]

The total resource requirements for the powered-on virtual machines are 7GHz and 6GB. The total host resources available for virtual machines are 24GHz and 21GB. Based on this, the Current CPU Failover Capacity is 70% ((24GHz – 7GHz)/24GHz). Similarly, the Current Memory Failover Capacity is 71% ((21GB – 6GB)/21GB). Because the cluster’s Configured Failover Capacity is set to 25%, 45% of the cluster’s total CPU resources and 46% of the cluster’s memory resources are still available to power on additional virtual machines.
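
The same example reproduced as a quick calculation; the host and VM figures are the ones from the vSphere Availability guide example, and the percentages are rounded down to whole numbers as in the quoted text.

```python
# Percentage of Cluster Resources Reserved example, as arithmetic.
import math

host_cpu_ghz = [9, 9, 6]          # H1, H2, H3
host_mem_gb  = [9, 6, 6]
vm_cpu_ghz   = [2, 2, 1, 1, 1]    # VM1..VM5 requirements
vm_mem_gb    = [1, 1, 2, 1, 1]
configured_failover_pct = 25

def failover_capacity_pct(used, total):
    return math.floor((total - used) / total * 100)

cpu_failover = failover_capacity_pct(sum(vm_cpu_ghz), sum(host_cpu_ghz))  # 70%
mem_failover = failover_capacity_pct(sum(vm_mem_gb), sum(host_mem_gb))    # 71%

print(f"Current CPU failover capacity:    {cpu_failover}%")
print(f"Current memory failover capacity: {mem_failover}%")
print(f"CPU still available for new VMs:    {cpu_failover - configured_failover_pct}%")  # 45%
print(f"Memory still available for new VMs: {mem_failover - configured_failover_pct}%")  # 46%
```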

 

6. Explain the implications of using reservations, limits, and shares on the physical design.

Shares

Resource Allocation Shares – Shares specify the relative importance of a virtual machine (or resource pool). If a virtual machine has twice as many shares of a resource as another virtual machine, it is entitled to consume twice as much of that resource when these two virtual machines are competing for resources.

Shares are typically specified as High, Normal, or Low and these values specify share values with a 4:2:1 ratio, respectively. You can also select Custom to assign a specific number of shares (which expresses a proportional weight) to each virtual machine.

Specifying shares makes sense only with regard to sibling virtual machines or resource pools, that is, virtual machines or resource pools with the same parent in the resource pool hierarchy. Siblings share resources according to their relative share values, bounded by the reservation and limit. When you assign shares to a virtual machine, you always specify the priority for that virtual machine relative to other powered-on virtual machines.

The following table shows the default CPU and memory share values for a virtual machine. For resource pools, the default CPU and memory share values are the same, but must be multiplied as if the resource pool were a virtual machine with four virtual CPUs and 16 GB of memory.

[Image: default CPU and memory share values table]

For example, an SMP virtual machine with two virtual CPUs and 1GB RAM with CPU and memory shares set to Normal has 2×1000=2000 shares of CPU and 10×1024=10240 shares of memory.

NOTE Virtual machines with more than one virtual CPU are called SMP (symmetric multiprocessing) virtual machines. ESXi supports up to 32 virtual CPUs per virtual machine.

The relative priority represented by each share changes when a new virtual machine is powered on. This affects all virtual machines in the same resource pool. All of the virtual machines have the same number of virtual CPUs. Consider the following examples (also worked through in the sketch after the list).

  • Two CPU-bound virtual machines run on a host with 8GHz of aggregate CPU capacity. Their CPU shares are set to Normal and get 4GHz each.
  • A third CPU-bound virtual machine is powered on. Its CPU shares value is set to High, which means it should have twice as many shares as the machines set to Normal. The new virtual machine receives 4GHz and the two other machines get only 2GHz each. The same result occurs if the user specifies a custom share value of 2000 for the third virtual machine.
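
The same share arithmetic as a sketch: under contention, each VM's slice of the host is its share value divided by the total shares of the powered-on VMs (Normal = 1000 and High = 2000 shares for a single-vCPU VM, as in the example above; the 8GHz host is also from that example).

```python
# CPU entitlement under contention, proportional to share values.
def cpu_entitlement_ghz(shares, host_ghz=8):
    total = sum(shares.values())
    return {vm: round(host_ghz * s / total, 1) for vm, s in shares.items()}

# Two Normal VMs: 4 GHz each.
print(cpu_entitlement_ghz({"VM1": 1000, "VM2": 1000}))
# Add a third VM set to High: it gets 4 GHz and the original two drop to 2 GHz each.
print(cpu_entitlement_ghz({"VM1": 1000, "VM2": 1000, "VM3": 2000}))
```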

Limits

Limit specifies an upper bound for CPU, memory, or storage I/O resources that can be allocated to a virtual machine.

A server can allocate more than the reservation to a virtual machine, but never allocates more than the limit, even if there are unused resources on the system. The limit is expressed in concrete units (megahertz, megabytes, or I/O operations per second).

CPU, memory, and storage I/O resource limits default to unlimited. When the memory limit is unlimited, the amount of memory configured for the virtual machine when it was created becomes its effective limit.

In most cases, it is not necessary to specify a limit. There are benefits and drawbacks:

  • Benefits – Assigning a limit is useful if you start with a small number of virtual machines and want to manage user expectations. Performance deteriorates as you add more virtual machines. You can simulate having fewer resources available by specifying a limit.
  • Drawbacks – You might waste idle resources if you specify a limit. The system does not allow virtual machines to use more resources than the limit, even when the system is underutilized and idle resources are available. Specify the limit only if you have good reasons for doing so.

Reservations

A reservation specifies the guaranteed minimum allocation for a virtual machine.

vCenter Server or ESXi allows you to power on a virtual machine only if there are enough unreserved resources to satisfy the reservation of the virtual machine. The server guarantees that amount even when the physical server is heavily loaded. The reservation is expressed in concrete units (megahertz or megabytes).

For example, assume you have 2GHz available and specify a reservation of 1GHz for VM1 and 1GHz for VM2. Now each virtual machine is guaranteed to get 1GHz if it needs it. However, if VM1 is using only 500MHz, VM2 can use 1.5GHz. Reservation defaults to 0. You can specify a reservation if you need to guarantee that the minimum required amounts of CPU or memory are always available for the virtual machine.
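
The reservation example above in miniature: a reservation is a guaranteed floor, not a cap, so whatever the reserved VM leaves idle can be consumed by its sibling. The figures are the ones from the example.

```python
# Reservations: guaranteed minimum vs. what is actually usable right now.
host_capacity_ghz = 2.0
reservations_ghz  = {"VM1": 1.0, "VM2": 1.0}
vm1_demand_ghz    = 0.5                              # VM1 only needs 500 MHz at the moment

vm2_guaranteed = reservations_ghz["VM2"]             # 1.0 GHz, whatever happens
vm2_usable_now = host_capacity_ghz - vm1_demand_ghz  # 1.5 GHz while VM1 is quiet

print(f"VM2 is guaranteed {vm2_guaranteed} GHz but can currently use {vm2_usable_now} GHz")
```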

7. Specify the resource pool and vApp configuration based upon resource requirements.

Resource Pools

Taken from vSphere Resource Management 5.5 PDF

A resource pool is a logical abstraction for flexible management of resources. Resource pools can be grouped into hierarchies and used to hierarchically partition available CPU and memory resources.
Each standalone host and each DRS cluster has an (invisible) root resource pool that groups the resources of that host or cluster. The root resource pool does not appear because the resources of the host (or cluster) and the root resource pool are always the same.
Users can create child resource pools of the root resource pool or of any user-created child resource pool. Each child resource pool owns some of the parent’s resources and can, in turn, have a hierarchy of child resource pools to represent successively smaller units of computational capability.
A resource pool can contain child resource pools, virtual machines, or both. You can create a hierarchy of shared resources. The resource pools at a higher level are called parent resource pools. Resource pools and virtual machines that are at the same level are called siblings. The cluster itself represents the root resource pool. If you do not create child resource pools, only the root resource pools exist.
In the following example, RP-QA is the parent resource pool for RP-QA-UI. RP-Marketing and RP-QA are siblings. The three virtual machines immediately below RP-Marketing are also siblings.

[Image: resource pool hierarchy example]

For each resource pool, you specify reservation, limit, shares, and whether the reservation should be expandable. The resource pool resources are then available to child resource pools and virtual machines.
vApp

A vSphere vApp allows packaging of multiple interoperating virtual machines and software applications that you can manage as a unit and distribute in OVF format.

A vApp can contain one or more virtual machines, but any operation carried out on the vApp, such as clone or power off, affects all virtual machines in the vApp container.

From the vSphere Web Client, you can access the vApp summary page with the current status of the vApp, and you can manage the vApp.

NOTE: Because the vApp metadata resides in the vCenter Server database, a vApp can be distributed across multiple ESXi hosts. This information can be lost if the vCenter Server database is cleared or if a standalone ESXi host that contains a vApp is removed from vCenter Server. Back up your vApps to an OVF package to avoid losing metadata.

vApp metadata for virtual machines within a vApp does not follow the snapshot semantics for virtual machine configuration. vApp properties that are deleted, modified, or defined after a snapshot is taken remain intact (deleted, modified, or defined) after the virtual machine reverts to that snapshot or any prior snapshots.

You can use VMware Studio to automate the creation of ready-to-deploy vApps with pre-populated application software and operating systems. VMware Studio adds a network agent to the guest so that vApps bootstrap with minimal effort. Configuration parameters that are specified for vApps appear as OVF properties in the vCenter Server deployment wizard. For information about VMware Studio and for download, see the VMware Studio developer page on the VMware Web site.

8. Size compute resources: Memory, CPU, I/O devices, Internal storage

CPU

Plan for 60-80% utilisation. Start with one vCPU and add more as required; 4-6 virtual machines per core is a reasonable starting point, but this should be driven by the results from the discovery phase, e.g. Capacity Planner, perfmon, top, etc.

Memory

Plan for 70-90% utilisation, and take into account virtual machine memory overhead, which is determined by the amount of RAM and the number of vCPUs.

I/O Devices

A minimum of four NICs on your host server; if using iSCSI, use more than four NICs, and remember vMotion needs its own bandwidth. Assign multiple NICs for redundancy and increased capacity.

Taken from the vSphere 5.5 performance best practices guide:

  • Make sure that end-to-end Fibre Channel speeds are consistent to help avoid performance problems.
  • Configure maximum queue depth for Fibre Channel HBA cards.
  • For the best networking performance, we recommend the use of network adapters that support the following hardware features:
    • Checksum offload
    • TCP segmentation offload (TSO)
    • Ability to handle high-memory DMA (that is, 64-bit DMA addresses)
    • Ability to handle multiple Scatter Gather elements per Tx frame
    • Jumbo frames (JF)
    • Large receive offload (LRO)
  • On some 10 Gigabit Ethernet hardware network adapters, ESXi supports NetQueue, a technology that significantly improves performance of 10 Gigabit Ethernet network adapters in virtualized environments.
  • In addition to the PCI and PCI-X bus architectures, we now have the PCI Express (PCIe) architecture. Ideally single-port 10 Gigabit Ethernet network adapters should use PCIe x8 (or higher) or PCI-X 266, and dual-port 10 Gigabit Ethernet network adapters should use PCIe x16 (or higher). There should preferably be no “bridge chip” (e.g., PCI-X to PCIe or PCIe to PCI-X) in the path to the actual Ethernet device (including any embedded bridge chip on the device itself), as these chips can reduce performance.

Internal Storage

The boot device should be a minimum of 1GB. When booting from a local disk or a SAN/iSCSI LUN, a 5.2GB disk is required to allow for the creation of the VMFS volume and a 4GB scratch partition.

9. Given a constraint to use existing hardware, determine suitability of the hardware for the design.

It has to be on the VMware HCL, otherwise you can run into support issues with VMware; make sure it meets the infrastructure qualities, AMPRS!

 

VCAP DCD Study – Home Lab Design Part 6

Objective 3.3 – Create a vSphere 5.x Physical Storage Design from an Existing Logical Design

Knowledge
1. Describe selection criteria for commonly used RAID types.

The IOMEGA comes configured as RAID5, which I don't intend to change as it gives a decent balance between performance and redundancy.

  • RAID0 = stripe across all disks, no redundancy; lose any disk and you lose the set
  • RAID1 = mirror (data copied across both disks); can lose only 1 disk
  • RAID3 = dedicated parity disk (min of 3 disks); can lose only 1 disk
  • RAID5 = distributed parity across all RAID disks; data loss potential during RAID rebuilds (min of 3 disks); decent reads, but writes carry a penalty of 4 (n*IOPS/4)
  • RAID6 = dual parity distributed across disks; tolerates 2 failures (n+2), including during RAID rebuilds; fewer reads due to 2 disks lost to parity; writes carry a penalty of 6 (n*IOPS/6)
  • RAID1+0 = mirrored pairs that are striped (min. 4 disks): best performance & most expensive; reads = sum of all disks * IOPS; writes = ½ read IOPS (n*IOPS/2) (see the IOPS sketch below)
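
The write penalties in the list above can be turned into a quick "functional IOPS" estimate. This is a generic rule-of-thumb calculation, not anything array-specific; the disk count, per-disk IOPS and read/write split are example inputs.

```python
# Front-end IOPS a disk group can sustain once the RAID write penalty is applied.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def functional_iops(disks, iops_per_disk, read_ratio, raid):
    raw = disks * iops_per_disk
    write_ratio = 1 - read_ratio
    # Reads hit the disks once; each write costs 'penalty' back-end I/Os.
    return raw / (read_ratio + write_ratio * WRITE_PENALTY[raid])

# 8 x 15k SAS disks (~180 IOPS each), 70/30 read/write mix:
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, round(functional_iops(8, 180, 0.70, level)))
```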

A good diagram from the VMware troubleshooting storage performance blog:

[Image: RAID comparison diagram]

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service:

  • Determine the most appropriate storage technologies for the design.
  • Implement the service based on the required infrastructure qualities.

I intend to use a combination of VSAN and iSCSI storage; VSAN will come at a later date as I'm restricted by budget. I'll be using an IOMEGA NAS drive to present 2x 1TB datastores to the HP MicroServers; it will provide satisfactory IOPS and comes in at a good price point.

3. Create a physical storage design based on selected storage array capabilities, including but not
limited to:

  • Active/Active, Active/Passive
  • ALUA, VAAI, VASA
  • PSA (including PSPs and SATPs)

Obviously I can’t apply most of this to my design but below are some things to think about if it were a real world deployment.

Multipathing policies are largely driven by the storage vendors and they should always be consulted for recommended configurations.

  • Active-active storage system – Allows access to the LUNs simultaneously through all the storage ports that are available without significant performance degradation. All the paths are active at all times, unless a path fails.
  • Active-passive storage system – A system in which one storage processor is actively providing access to a given LUN. The other processors act as backup for the LUN and can be actively providing access to other LUN I/O. I/O can be successfully sent only to an active port for a given LUN. If access through the active storage port fails, one of the passive storage processors can be activated by the servers accessing it.
  • Asymmetrical storage system – Supports Asymmetric Logical Unit Access (ALUA). ALUA-compliant storage systems provide different levels of access per port. ALUA allows hosts to determine the states of target ports and prioritize paths. The host uses some of the active paths as primary and others as secondary.

Multipathing Policies:

  • Most Recently Used (MRU) — Selects the first working path, discovered at system boot time. If this path becomes unavailable, the ESX/ESXi host switches to an alternative path and continues to use the new path while it is available. This is the default policy for Logical Unit Numbers (LUNs) presented from an Active/Passive array. ESX/ESXi does not return to the previous path if, or when, it becomes available again; it remains on the working path until that path fails for any reason.

Note: The preferred flag, while sometimes visible, is not applicable to the MRU pathing policy and can be disregarded.

  • Fixed (Fixed) — Uses the designated preferred path flag, if it has been configured. Otherwise, it uses the first working path discovered at system boot time. If the ESX/ESXi host cannot use the preferred path or it becomes unavailable, ESX/ESXi selects an alternative available path. The host automatically returns to the previously-defined preferred path as soon as it becomes available again. This is the default policy for LUNs presented from an Active/Active storage array.
  • Round Robin (RR) — Uses an automatic path selection rotating through all available paths, enabling the distribution of load across the configured paths. For Active/Passive storage arrays, only the paths to the active controller will be used in the Round Robin policy. For Active/Active storage arrays, all paths will be used in the Round Robin policy.

Note: This policy is not currently supported for Logical Units that are part of a Microsoft Cluster Service (MSCS) virtual machine.

  • Fixed path with Array Preference — The VMW_PSP_FIXED_AP policy was introduced in ESX/ESXi 4.1. It works for both Active/Active and Active/Passive storage arrays that support ALUA. This policy queries the storage array for the preferred path based on the array’s preference. If no preferred path is specified by the user, the storage array selects the preferred path based on specific criteria.

Note: The VMW_PSP_FIXED_AP policy has been removed from ESXi 5.0. For ALUA arrays in ESXi 5.0 the PSP MRU is normally selected but some storage arrays need to use Fixed.

VAAI:

  • Full copy, also called clone blocks or copy offload. Enables the storage arrays to make full copies of data within the array without having the host read and write the data. This operation reduces the time and network load when cloning virtual machines, provisioning from a template, or migrating with vMotion.
  • Block zeroing, also called write same. Enables storage arrays to zero out a large number of blocks to provide newly allocated storage, free of previously written data. This operation reduces the time and network load when creating virtual machines and formatting virtual disks.
  • Hardware assisted locking, also called atomic test and set (ATS). Supports discrete virtual machine locking without use of SCSI reservations. This operation allows disk locking per sector, instead of the entire LUN as with SCSI reservations.
  • Array thin provisioning – helps monitor space use on thin-provisioned storage arrays to prevent out-of-space conditions, and to perform space reclamation; space reclamation is a manual process and needs to be run from the ESXi CLI.

VASA

Storage systems that use vStorage APIs for Storage Awareness, also called VASA, are represented by storage providers. Storage providers inform vCenter Server about specific storage devices, and present characteristics of the devices and datastores deployed on the devices as storage capabilities. Such storage capabilities are system-defined and vendor specific.
A storage system can advertise multiple capabilities. The capabilities are grouped into one or more capability profiles. Capabilities outline the quality of service that the storage system can deliver. They guarantee that the storage system can provide a specific set of characteristics for capacity, performance, availability, redundancy, and so on.
Vendor specific capabilities appear in the Storage Policy-Based Management system. When you create a storage policy for your virtual machine, you reference these vendor specific storage capabilities, so that your virtual machine is placed on the datastore with these capabilities.

PSA – collection of APIs to allow 3rd party ISVs to design their own load balance/failover techniques

PSPs – I/O path selection; MRU (default for A/P), Fixed (default for A/A), RR (either)

Storage Array Type Plug-Ins (SATPs) run in conjunction with the VMware NMP and are responsible for
array-specific operations.

ESXi offers a SATP for every type of array that VMware supports. It also provides default SATPs that support non-specific active-active and ALUA storage arrays, and the local SATP for direct-attached devices.
Each SATP accommodates special characteristics of a certain class of storage arrays and can perform the array-specific operations required to detect path state and to activate an inactive path. As a result, the NMP module itself can work with multiple storage arrays without having to be aware of the storage device specifics.

After the NMP determines which SATP to use for a specific storage device and associates the SATP with the physical paths for that storage device, the SATP implements the tasks that include the following:

Monitors the health of each physical path.

Reports changes in the state of each physical path.

Performs array-specific actions necessary for storage fail-over. For example, for active-passive devices, it can activate passive paths.

 

4. Identify proper combination of media and port criteria for given end-to-end performance requirements.

Refers to tiered storage based on performance type, e.g.

Gold = SSD

Silver = FC 15k SAS

Bronze = 7K Sata

 

5. Specify the type of zoning that conforms to best practices and documentation.

With ESXi hosts, use single-initiator zoning or single-initiator-single-target zoning. The latter is the preferred zoning practice. Using the more restrictive zoning prevents problems and misconfigurations that can occur on the SAN.

Zoning not only prevents a host from unauthorized access of storage assets, but it also stops undesired host-to-host communication and fabric-wide Registered State Change Notification (RSCN) disruptions. RSCNs are managed by the fabric Name Server and notify end devices of events in the fabric, such as a storage node or a switch going offline. Brocade isolates these notifications to only the zones that require the update, so nodes that are unaffected by the fabric change do not receive the RSCN. This is important for non-disruptive fabric operations, because RSCNs have the potential to disrupt storage traffic.

There are two types of Zoning identification: port World Wide Name (pWWN) and Domain,Port (D,P). You can assign aliases to both pWWN and D,P identifiers for easier management. The pWWN, the D,P, or a combination of both can be used in a zone configuration or even in a single zone. pWWN identification uses a globally unique identifier built into storage and host interfaces. Interfaces also have node World Wide Names (nWWNs). As their names imply, pWWN refers to the port on the device, while nWWN refers to the overall device. For example, a dual-port HBA has one nWWN and two pWWNs. Always use pWWN identification instead of nWWN, since a pWWN precisely identifies the host or storage that needs to be zoned.

6. Based on service level requirements utilize VMware technologies, including but not limited to:

Storage I/O Control
Storage Policies
Storage vMotion
Storage DRS

Storage I/O Resource Allocation

VMware vSphere provides mechanisms to dynamically allocate storage I/O resources, allowing critical workloads to maintain their performance even during peak load periods when there is contention for I/O resources. This allocation can be performed at the level of the individual host or for an entire datastore. Both methods are described below.
The storage I/O resources available to an ESXi host can be proportionally allocated to the virtual machines running on that host by using the vSphere Client to set disk shares for the virtual machines (select edit virtual machine settings, choose the Resources tab, select Disk, then change the Shares field).

The maximum storage I/O resources available to each virtual machine can be set using limits. These limits, set in I/O operations per second (IOPS), can be used to provide strict isolation and control on certain workloads. By default, these are set to unlimited. When set to any other value, ESXi enforces the limits even if the underlying datastores are not fully utilized.

An entire datastore’s I/O resources can be proportionally allocated to the virtual machines accessing that datastore using Storage I/O Control (SIOC). When enabled, SIOC evaluates the disk share values set for all virtual machines accessing a datastore and allocates that datastore’s resources accordingly. SIOC can be enabled using the vSphere Client (select a datastore, choose the Configuration tab, click Properties… (at the far right), then under Storage I/O Control add a checkmark to the Enabled box).

With SIOC disabled (the default), all hosts accessing a datastore get an equal portion of that datastore’s resources. Share values determine only how each host’s portion is divided amongst its virtual machines.
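
The proportional part of this is simple share arithmetic. The Python sketch below, with illustrative numbers, shows how a contended datastore's IOPS might be divided between VMs by disk share value, with any per-VM IOPS limit applied on top. Note this is a deliberate simplification: real SIOC throttles host device queue depths when datastore latency crosses a threshold, rather than handing out IOPS figures directly.

```python
def allocate_iops(datastore_iops, vms):
    """Divide a contended datastore's IOPS by disk shares, then apply limits.

    vms: {name: {"shares": int, "limit": int or None}} -- an illustrative model
    of the per-VM disk shares and IOPS limits described above.
    """
    total_shares = sum(vm["shares"] for vm in vms.values())
    allocation = {}
    for name, vm in vms.items():
        entitlement = datastore_iops * vm["shares"] / total_shares
        limit = vm["limit"]
        allocation[name] = min(entitlement, limit) if limit else entitlement
    return allocation

vms = {
    "prod-db":  {"shares": 2000, "limit": None},   # high shares, no limit
    "test-web": {"shares": 1000, "limit": 500},    # capped at 500 IOPS
    "dev-vm":   {"shares": 500,  "limit": None},
}
print(allocate_iops(7000, vms))
# prod-db 4000, test-web capped at 500 (entitled to 2000), dev-vm 1000
```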

Storage Policies
Storage policies were formerly called virtual machine storage profiles. They ensure that virtual machines are placed on storage that guarantees a specific level of capacity, performance, availability, redundancy, and so on. When you define a storage policy, you specify the storage requirements for applications that will run on virtual machines. After you apply this storage policy to a virtual machine, the virtual machine is placed on a datastore that can satisfy those requirements.

Storage vMotion
Used to migrate a virtual machine's disks with no downtime, e.g. for datastore maintenance, transitioning to a new array, or datastore load balancing (as used by Storage DRS).

Storage DRS
A feature that provides space and I/O load balancing across the datastores within a datastore cluster.
This load balancing can avoid storage performance bottlenecks, or address them if they occur.

7. Determine use case for virtual storage appliances, including the vSphere Storage Appliance.

VSA provides the High Availability and automation capabilities of vSphere to small environments without shared storage hardware: it provides business continuity for applications, eliminates planned downtime due to server maintenance, and lets you use policies to prioritize resources for your most important applications.

Don’t understand why VSAN isn’t mentioned in the Blueprint??

8. Given the functional requirements, size the storage for capacity, availability and performance,
including:

Virtual Storage (Datastores, RDMs, Virtual Disks)
Physical Storage (LUNs, Storage Tiering)

Some of the info below was borrowed from the Brownbag VCAP DCD Study notes PDF.

Take I/O metrics of guests (VDI) and server workloads. Take into account the disk type and the write penalty for the RAID type (a back-of-the-envelope sizing sketch follows this list).

  • Capacity – consider overhead for snapshots, vswp, and logging
  • Availability – multiple HBAs, multipathing, multiple switches
  • Performance – enable read/write cache on the SAN; enable CBRC in VDI; *NOTE: disable write cache if not battery backed*
  • Datastores – segregate high I/O traffic onto different datastores
  • RDMs – needed for SAN-based replication and array tasks; required for MSCS
  • Virtual Disks – recommended; better provisioning capability than RDMs; more portable; functional with all vSphere features
  • LUNs – one VMFS datastore per LUN; a target can present multiple LUNs, or one LUN per target
  • Storage Tiering – based on app SLAs (SSD vs SAS vs SATA); thin provisioning
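
Pulling the capacity and performance bullets together, here is the usual back-of-the-envelope arithmetic as a Python sketch. The workload figures, overhead percentages and RAID write penalties are illustrative assumptions, not fixed rules.

```python
# Common RAID write penalties (back-end IOs generated per front-end write).
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(frontend_iops, write_pct, raid_level):
    """Front-end IOPS translated into back-end disk IOPS for a RAID type."""
    writes = frontend_iops * write_pct
    reads = frontend_iops - writes
    return reads + writes * RAID_WRITE_PENALTY[raid_level]

def capacity_needed_gb(vm_disk_gb, vm_ram_gb, snapshot_overhead=0.20,
                       log_overhead_gb=0.1):
    """Per-VM datastore capacity: VMDK + vswp (== RAM unless reserved) +
    snapshot/log headroom. The overhead values are illustrative assumptions."""
    return vm_disk_gb * (1 + snapshot_overhead) + vm_ram_gb + log_overhead_gb

# Example: 100 VMs, 50 front-end IOPS each, 60% writes, on RAID5.
total_frontend = 100 * 50
print(backend_iops(total_frontend, 0.6, "RAID5"))        # 14000 back-end IOPS
print(capacity_needed_gb(vm_disk_gb=60, vm_ram_gb=4))    # ~76 GB per VM
```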

How Large a LUN?
The best way to configure a LUN for a given VMFS volume is to size for throughput first and capacity second.
That is, you should aggregate the total I/O throughput for all applications or virtual machines that might run on a given shared pool of storage, then make sure you have provisioned enough back-end disk spindles (and disk array cache) and an appropriate storage service level to meet the requirements.
This is actually no different from what most system administrators do in a physical environment. It just requires an extra step, to consider when to consolidate a number of workloads onto a single vSphere host or onto a collection of vSphere hosts that are addressing a shared pool of storage.

Each storage vendor likely has its own recommendation for the size of a provisioned LUN, so it is best to check with the vendor. However, if the vendor’s stated optimal LUN capacity is backed with a single disk that has little or no storage array write cache, the configuration might result in low performance in a virtual environment. In this case, a better solution might be a smaller LUN striped within the storage array across many physical disks, with some write cache in the array. The RAID protection level also factors into the I/O throughput performance.
Because there is no single correct answer to the question of how large your LUNs should be for a VMFS volume, the more important question to ask is, “How long would it take one to restore the virtual machines on this datastore if it were to fail?”
The recovery time objective (RTO) is now the major consideration when deciding how large to make a VMFS datastore. This equates to how long it would take an administrator to restore all of the virtual machines residing on a single VMFS volume if there were a failure that caused data loss. With the advent of very powerful storage arrays, including Flash storage arrays, the storage performance has become less of a concern. The main concern now is how long it would take to recover from a catastrophic storage failure.
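
Since RTO is the driving factor, the sizing question can be turned around: given the rate at which you can realistically restore data, how much can you afford to put on one VMFS datastore? A minimal Python sketch, assuming a measured restore throughput:

```python
def max_datastore_size_gb(rto_hours, restore_throughput_mbps):
    """Largest datastore (GB) restorable within the RTO at a given sustained
    restore throughput (MB/s). Both inputs are assumptions you would measure
    in your own environment."""
    seconds = rto_hours * 3600
    return restore_throughput_mbps * seconds / 1024  # MB -> GB

# Example: a 4-hour RTO and a measured 150 MB/s restore rate.
print(round(max_datastore_size_gb(4, 150)))  # ~2109 GB, i.e. roughly a 2 TB datastore
```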

Another important question to ask is, “How does one determine whether a certain datastore is overprovisioned or underprovisioned?”
There are many performance screens and metrics that can be investigated within vCenter to monitor datastore I/O rates and latency. Monitoring these metrics is the best way to determine whether a LUN is properly sized and loaded. Because workload can vary over time, periodic tracking is an important consideration. vSphere Storage DRS, introduced in vSphere 5.0, can also be a useful feature to leverage for load balancing virtual machines across multiple datastores, from both a capacity and a performance perspective.
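
As a sketch of how collected metrics might be turned into an over/under-provisioned verdict, the snippet below applies a simple latency threshold to a list of observed samples. The 20 ms and 30 ms figures are commonly quoted rules of thumb, not hard limits, and real monitoring would pull the samples from vCenter rather than a hard-coded list.

```python
def datastore_health(latency_ms_samples, warn_ms=20, critical_ms=30):
    """Classify a datastore from observed device latency samples (ms).
    Thresholds are rule-of-thumb assumptions; tune them for your array."""
    if not latency_ms_samples:
        return "no data"
    p95 = sorted(latency_ms_samples)[int(0.95 * (len(latency_ms_samples) - 1))]
    if p95 >= critical_ms:
        return f"overloaded (95th percentile {p95} ms)"
    if p95 >= warn_ms:
        return f"warning (95th percentile {p95} ms)"
    return f"healthy (95th percentile {p95} ms)"

print(datastore_health([5, 8, 7, 12, 35, 6, 9, 41, 8, 7]))  # overloaded
```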

9. Based on the logical design, select and incorporate an appropriate storage network into the physical design:
iSCSI
NFS
FC
FCoE

  • Plan for failures (a simple connectivity check is sketched after this list).
  • Connect the host and storage ports in such a way as to prevent a single point of failure from affecting redundant paths. For example, if you have a dual-attached host and each HBA accesses its storage through a different storage port, do not place both storage ports for the same server on the same line card or ASIC.
  • Use two power sources.
  • Host and storage layout – to reduce the possibility of congestion and maximize ease of management, connect host and storage port pairs to the same switch where possible.
  • Use single-initiator zoning – for open systems environments, ideally each initiator will be in a zone with a single target. However, due to the significant management overhead this can impose, single-initiator zones can contain multiple target ports, but should never contain more than 16 target ports.
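
As an illustration of the "no shared switch/line card between redundant paths" rule, here is a tiny Python check over a hypothetical cabling map (the host, HBA and switch names are invented):

```python
# Hypothetical cabling map: (host, HBA) -> (fabric switch, line card/ASIC).
cabling = {
    ("esx01", "vmhba1"): ("fabricA-sw1", "slot1"),
    ("esx01", "vmhba2"): ("fabricB-sw1", "slot3"),
    ("esx02", "vmhba1"): ("fabricA-sw1", "slot1"),
    ("esx02", "vmhba2"): ("fabricA-sw1", "slot1"),   # deliberate mistake
}

def single_points_of_failure(cabling):
    """Flag hosts whose 'redundant' HBAs land on the same switch and line card."""
    by_host = {}
    for (host, hba), location in cabling.items():
        by_host.setdefault(host, set()).add(location)
    return [host for host, locations in by_host.items() if len(locations) < 2]

print(single_points_of_failure(cabling))  # ['esx02']
```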

[Image: san]

 

 

VCAP DCD Study – Home Lab Design Part 5

Section 3 – Create a vSphere Physical Design from an Existing Logical Design

Objective 3.1 – Transition from a Logical Design to a vSphere 5.x Physical Design

Skills and Abilities

1. Determine and explain design decisions and options selected from the logical design.

The main drivers behind my design decisions were cost and space requirements, which is why I opted for the HP MicroServers and the TP-Link smart switch: they are relatively inexpensive, energy efficient and take up very little space.

2. Build functional requirements into the physical design.

One of my functional requirements is that VLAN tagging needs to be available. I opted for the Asus RT-AC68U wireless router not only because it's an awesome piece of kit, but also because I intend to flash it with Tomato or DD-WRT (research ongoing), which should allow me to enable VLAN tagging.

3. Given a logical design, create a physical design taking into account requirements, assumptions and constraints.

Nothing to add here.

4. Given the operational structure of an organization, identify the appropriate management tools and roles for each staff member.

Management tools were covered in an earlier objective, e.g. vMA, Web Client, PowerCLI, etc.

Below are the predefined roles, but new roles can be created to satisfy security requirements.

  • No Access
  • Read Only
  • Administrator
  • Virtual Machine Power User
  • Virtual Machine User
  • Resource Pool Administrator
  • Datastore Consumer
  • Network Consumer

Objective 3.2 – Create a vSphere 5.x Physical Network Design from an Existing Logical Design

1. Describe VLAN options, including Private VLANs, with respect to virtual and physical switches.

I’ve borrowed some of the material below from the excellent BrownBag VCAP DCD Study outline as they’ve already done a great job of covering the major points.

  • VLANs – a feature of both vSS and vDS; 3 tagging modes = EST, VST (default), VGT
  • PVLANs – a vDS (virtual) capability; Primary = Promiscuous, Secondary = Community or Isolated (the reachability rules are sketched after this list)
  • pSwitches – need trunk port(s) configured; if possible, enable link-state tracking and disable native VLAN mode
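
The three secondary PVLAN port types differ only in who they are allowed to talk to, so the rules are easy to capture in a few lines of Python for reference:

```python
def pvlan_can_communicate(src_type, dst_type, same_community=False):
    """PVLAN reachability rules:
    - promiscuous ports talk to everything
    - community ports talk to promiscuous ports and to their own community
    - isolated ports talk only to promiscuous ports
    """
    if "promiscuous" in (src_type, dst_type):
        return True
    if src_type == dst_type == "community":
        return same_community
    return False

print(pvlan_can_communicate("isolated", "isolated"))            # False
print(pvlan_can_communicate("community", "community", True))    # True
print(pvlan_can_communicate("isolated", "promiscuous"))         # True
```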

2. Describe switch-specific settings for ESXi-facing ports, including but not limited to:

  • STP
  • Jumbo Frames
  • Load-balancing
  • Trunking
  • STP – do not run STP blocking on ESXi-facing physical switch ports; enable PortFast/edge mode, since virtual switches do not run STP
  • Jumbo Frames – enable end to end on the storage/network path (a quick MTU check is sketched after this list)
  • Load-balancing – NIC teaming; route based on originating virtual port ID
  • Trunking – VLAN tagging; enable trunking on the physical switch ports when using VLANs (VST)
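
Jumbo frames only help if every hop agrees on the MTU, so an end-to-end consistency check is worth sketching. The component names and values below are illustrative:

```python
# MTU of every component in the storage path (illustrative values).
path_mtu = {
    "vmkernel port (vmk1)": 9000,
    "vSwitch/dvSwitch":     9000,
    "physical switch port": 1500,   # deliberately wrong: jumbo frames not enabled
    "array target port":    9000,
}

def jumbo_frames_consistent(path_mtu, required=9000):
    """Return the components that would silently break end-to-end jumbo frames."""
    return [name for name, mtu in path_mtu.items() if mtu < required]

print(jumbo_frames_consistent(path_mtu))  # ['physical switch port']
```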

3. Describe network redundancy considerations at each individual component level.

  • Management network – utilize active/standby vmnics (pNICs)
  • 2 vSwitches & 2 Mgmt Netwks (1 on ea vSwitch) OR, 1 Mgmt Netwk with 2 pNICs
  • Dual physical switches
  • Multiple pNICs within hosts
  • Multipathing for storage (HBAs)

4. Cite virtual switch security policies and settings

Settings

  • Failback = No; mitigates the false positive of a physical switch port showing link-up while the switch is not yet forwarding traffic
  • Notify Switches = yes
  • VM Network traffic – configure pNICs for Port Group in Active/Active

Security Policies (a quick baseline checker is sketched after this list):

  • IP Storage – segregate from VM traffic using VLANs; NFS exports (/etc/exports); iSCSI CHAP
  • MAC Address Changes – Reject; if using iSCSI, set to Accept
  • Forged Transmits – Reject; prevents MAC impersonation
  • Promiscuous Mode – Reject
  • IPsec – authentication and encryption of packets
  • Disable native VLAN use on pSwitches to prevent VLAN hopping
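
A minimal sketch of the baseline checker mentioned above, in Python; the port-group names, current values and exception list are all illustrative:

```python
# Recommended virtual switch security policy baseline (per the notes above).
BASELINE = {"promiscuous_mode": "Reject",
            "mac_address_changes": "Reject",
            "forged_transmits": "Reject"}

current = {
    "VM-PortGroup":    {"promiscuous_mode": "Reject",
                        "mac_address_changes": "Accept",   # deviation
                        "forged_transmits": "Reject"},
    "iSCSI-PortGroup": {"promiscuous_mode": "Reject",
                        "mac_address_changes": "Accept",   # documented exception
                        "forged_transmits": "Reject"},
}

def deviations(current, baseline=BASELINE, exceptions=("iSCSI-PortGroup",)):
    """List (port group, setting) pairs that differ from the baseline,
    skipping port groups with a documented exception."""
    out = []
    for pg, settings in current.items():
        if pg in exceptions:
            continue
        out += [(pg, key) for key, value in settings.items()
                if value != baseline[key]]
    return out

print(deviations(current))  # [('VM-PortGroup', 'mac_address_changes')]
```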

Skills and Abilities

5. Based on the service catalog and given functional requirements, for each service:

  • Determine the most appropriate networking technologies for the design.
  • Implement the service based on the required infrastructure qualities (AMPRS).
  • vSS vs vDS – small or relatively large infrastructure? In my case I will be using a hybrid solution: vSS for management, vMotion and FT traffic, and vDS for VM and NFS traffic.
  • VLANs or not – segregate traffic to meet compliance requirements or SLAs; I will be using VLANs.
  • IP storage? – consider jumbo frames; I intend to use NFS-based storage but will not be enabling jumbo frames as my switch does not support them.
  • My networking will be 1GbE.

[Image: switch1]

6. Determine and explain the selected network teaming and failover solution.

  • Default teaming policy = route based on originating virtual port ID; other options include route based on source MAC hash, route based on IP hash, and use explicit failover order. I'll be using route based on originating virtual port ID (a conceptual sketch follows this list).
  • Failover – failover detection defaults to link status only; an explicit failover order (active/standby/unused uplinks) can be configured per vSwitch or port group.
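
Conceptually, "route based on originating virtual port ID" just pins each virtual switch port to one active uplink, so a VM keeps using the same pNIC until a failover occurs. The Python sketch below models that idea; it is a simplification, not the actual ESXi hashing code.

```python
def uplink_for_port(virtual_port_id, active_uplinks):
    """Simplified model: each virtual switch port maps to exactly one active
    uplink, so a VM's traffic always leaves via the same pNIC until failover."""
    return active_uplinks[virtual_port_id % len(active_uplinks)]

active = ["vmnic0", "vmnic1"]
for port_id in range(4):
    print(port_id, uplink_for_port(port_id, active))
# Ports are spread across vmnic0/vmnic1; there is no per-packet load balancing.
```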

[Image: vswitch0]

[Image: vswitchport]

[Image: dvs]

 

7. Implement logical Trust Zones using network security/firewall technologies.

This was covered in the security section.

8. Based on service level requirements, determine appropriate network performance characteristics.

Taken from VMware vDS best practices.

[Image: types of network traffic (from VMware vDS best practices)]

 

[Image: Network I/O Control (NIOC) example (from VMware vDS best practices)]

9. Given a current network configuration as well as technical requirements and constraints,  determine the appropriate virtual switch solution:

  • vSphere Standard Switch
  • vSphere Distributed Switch
  • Third-party solutions (ex. Nexus 1000V)
  • Hybrid solution
  • vSS – used for smaller environments
  • vDS – easier management/administration; centralized; larger environments; requires Enterprise Plus licensing
  • 3rd party (Nexus 1000V) – consider what is and is not supported (e.g. vShield, iSCSI, Host Profiles, AppSpeed, vDR and multipathing; DPM and SRM were not supported)
  • Hybrid – used so connectivity can continue if vCenter goes down (vCenter is needed to manage a vDS), and when mixing ESX (with Service Consoles) and ESXi (Management Networks)
  • Cisco PDF listing a feature comparison of vSS, vDS and Nexus 1000V: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9902/solution_overview_c22-526262.pdf. Based on business requirements you can compare the switches and determine which is best (budget may be a constraint for the purchase of a 3rd-party switch, as is the vSphere edition needed). A simple decision sketch follows this list.
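
A very rough decision sketch in Python, reflecting the notes above; the host-count threshold and the inputs are illustrative, not a VMware-defined rule set.

```python
def choose_virtual_switch(hosts, has_enterprise_plus, needs_3rd_party_features,
                          keep_mgmt_on_vss=True):
    """Rough decision helper based on the notes above; thresholds are assumptions."""
    if needs_3rd_party_features and has_enterprise_plus:
        return "3rd-party switch (e.g. Nexus 1000V)"
    if has_enterprise_plus and hosts > 5:
        return ("Hybrid: vSS for management, vDS for VM/IP-storage traffic"
                if keep_mgmt_on_vss else "vDS for all traffic")
    return "vSS only"

print(choose_virtual_switch(hosts=3,  has_enterprise_plus=False,
                            needs_3rd_party_features=False))  # vSS only
print(choose_virtual_switch(hosts=12, has_enterprise_plus=True,
                            needs_3rd_party_features=False))  # hybrid
```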

10. Based on an existing logical design, determine appropriate host networking resources.

Based on requirements, budget, constraints, etc.

11. Properly apply converged networking considering VMware best practices.

  • Use 10GbE cards and consolidate traffic onto one card, using the second for redundancy.
  • It is recommended (if licensed for it) to use NIOC (on a vDS) to provide QoS per traffic type.