VCAP DCD Study – Home Lab Design Part 3

Objective 2.3 – Build Availability Requirements into the Logical Design

1. Understand what logical availability services are provided by VMware solutions.

I’ll be utilising VMware HA and possibly Fault Tolerance in the design.

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

  • Availability is the ability of a system or service to perform its required function when required. It is usually expressed as a percentage, such as 99.9%.
  • Manageability describes the effort and expense of running the system. If a huge platform can be managed by a tiny team, the operational costs are very low.
  • Performance is the measure of what is delivered by a system. This accomplishment is usually measured against known standards of accuracy, completeness and speed.
  • Recoverability describes the ability to return a system or service to a working state. This is usually required after a system failure and repair.
  • Security is the process of ensuring that services are used in an appropriate way.
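To make the availability percentage concrete, here is a minimal Python sketch that converts a target into the downtime it allows per year. The function name and the 365-day year are my own assumptions for illustration, not something from the blueprint.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common targets.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

A 99.9% target works out to roughly 8.76 hours of downtime per year, which is a useful number to have in mind when discussing SLAs.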

3. Describe the concept of redundancy and the risks associated with single points of failure.

There will be some redundancy built into my design, but not at the level the exam blueprint requires. For study purposes, below is some information from the link provided in the blueprint.

Design Principles for High Availability

The key to architecting a highly available computing environment is to eliminate single points of failure. With the potential of occurring anywhere in the environment, failures can affect both hardware and software. Building redundancy at vulnerable points helps reduce or eliminate downtime caused by hardware failures. These include redundancies at the following layers:

  • Server components such as network adaptors and host bus adaptors (HBAs)
  • Servers, including blades and blade chassis
  • Networking components
  • Storage arrays and storage networking
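The effect of removing a single point of failure can be sketched with some simple arithmetic. The Python below is only an illustration: it assumes component failures are independent, and the availability figures are made-up numbers rather than vendor data.

```python
from math import prod


def series(availabilities):
    """All components are required: availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities):
    """Redundant components: the group fails only if every member fails."""
    return 1 - prod(1 - a for a in availabilities)


single_hba = 0.99                              # hypothetical figure
dual_hba = parallel([single_hba, single_hba])  # two redundant HBAs
storage_path = series([dual_hba, 0.999])       # HBAs plus a hypothetical array

print(f"single HBA:   {single_hba:.4f}")
print(f"dual HBAs:    {dual_hba:.4f}")
print(f"HBAs + array: {storage_path:.4f}")
```

Under these assumptions, adding a second HBA moves that layer from 99% to 99.99%, but the chain as a whole is still limited by any non-redundant component in series with it.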

4. Differentiate Business Continuity and Disaster Recovery concepts.

Business continuity is a proactive action focused on avoiding or mitigating the impacts of risks before they happen.

  • The business must continue to operate for weeks, months and years
  • Who, What, Where and When is needed
  • Not just technical, whole of business
  • Very Strategic

Disaster recovery is a reactive action focused on restoring services after an outage or failure has occurred.

  • We hoped it would never happen but it has
  • Get the business running again ASAP
  • Tactical, Technical

Skills and Abilities

5. Determine availability component of service level agreements (SLAs) and service level management processes.

Define an SLA for each service and design a setup that will accommodate it. For example, if the SLA for a certain VM tolerates zero downtime on a host failure, configure that VM for FT. If the SLA allows a couple of minutes of downtime, VMware HA should be good enough. If there are other qualities that you commit to (e.g. performance), create storage tiers as necessary.
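The SLA-to-mechanism mapping can be written down as a small decision function. This is a sketch only: the `HA_RESTART_MINUTES` threshold and the service names are hypothetical values chosen for illustration, and a real design would measure how long an HA restart takes for each workload.

```python
HA_RESTART_MINUTES = 5  # assumed worst-case HA restart time, not a VMware figure


def choose_protection(tolerated_downtime_minutes: float) -> str:
    """Pick FT or HA from the downtime an SLA tolerates on host failure."""
    if tolerated_downtime_minutes < 0:
        raise ValueError("tolerated downtime cannot be negative")
    if tolerated_downtime_minutes < HA_RESTART_MINUTES:
        # Zero (or near-zero) tolerated downtime: an HA restart is too slow.
        return "Fault Tolerance"
    return "HA"


# Hypothetical service catalog: service name -> tolerated downtime in minutes.
catalog = {"payments-db": 0, "intranet-web": 10, "file-server": 60}
for service, minutes in catalog.items():
    print(f"{service}: {choose_protection(minutes)}")
```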

6. Explain availability solutions for a logical design based on customer requirements.

As mentioned, I won’t be designing a DR solution, but below is an example of a logical design of an availability solution using SRM.

[Diagram: logical DR design using SRM]

7. Define an availability plan, including maintenance processes.

This was taken from the link provided in the blue print.

VMware vSphere makes it possible to reduce both planned and unplanned downtime without the cost and complexity of alternative solutions. Organizations using VMware can slash planned downtime by eliminating most scheduled downtime for hardware maintenance. VMware VMotion™ technology, VMware Distributed Resource Scheduler (DRS) maintenance mode, and VMware Storage VMotion™ make it possible to move running workloads from one physical server to another without downtime or service interruption, enabling zero-downtime hardware maintenance.

Define a plan appropriate to each type of failure or maintenance activity. For SRM, create an appropriate runbook; this will be used during a site failure. For host upgrades, plan to vMotion all the VMs off the host, ensure there are enough resources for all the VMs with one host down, then update the host. For VM maintenance, take a snapshot first so that you can revert if the VM upgrade does not go well.
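The "one host down" check for host upgrades can be sketched as a quick capacity calculation. This Python is a simplification under stated assumptions: it only compares total memory demand against the total capacity of the remaining hosts, ignores CPU, reservations and per-host placement, and uses hypothetical host and VM names.

```python
def can_enter_maintenance(host_capacity_gb: dict, vm_demand_gb: dict, host: str) -> bool:
    """True if all VMs still fit (in aggregate) with `host` removed."""
    if host not in host_capacity_gb:
        raise KeyError(f"unknown host: {host}")
    remaining = sum(cap for name, cap in host_capacity_gb.items() if name != host)
    return sum(vm_demand_gb.values()) <= remaining


# Hypothetical three-host lab cluster with memory capacity in GB.
hosts = {"esx1": 64, "esx2": 64, "esx3": 64}
vms = {"vcenter": 16, "dc01": 8, "sql01": 32, "web01": 8}

print(can_enter_maintenance(hosts, vms, "esx1"))  # 64 GB demand vs 128 GB left
```

If the check fails, the maintenance plan needs either more capacity or an agreed list of VMs to power off first.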

8. Prioritize each service in the Service Catalog according to availability requirements.

Using VMware HA, set the VM restart priority according to the availability requirements. The most important services/VMs can be given the highest priority during an HA failover.

VM Restart Priority Setting

VM restart priority determines the relative order in which virtual machines are restarted after a host failure. Such virtual machines are restarted sequentially on new hosts, with the highest priority virtual machines first and continuing to those with lower priority until all virtual machines are restarted or no more cluster resources are available.
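The restart ordering described above can be modelled in a few lines. This is a toy simulation, not how HA is implemented: the priority names, the single pool of free memory, and the rule that a VM which does not fit is simply left powered off are all simplifying assumptions.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def restart_order(vms, free_capacity_gb):
    """Restart VMs highest priority first until capacity runs out.

    `vms` is a list of (name, priority, memory_gb) tuples.
    Returns (restarted, left_off) lists of VM names.
    """
    restarted, left_off = [], []
    ordered = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    for name, _priority, memory_gb in ordered:
        if memory_gb <= free_capacity_gb:
            free_capacity_gb -= memory_gb
            restarted.append(name)
        else:
            left_off.append(name)
    return restarted, left_off


# Hypothetical VMs from a failed host, with 24 GB free on the survivors.
failed_host_vms = [("file01", "low", 8), ("sql01", "high", 16), ("web01", "medium", 8)]
print(restart_order(failed_host_vms, free_capacity_gb=24))
```

With these numbers the high and medium priority VMs come back and the low priority one stays off, which is the trade-off the restart priority setting is meant to express.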

9. Balance availability requirements with other infrastructure qualities

VMware also helps protect against unplanned downtime from common failures, including:

Network and storage interface failures. Support for redundant network and storage interfaces is built into VMware ESX™. Redundant network and storage interface cards can be shared by multiple virtual machines on a server, reducing the cost of implementing redundancy. VMware virtualization also makes it easy to create redundant servers without additional hardware purchases by allowing for the provisioning of virtual machines to existing underutilized servers.

Server failures. VMware High Availability (HA) and VMware Fault Tolerance deliver protection against server failures without the cost and complexity often associated with implementing and maintaining traditional solutions. VMware HA automatically restarts virtual machines affected by server failures on other servers to reduce downtime from such failures to minutes, while VMware Fault Tolerance ensures continuous availability for virtual machines by using VMware vLockstep technology to create a live shadow instance of a virtual machine on another server and allow instantaneous, stateful failover between the two instances.

Overloaded servers. VMware VMotion, VMware Distributed Resource Scheduler (DRS), and VMware Storage VMotion help you to proactively balance workloads across a pool of servers and storage.

Objective 2.4 – Build Manageability Requirements into the Logical Design


1. Understand what management services are provided by VMware solutions.

Not an exhaustive list and I will only be using a few of them in my lab.

vMA, vCenter, PowerCLI, vCLI, vCenter Orchestrator, vSphere API, vSphere HA, vSphere DRS, Auto Deploy, Scheduled Tasks, Host Profiles.

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

Already covered this in objective 2.3

Skills and Abilities

3. Build interfaces to existing operations practices into the logical design

This is talking about integrating existing services, such as an existing database or Active Directory, into the logical design. Obviously I can’t apply this to my design.

4. Address identified operational readiness deficiencies

Again, I can’t apply this to my design, but it’s referring to issues that were picked up during the discovery phase and need to be fixed as part of the new design.

5. Define Event, Incident and Problem Management practices

ITIL Definitions

  • Event – A change of state which has significance for the management of a service or system
  • Incident – An event which is not part of standard operation and usually causes a service disruption or degraded functionality
  • Problem – The cause of one or more incidents

6. Define Release Management practices

ITIL Definition

Release Management encompasses the planning, design, build, configuration and testing of hardware and software releases to create a defined set of release components.

The goal of the Release and Deployment Management process is to assemble and position all aspects of services into production and establish effective use of new or changed services.
Effective release and deployment delivers significant business value by delivering changes at optimized speed, risk and cost, and offering a consistent, appropriate and auditable implementation of usable and useful business services.
Release and Deployment Management covers the whole assembly and implementation of new/changed services for operational use, from release planning through to early life support.

7. Determine Request Fulfillment processes

More stuff from ITIL

Each catalog item uses a fulfillment process, to define the request fulfillment process when that item is ordered.

Fulfillment processes are used when ordering standard catalog items, but are not used for some extended types of catalog item, such as content items.

8. Design Service Asset and Configuration Management (CMDB) systems

  • SACM supports the business by providing accurate information and control across all assets and relationships that make up an organization’s infrastructure.
  • The purpose of SACM is to identify, control and account for service assets and configuration items (CI), protecting and ensuring their integrity across the service lifecycle.
  • The scope of SACM also extends to non-IT assets and to internal and external service providers, where shared assets need to be controlled.
  • To manage large and complex IT services and infrastructures, SACM requires the use of a supporting system known as the Configuration Management System (CMS).

9. Define Change Management processes

Change management is an IT service management discipline. The objective of change management in this context is to ensure that standardized methods and procedures are used for efficient and prompt handling of all changes to control IT infrastructure, in order to minimize the number and impact of any related incidents upon service. Changes in the IT infrastructure may arise reactively in response to problems or externally imposed requirements, e.g. legislative changes, or proactively from seeking improved efficiency and effectiveness or to enable or reflect business initiatives, or from programs, projects or service improvement initiatives. Change Management can ensure standardized methods, processes and procedures which are used for all changes, facilitate efficient and prompt handling of all changes, and maintain the proper balance between the need for change and the potential detrimental impact of changes.

10. Based on customer requirements, identify required reporting assets and processes

I’m not entirely sure what this is referring to, need to do some more research!!!

