Terraform Azure Provider – Deploy a Virtual Machine and Join Domain

I’ve spent quite a lot of time working with Terraform over the past few months, both at work and at home. So far I’ve been concentrating on the vSphere provider, so I thought it was time to take a look at the Azure provider. The beauty of Terraform is that it’s vendor agnostic: once you understand how the logic works, in theory it should be a relatively painless transition to working with a different provider. I’ve been using Azure for the past year or so, mostly for personal projects and experimentation. During my study for the Azure Certified Administrator I worked with ARM templates and found them to be quite cumbersome; they require an awful lot of code just to deploy a simple virtual machine. Take a look here for an example of an ARM template to deploy a VM, then compare it to my completed test project on GitHub.

My project goes a step further and also joins the machine to the domain. It will deploy the following…

  • Create Resource Group
  • Create VNET
  • Create Subnet
  • Create Public IP Address
  • Create Network Security Group and open RDP firewall port
  • Create NIC
  • Create Virtual Machine
  • Join virtual machine to custom domain (optional)
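For a flavour of how compact this is compared to an ARM template, the start of such a configuration looks roughly like the sketch below. This is an illustrative fragment only; the resource names, location and variable names are placeholders, so refer to the GitHub repo for the working code.

```hcl
# Illustrative fragment only – names and locations are placeholders,
# see the GitHub repo for the full working configuration.
provider "azurerm" {
  subscription_id = "${var.subscription_id}"
}

resource "azurerm_resource_group" "rg" {
  name     = "tf-demo-rg"
  location = "West Europe"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "tf-demo-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = "${azurerm_resource_group.rg.location}"
  resource_group_name = "${azurerm_resource_group.rg.name}"
}

# ...subnet, public IP, NSG, NIC and VM resources follow the same pattern.
# The optional domain join is typically done with a virtual machine
# extension resource using the "JsonADDomainExtension" extension type.
```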

 

If you want to quickly deploy some infrastructure to Azure:

  • Clone my repository to your local workstation.
  • Run ‘terraform init’ to download the Azure provider plugin.
  • Edit main.tf with your subscription ID.
  • Run ‘terraform plan’.
  • Run ‘terraform apply’ once you are happy with the plan.

This was my first attempt at deploying infrastructure into Azure using Terraform, and I had everything up and running in a couple of hours. I accept it’s not the most complex deployment, but it was a useful starting point and one I intend to expand upon in the very near future.

My next challenge is to deploy some infrastructure in AWS using Terraform; hopefully a blog post will follow soon.

I’m still working away on #100DaysOfCode (Python). I do need to check how many days I’ve got left as I’ve lost track; I must be fast approaching 100 days! I’ll put up another post once I’ve caught up with things, thanks for reading!

 

Terraform vSphere Provider – Deploy multiple virtual machines of varied specifications – Part 2

Reminder of Objectives

  • Deploy multiple virtual machines of varying RAM, CPU and Disk Sizes.
  • Create 2 Virtual Machine Folders.
  • Create a Virtual Distributed Switch.
  • Attach the new virtual machines to the new VDS.
  • Place the new Virtual Machines in the correct folder.

 

In Part 1 I went through the folder structure and some suggestions for organising the Terraform files; now it’s time to start populating those files, i.e. terraform.tfvars, variables.tf and main.tf. When I started working on this mini-project I found that not many people have blogged about using the Terraform vSphere provider. The documentation provided by HashiCorp is quite detailed and they also provide a number of use case examples, so it was a useful jumping-off point; having said that, there was a whole lot of trial and error involved in getting the solution working. I’ve uploaded the finalised project to GitHub, which can be found below.

Final GitHub Project

The main challenge I had was creating multiple virtual machines of differing sizes. The solution was to define a ‘list’ variable for each virtual machine type; in the example on GitHub I’ve split them into ‘Web’ and ‘App’ virtual machine types. Below is an example from the variables.tf file, where I declare the variables for the ‘Web’ and ‘App’ machine types.
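The declarations look something like the sketch below; the variable names here are illustrative, so check variables.tf in the repo for the real ones.

```hcl
# One set of list variables per machine type – names are illustrative.
variable "web_vm_names" {
  type        = "list"
  description = "Names of the Web virtual machines"
}

variable "web_vm_cpu" {
  type        = "list"
  description = "vCPU count for each Web virtual machine"
}

variable "web_vm_ram" {
  type        = "list"
  description = "RAM in MB for each Web virtual machine"
}

# ...the 'app_*' variables follow the same pattern.
```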

Once the variables have been declared we then need to populate terraform.tfvars with the virtual machine specifications such as name, CPU, RAM, etc.
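Populating the lists in terraform.tfvars might look like this; the machine names and sizes are illustrative placeholders, not the values from the repo.

```hcl
# Illustrative values – each list index describes one virtual machine.
web_vm_names = ["web01", "web02"]
web_vm_cpu   = ["2", "2"]
web_vm_ram   = ["4096", "8192"]

app_vm_names = ["app01", "app02"]
app_vm_cpu   = ["4", "4"]
app_vm_ram   = ["8192", "16384"]
```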

Main.tf code blocks

There are 3 code block types in the main.tf file: ‘provider’, ‘data’ and ‘resource’.

  • provider block – denotes the Terraform provider to be used for the project, in this case vSphere.
  • data blocks – represent existing infrastructure that you want to deploy on to, for example the VCSA, datacenter, ESXi hosts, etc.
  • resource blocks – represent new infrastructure, i.e. what you intend to deploy, e.g. the VDS, virtual machines, folders, etc.
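As a minimal sketch of the three block types (the server name, credentials and object names here are placeholders, not the values from the repo):

```hcl
# provider block – which plugin Terraform should use
provider "vsphere" {
  vsphere_server       = "vcsa.lab.local"            # placeholder
  user                 = "administrator@vsphere.local"
  password             = "${var.vsphere_password}"
  allow_unverified_ssl = true
}

# data block – existing infrastructure we deploy onto
data "vsphere_datacenter" "dc" {
  name = "Datacenter01"                              # placeholder
}

# resource block – new infrastructure we intend to create
resource "vsphere_folder" "web" {
  path          = "Web"
  type          = "vm"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}
```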

The next step is to populate the main.tf file. We will add the virtual machine as a ‘resource’ block; as you can see, Terraform will perform a ‘count’ operation on the list we populated in the terraform.tfvars file. It’s basically a ‘for’ loop: for each indexed item in the list it will apply the values for CPU, RAM, etc. to that item. You can have as many virtual machines here as you want; for the purpose of demonstration I’ve included 2 virtual machines per virtual machine type. There’s lots of other stuff going on here, such as adding a static MAC address, adding the machine to the new port group on the VDS and adding multiple disks. I’ve made use of ‘lists’ in a number of places, and if you look at the completed code on GitHub most of it should be quite self-explanatory. If anything is unclear feel free to post a message and I’ll get back to you asap.
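The count pattern itself can be sketched like this; the variable names and attributes shown are simplified placeholders, and the real resource on GitHub also sets the MAC address, port group, disks, etc.

```hcl
# Simplified sketch – one resource block stamps out every 'Web' VM.
resource "vsphere_virtual_machine" "web" {
  count    = "${length(var.web_vm_names)}"               # one VM per list entry
  name     = "${element(var.web_vm_names, count.index)}"
  num_cpus = "${element(var.web_vm_cpu, count.index)}"
  memory   = "${element(var.web_vm_ram, count.index)}"
  # ...resource pool, datastore, disks and network_interface omitted here.
}
```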

 

Terraform vSphere Provider – Deploy multiple virtual machines of varied specifications – Part 1

I recently had the opportunity to work with Terraform, HashiCorp’s Infrastructure as Code (IaC) offering. It’s an interesting tool as it’s vendor agnostic and can be used with AWS, VMware, Azure and many more software vendors. Organisations seem to be heading towards a mix of multi-cloud and on-premises infrastructure, which means more tools and more skills to learn for the people supporting it; Terraform seems like it could be a really good choice to help cut down on complexity and staff training/re-training.

The organisation I’m currently working for tasked me with developing a proof of concept so they can determine if the tool is a good fit for their business; they are primarily a VMware house with an increasing footprint in Microsoft Azure. Terraform has the concept of ‘providers’, which contain the underlying code that performs all the heavy lifting and interaction with the vendor’s software: for VMware there is a vSphere provider, and for Amazon Web Services there’s the AWS provider. For the purpose of this post I will be using the vSphere provider.

Objectives:

  • Deploy multiple virtual machines of varying RAM, CPU and Disk Sizes.
  • Create 2 Virtual Machine Folders.
  • Create a Virtual Distributed Switch.
  • Attach the new virtual machines to the new VDS.
  • Place the new Virtual Machines in the correct folder.
  • Standardise the code on GitHub so it can be cloned and used across regions.

Getting Started:

I’m going to assume that you already understand how to get Terraform up and running on your workstation. Once you’ve created a folder and downloaded the relevant Terraform binaries we can start organising the folder structure. I want to make sure that other admins can clone the git repo and modify only the variables that are relevant to the site they are deploying to; I would like to keep the editing of files to a minimum for other admins.

I prefer the structure below as it means that when somebody else wants to re-use your code they only need to edit the terraform.tfvars file.

  • main.tf – This is where you define the provider and describe what you intend to deploy.
  • variables.tf – For every variable used in main.tf there needs to be a corresponding declaration in variables.tf
  • terraform.tfvars – This is the only file that needs to be edited once your code has been finalised.

 

GitHub:

I’ve decided to include a few steps on GitHub set up.

You can set up an account for free on GitHub. Once you’re signed in, create a new repo. To clone the repo you will need to install the git tools to run the git commands; I’m on a Windows machine so I’ve used ‘Git for Windows’.

  • Clone the repo to your workstation

  • Initialise the repo

Now that the repo is initialised you can create a new branch and start pushing changes to the remote repo; below is a quick summary of commands to create and push a new branch.

  • git add .
  • git commit -m "Initial Commit"
  • git branch Initial_Branch
  • git push origin Initial_Branch

In part 2 of this series we’ll start editing the files…..

 

 

Python – #100DaysOfCode

I’ve been using PowerShell/PowerCLI for many years and have written scripts ranging from a few lines to a few thousand lines. I’ve been guilty in the past of using snippets of code to get a job done without fully understanding what the code is doing at a low level; I didn’t study computer science and have had no formal training in programming.

I’ve seen quite a few people tweeting about #100DaysOfCode, and after a bit of googling it seems the basic premise is that you code for at least 1 hour every day and then tweet your progress using the #100DaysOfCode hashtag; you can read more about it here. This sounded interesting to me as it added a bit of structure to help me achieve my goals. I also came across this course from TalkPython which is split into days so you can track your progress: for example, day 1 watch an intro video, day 2 do some practice exercises, etc. This sounded exactly like what I was looking for, so I bit the bullet and paid for the course.

It started off okay: there was some background on the course, then some Python fundamentals and playing with dates/times, then it took a supersonic leap into default dictionaries, named tuples and deques!! No gradual progression on this course! The presenter spoke very quickly and assumed you understood what he was talking about when he was setting up his virtual environment. I tried my best to stick with it, but after day 9 I was lost and frustrated. I sent an email of complaint to the course provider stating that they should really make it clear that this course is not for beginners; to be fair to them they sent me a voucher for another course, but I decided to change strategy as this course was demotivating me almost to the point of wanting to throw in the towel. I might yet go back to it once I’ve got a stronger grounding in Python, but for now this is not the path for me to take.

I got back to Google and came across ‘The Complete Python Bootcamp’ course on Udemy; it had lots of positive reviews and seemed to be much better suited to my current requirements. So far I’ve not been disappointed. This course is great! It does exactly what it says on the tin and goes right back to basics. I had to sit through lectures on variables, operators, booleans, etc., stuff that’s not unfamiliar to me, but it didn’t matter; this is what I wanted: to strip everything back and start at the beginning.

I’m on day 40 of the challenge and I’m just about to start the first milestone project, which is to create a noughts and crosses game for 2 players on the same computer. Sounds quite simple, but I’m pretty sure it won’t be!
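Out of curiosity I sketched the core of the game logic; this is just my own minimal illustration of the kind of code the project needs, not the course’s solution.

```python
# Minimal noughts and crosses helpers – my own illustration of the logic
# the milestone project needs, not the course's solution.

# The board is a list of 9 cells, each ' ', 'X' or 'O'.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def winner(board):
    """Return 'X' or 'O' if either player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def is_draw(board):
    """A draw: every cell filled and nobody has won."""
    return ' ' not in board and winner(board) is None
```

The rest of the project is largely a loop that draws the board, prompts each player in turn and calls these two checks after every move.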

I’ve missed a few days due to a holiday in August, but other than that I’ve been coding every day for more than an hour. I’ve been using my commute to and from work to study/practice, which has been working well as there are fewer distractions. I’m hoping to update the blog a bit more with my progress in the coming weeks and months.

Azure Administrator Associate – PASS! (and had a bit of a result!)

So I sat the AZ-100 exam after 3 months of study. I booked the exam in a test centre not too far from my office in central London; the usual conditions applied: photo ID, they take a photograph of you, etc.

I was feeling well prepared for the exam as I’d spent lots of time running labs and doing lots of practice tests using ‘measureup’. The exam was REALLY SLOW to load and the response was very sluggish; I don’t know if it was the ancient desktop PCs in the test centre or if their internet connection was going over a modem, but it was really bad.

I carried on as best I could and was feeling confident with the answers I’d given to that point, then it was time for the first lab! Well, as my father would say… ‘Genie Mac!’ It was virtually unusable. It brings you to the Azure portal and you log in using a designated account and password; this alone took over 5 minutes!! Then you had a series of tasks to complete. Without breaching NDA, make sure you know your networking and storage config really well. I knew it quite well and was trying my best to complete the labs, but it was so damn slow and I could see the clock running down, so I had to make the call to abandon the lab and move on. Now I was really worried that I was going to run out of time.

I closed the lab and answered some more multiple-choice questions; they were relatively straightforward, very wordy but nothing too unexpected. Then, another lab!!!! Same as the previous one: slow to the point of being unusable. I finished one task and had to move on; at this point I was almost accepting defeat, as I thought there was no way I would pass having missed so many of the lab tasks. I carried on regardless and decided at that point to appeal to Microsoft after the exam, on the basis that it was unfair due to the woeful performance of the exam software and/or test centre.

I reached the end of the exam and hit submit, held my breath and expected the worst… eventually the result appeared on the screen, and somehow or other I had managed to pass! I can only guess that Microsoft are aware of the poor performance of the labs and take this into account when scoring the test??? Anyway, I was very relieved to pass as it was a stressful experience. I had booked the AZ-101 for the following week and was dreading the thought of going through all that again.

 

And then the IT Certification Gods smiled upon me!

In the week leading up to the AZ-101 exam, Microsoft announced that they would be consolidating the two exams into a single test, the AZ-103! Anyone who had already taken and passed the AZ-100 would automatically be awarded the Azure Administrator Associate certification!!! Obviously I cancelled the AZ-101 exam as soon as I found out; what a relief that I didn’t have to go through that terrible exam experience all over again.

I’ve decided not to pursue any further IT certs for now; my next challenge is to improve my coding skills and focus my attentions on PYTHON! More posts to follow!

Microsoft Certified: Azure Administrator Associate

It’s been a while since I passed the AWS exams, so I thought it was high time I got back to the books. The organisation I’m currently working for has recently started migrating some workloads into Microsoft Azure, so this seemed like the obvious place to focus my attentions.

At the time of writing the Azure Administrator Associate cert comprises two exams, the AZ-100 and AZ-101, though I’ve heard this may change in the near future! After a bit of research I decided to purchase the AZ-101 course by Nick Coyler via Udemy for just £9.99!

UPDATE! The certificate is now only one exam the AZ-103, blog post to follow!!!

I also created a free (12 months) Microsoft Azure account so I could run through the labs and, of course, do lots of practice. I also decided to pay for the official Microsoft practice tests from ‘measureup’, as past experience has taught me that knowing the format of the exam and how a vendor phrases their questions can save valuable time on the day. It’s an investment in my career, so it’s money well spent as far as I’m concerned.

So, armed with all of this, I set myself a target of 3 months to study for and pass the exam. I would advise booking the exam as far in advance as possible, as I found that the London test centres are very busy and often did not have slots that suited my timetable. You do have the option of taking the exam from your own home under very strict exam conditions; I decided not to go down this route as I’ve got 2 lovely, but very noisy, children! I’ll create a new post once I’ve sat the exam to share my experiences, pass or fail!

 

AWS Certified SysOps Administrator Associate – PASS!

Exactly two weeks after sitting the AWS CSA Associate exam I attempted and passed the AWS SysOps Administrator Associate exam with a score of 72%. There was quite a lot of crossover from the Architect exam, with a big focus on CloudWatch, VPC/networking and IAM; you really need to know these subjects in quite a bit of depth. Once again the questions were quite wordy and really made me think hard about my answers. For preparation I used the acloud.guru course, which this time around did a good job of covering all the subjects in the exam; unlike the Architect Associate exam, nothing unexpected came up and I felt well prepared for most of the subject material. I also consulted the AWS FAQs quite a lot and, of course, did plenty of hands-on work. If you have already achieved the AWS CSA Associate cert I would say there is not a huge amount more you need to study to pass this exam; as I mentioned, CloudWatch, VPC and IAM feature heavily, so combined with the knowledge already gained from the CSA study it shouldn’t take a whole lot to bag this one. Next for me is the DevOps Associate exam. I’m going to take some time off studying as I need to go back to Ireland for a week, but I’m looking forward to getting stuck into the DevOps material on my return. Good luck to all attempting the SysOps Associate exam!

AWS Certified Solutions Architect Associate – PASS!

After approximately 6 weeks of study I sat the AWS CSA Associate exam today (14/07/17). The usual exam conditions applied, with the exception of the mug shot before the exam that is standard practice for VMware exams. Anyway, I was feeling quite well prepared as I’d been putting in 2–3 hours of study per day for the past 2 weeks. As mentioned in my previous post I used the acloud.guru training course as a base for my study, and I also consulted the very wordy AWS FAQs as well as reading countless blogs.

Whilst the acloud.guru course provides a good grounding for your study and helps you get to grips with the base services offered by AWS, it’s essential to also read the AWS FAQs as well as other people’s blogs on exam experiences and study advice; they contain lots of good pointers that will help when it comes to sitting the exam. I’m not going to provide links to individual blogs as I honestly can’t remember which ones I used; a quick google will return plenty of hits.

The exam was way trickier than I expected and I really had to grind the gears on a lot of the questions; there were quite a few scenario-based questions that made me think hard before making a selection. I completed all the questions in just over an hour so had plenty of time to go through the review. I really wasn’t sure if I had done enough to pass, but thankfully I got through with 69%… not a great pass, but a pass all the same. I think I underestimated this exam and maybe took it a bit too lightly; I won’t be making the same mistake on subsequent AWS exams! Up next is the AWS Certified SysOps Administrator Associate, so back to the books/web training/FAQs/blogs for me! Good luck if you plan on sitting this exam. Below is a summary of items you should include in your study; I would advise learning them all in some depth!

 

  • Lambda
  • EC2 Container Services
  • IAM
  • VPC/Networking/VPC Peering
  • EC2 Instance types
  • Elastic Load Balancer
  • API Gateway
  • The various AWS database services, their use cases, and whether they are multi-AZ or multi-region
  • S3 Storage bucket policies/permissions
  • S3 Storage types and use cases.
  • Autoscaling
  • SQS – how it works and its moving parts
  • NAT Gateway
  • EBS and encryption
  • Cloudwatch
  • HSM
  • Route53
  • Kinesis
  • EMR
  • Elasticache
  • Cloudfront

AWS – Certified Solutions Architect – Associate

I’ve decided it’s time to get certified on AWS. The plan is to start with the Architect Associate exam, as I feel it will give me a high-level understanding of the technology; once I’ve achieved this I will drill down into the SysOps qualification. I’ve purchased the Certified Solutions Architect course from acloud.guru for a very reasonable £27, probably the cheapest training course I have ever purchased. The guys running the site are big advocates of serverless architecture; the current AWS pricing model for Lambda offers the first 1 million requests free of charge, so for the time being their IT infrastructure costs are zero!! They pass this saving on to consumers of their training, which is why the courses are so cheap. So far the course has been very hands-on and very enjoyable; I’m about 80% through and would highly recommend it to anyone interested in obtaining AWS certification. They really do start from the ground up; even if you have no previous cloud experience this course will really help you get a better understanding of how everything fits together.

vCenter Orchestrator – Edit an existing Workflow

vCenter Orchestrator is quite new to me, so it’s been a bit of a learning curve preparing for the DCA exam. I often find the best way for me to study is to write things down so they filter through my braincells and hopefully into memory. I will edit a very basic vCO workflow that changes the number of vCPUs of a specific VM; I will not be going through the creation of the workflow, I am only concerned with editing the attributes of an existing one.

1. Browse to the ‘Schema’ view of the workflow and edit the ‘Action Element’ by clicking on the pencil icon..


2. Browse to the ‘Visual Binding’ tab.

I find the easiest way to inspect and modify the inputs and attributes is to use the visual bindings tab.

An ‘Input’ is a value the user needs to enter in order to complete the workflow; ‘Attributes’, on the other hand, are predefined and are not exposed to the person running the workflow. For the purpose of demonstration I have included one Input and one Attribute. In this example the Input will ask the user to enter the number of vCPUs required for the virtual machine, and the Attribute will define the virtual machine to modify. We don’t need to do anything with the Input as it’s set up correctly, but we will need to edit the Attribute to select the correct virtual machine.


3. To edit an attribute go to the ‘General’ tab.


4. Select ‘vm1’ and click the ‘Not set’ hyperlink.


5. Drill down to the correct virtual machine, in this case ‘CPUMod’


6. The value should now change to the virtual machine name, at this point we don’t need to change anything else.


 

7. Run the workflow.


 

8. Virtual Machine before workflow is run.


 

9. Enter number of vCPUS required and hit submit.


10. vCPU count has been modified


I guess the thing to remember is that attributes are edited on the ‘General’ tab; they come in different formats, as shown below.

  • String – a clear text box where you can type whatever you like.
  • Boolean – True or False
  • Predefined VCO parameter – e.g. VC.Datacenter, you need to click the ‘Not set’ link to define this type of attribute.

To illustrate this further the screenshot below shows the attributes of a workflow that creates a new DataCenter and Cluster.

(screenshot: workflow attributes)

 

VCAP DCA – PASS!

I didn’t actually study much since the last attempt; I think the difference this time around was a good night’s sleep! I left 2 questions unanswered (some research for me to do) but felt I answered the remaining questions well, finishing the exam with 10 minutes to spare. I received an email from VMware the following day saying that my transcript had been updated, but I’m yet to receive the official ‘Pass’ email that tells me the score; I’ll chase them up in the coming days. So that’s it… VCAP DCD and VCAP DCA passed. I didn’t quite hit my initial target of passing both exams in 3 months (maybe it was a bit ambitious), but I’m very glad to have them both on my CV. It’s been a great learning experience and I feel it’s broadened my knowledge. I’m still undecided if I’ll attempt the VCDX; I start a 6-month contract on Monday and expect to be kept very busy, so I’ll have a think and decide in a couple of months. Good luck to all attempting the advanced exams; there’s no doubt they are challenging, but the experience and reward is well worth it!

 


 

VCAP-DCA Exam Prep

So the home lab was deployed and the study started about 6 weeks ago; the exam is booked for the 16th June at 08:15am, which is tomorrow morning.

My intention is to go through all the questions and look for the quick wins; once they are completed I will go back through the questions that require a little more thought. This may change once I’ve clicked the ‘Start Exam’ link and panic sets in, but I’ll do my best to stick to this strategy. As for study materials, I’ve been using the VCAP5-DCA Official Cert Guide book, which has some good scenarios at the end, and I’ve also used Joshua Andrews’ fantastic test track lab (a really helpful guy!). I’ve been up and down the exam blueprint many times to make sure I’ve covered each of the topics, and of course lots and lots of practice in the home lab. Finding time to study has been a bit of a challenge as we recently had our second child, so I don’t feel as prepared as I would have liked. I did consider re-scheduling the exam, but the next convenient slot was a month away so I decided to just go for it. So here goes nothing…

Some really good scenario based questions can be found on the links below.

Practice Test 1

Practice Test 2

Practice Test 3

VCAP DCD PASS!

I’ve managed to pass the VCAP DCD550! It’s a really tough exam and I am VERY relieved to get over the line. I fine-tuned my studies and really honed in on the functional requirements, risks, constraints, assumptions, etc. I highly recommend anyone studying for the DCD to jump over to the Google+ VCAP DCD study group; there’s a wealth of information with some really helpful contributors. VCAP DCA is next on the hit list. I’m not going to achieve my initial target of completing the DCD and DCA within 3 months, mostly because I wasn’t expecting the DCD to be so damn tricky; I’m hoping to sit the DCA at the end of May.


VCAP DCD Study – Home Lab Design Part 10

Section 4 – Implementation Planning

Objective 4.1 – Create and Execute a Validation Plan

Knowledge
Recall standard functional test areas for design and operational verification.

Covered this in earlier sections but a recap!

Functional Requirements

The official definition for a functional requirement specifies what the system should do: “A requirement specifies a function that a system or component must be able to perform.” Functional requirements specify specific behavior or functions, for example: “Display the heart rate, blood pressure and temperature of a patient connected to the patient monitor.”

Typical functional requirements are:

  • Business Rules
  • Transaction corrections, adjustments, cancellations
  • Administrative functions
  • Authentication
  • Authorization –functions user is delegated to perform
  • Audit Tracking
  • External Interfaces
  • Certification Requirements
  • Reporting Requirements
  • Historical Data
  • Legal or Regulatory Requirements

Non-Functional Requirements

The official definition for a non-functional requirement specifies how the system should behave: “A non-functional requirement is a statement of how a system must behave; it is a constraint upon the system’s behavior.”

Non-functional requirements specify all the remaining requirements not covered by the functional requirements. They specify criteria that judge the operation of a system, rather than specific behaviors, for example: “Display of the patient’s vital signs must respond to a change in the patient’s status within 2 seconds.”

Typical non-functional requirements are:

  • Performance – Response Time, Throughput, Utilization, Static Volumetric
  • Scalability
  • Capacity
  • Availability
  • Reliability
  • Recoverability
  • Maintainability
  • Serviceability
  • Security
  • Regulatory
  • Manageability
  • Environmental
  • Data Integrity
  • Usability
  • Interoperability

Non-functional requirements specify the system’s ‘quality characteristics’ or ‘quality attributes’. Potentially many different stakeholders have an interest in getting the non-functional requirements right. This is because for many large systems the people buying the system are completely different from those who are going to use it (customers and users).

 

Differentiate between operational testing and design verification.

Good operational testing examples can be found here..

https://communities.vmware.com/docs/DOC-11418

From Brownbag notes…

Operational testing is testing pieces of the virtual infrastructure in general.

Design verification means implementing a business goal or requirement and verifying its accuracy with the business: that the design item(s) perform as expected and, if so, are accepted by the business (i.e. meeting a compliance requirement); this may or may not be outside of standard implementation criteria.

 

Skills and Abilities

From an existing template, choose the appropriate test areas.

Example of a test template here..

http://www.vmware.com/files/pdf/partners/09Q1_VM_Test_Plan.doc

Test vSphere features (e.g. vMotion, HA, DRS) under certain workloads to see how apps perform.

Identify expected results

Document the results from the test plans and compare them to the current state analysis done at the start of the project.

Demonstrate an ability to track results in an organized fashion

Use health-check scripts and RVTools, then document and present the results.

Compare validation plan metrics to demonstrate traceability to business objectives

Compare the results to the business objectives and requirements for validation.

 

Objective 4.2 – Create an Implementation Plan

Skills and Abilities

Based on key phases of enterprise vSphere 5.x implementations, map customer development needs to a standard implementation plan template.

VMware provide a plan and design kit to partners; basically they are saying that although it is a useful tool you shouldn’t stick to it to the letter. Take into account your own business requirements and make sure the design fits them.

Evaluate customer implementation requirements and provide a customized implementation plan.

Not really sure what to say here other than: create an implementation plan that meets the customer’s needs.

Incorporate customer objectives into a phased implementation schedule.

Phased implementation focus areas:

  • Early ROI workloads
  • Low risk/high visibility

Match customer skills and abilities to implementation resource requirements.

The key roles for the team are listed below.

  • Relationship Manager – Act as primary interface between application owners and infrastructure groups.
  • IT Analyst – Identify impacted operational areas and recommend changes.
  • IT Infrastructure Architect – Translate requirements into architectural designs.
  • IT Infrastructure Engineer – Provide specific technical design for virtualized solutions.

The size of the team will vary depending on the scope and size of deployments, but it can be as small as three people or larger where multiple people are acting in each role. These positions should be viewed as relatively senior positions for highly regarded and skilled employees. Suitable candidates can often be found in the current organization (for example, in relationship management, IT infrastructure architecture, or server engineering groups). Once the team is in place, the team members play a central role in the deployment of projects in a virtualized environment.

Identify and correct implementation plan gaps.

Basically provide the finer detail of the implementation plans e.g. configure vswitch security settings.

Objective 4.3 – Create an Installation Guide

Knowledge

Identify standard resources required to construct an installation guide.

Use the official VMware documentation to construct the installation guides, also refer to the VMware community.
Skills and Abilities
Consider multiple product installation dependencies to create a validated configuration.

Ensure the installation guide follows a logical flow so that components are installed in the correct order.
Recognize opportunities to utilize automated procedures to optimize installation.

Auto Deploy springs to mind; there’s also nothing stopping you from using good old Linux-based kickstart to do a scripted installation.
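A scripted ESXi install is driven by a kickstart file. Below is a minimal sketch of the sort of ks.cfg you might use; the password is a placeholder and exact directives vary by ESXi version, so treat this as illustrative rather than authoritative:

```
# Minimal ESXi kickstart sketch (placeholder values)
vmaccepteula
rootpw VMware123!
install --firstdisk --overwritevmfs
network --bootproto=dhcp --device=vmnic0
reboot
```

In practice you would point the installer at this file via the `ks=` boot option.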
Create installation documentation specific to the design.

Create a step by step installation doc, use screenshots to assist the engineer installing the components.

 

VCAP DCD Study – Home Lab Design Part 9

Objective 3.6 – Determine Data Center Management Options for a vSphere 5.x Physical Design

Knowledge

1. Differentiate and describe client access options.

  • vSphere Client
  • vSphere Web Client
  • vCLI
  • PowerCLI
  • DCUI
  • vMA

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service:
o Determine the most appropriate datacenter management options for the design.

Management tools will depend on the skills of the operational staff running the infrastructure and will usually be decided on this basis.

o Implement the service based on the required infrastructure qualities.

Not much to say about this, but management tools should be implemented following AMPRS (availability, manageability, performance, recoverability, security)!

3. Analyze cluster availability requirements for HA and FT.

No brainer really. HA should always be enabled, although I have come across a situation where we couldn’t enable it because Cisco contact centre software didn’t support HA and vMotion, but I would say this is a real exception.

FT will have specific use cases depending on requirements; the current vCPU limit restricts its usefulness, but as mentioned earlier this will soon be a thing of the past. FT VMs cannot use snapshots, DRS, or Storage vMotion.

Analyze cluster performance requirements for DRS and vMotion.

Be aware of VM hardware versions: virtual machines running on hardware version 8 can’t run on prior versions of ESX/ESXi, and such virtual machines can be moved using VMware vMotion only to other ESXi 5.0 hosts. Take into account CPU compatibility; try to keep the hardware exactly the same, and if that’s not possible then enable EVC on the cluster.

Analyze cluster storage performance requirements for SDRS and Storage vMotion.

Storage vMotion can perform up to four simultaneous disk copies per Storage vMotion operation. Storage vMotion will involve each datastore in no more than one disk copy at any one time, however. This means, for example, that moving four VMDK files from datastore A to datastore B will happen serially, but moving four VMDK files from datastores A, B, C, and D to datastores E, F, G, and H will happen in parallel.

For performance-critical Storage vMotion operations involving virtual machines with multiple VMDK files, you can use anti-affinity rules to spread the VMDK files across multiple datastores, thus ensuring simultaneous disk copies.
During a Storage vMotion operation, the benefits of moving to a faster data store will be seen only when the migration has completed. However, the impact of moving to a slower data store will gradually be felt as the migration progresses.
Storage vMotion will often have significantly better performance on VAAI-capable storage arrays.

VMware Storage vMotion performance depends strongly on the available storage infrastructure bandwidth between the ESXi host where the virtual machine is running and both the source and destination data stores.

During a Storage vMotion operation the virtual disk to be moved is being read from the source data store and written to the destination data store. At the same time the virtual machine continues to read from and write to the source data store while also writing to the destination data store. This additional traffic takes place on storage that might also have other I/O loads (from other virtual machines on the same ESXi host or from other hosts) that can further reduce the available bandwidth.

 

Determine the appropriate vCenter Server design and sizing requirements:
o vCenter Server Linked Mode

You can join multiple vCenter Server systems using vCenter Linked Mode to allow them to share information. When a server is connected to other vCenter Server systems using Linked Mode, you can connect to that vCenter Server system and view and manage the inventories of the linked vCenter Server systems.

Linked Mode uses Microsoft Active Directory Application Mode (ADAM) to store and synchronize data across multiple vCenter Server systems. ADAM is installed as part of vCenter Server installation. Each ADAM instance stores data from the vCenter Server systems in the group, including information about roles and licenses. This information is replicated across all of the ADAM instances in the connected group to keep them in sync.

When vCenter Server systems are connected in Linked Mode, you can perform the following actions:

  • Log in simultaneously to vCenter Server systems for which you have valid credentials.
  • Search the inventories of the vCenter Server systems in the group.
  • View the inventories of the vCenter Server systems in the group in a single inventory view. So if you have multiple vCenter instances managing different sites, for site recovery or just different locations, then vCenter Linked Mode will help with managing all the different sites from one location.

o vCenter Server Virtual Appliance

  • vCenter Linked Mode is not supported
  • vCenter Heartbeat is not supported
  • Some VMware/Third Party Plugins might not support vCSA. Check with your desired plugin vendors if they support the vCenter Appliance.
  • Installing Update Manager on the vCenter Appliance is not supported, but you can still set it up on a separate Windows VM.
  • If using the embedded vPostgres database you will be limited to 100 hosts and 3,000 VMs, but you can always use an Oracle database to scale to the vCenter maximums of 1,000 hosts and 10,000 VMs.
  • A MS SQL database is currently not supported by the vCenter Server Appliance. If you are planning to go beyond 100 hosts and 3,000 VMs and an Oracle database is not an option or your cup of tea, then you will have to stick with the Windows version of vCenter for now.
  • It does not support the Security Support Provider Interface (SSPI), which is part of SSO and is a Microsoft Windows API used to perform authentication against NTLM or Kerberos.
  • VMware View Composer cannot be installed on the vCenter Appliance, but it is no longer required to be installed on the same machine as vCenter; it can be installed on a different machine, in which case it will support the vCSA.

 
o vCenter Server Heartbeat

vCenter Server Heartbeat is a Windows based service specifically designed to provide high availability protection for vCenter Server configurations without requiring any specialized hardware.

vCenter Server Heartbeat provides the following protection levels:

Server Protection – vCenter Server Heartbeat provides continuous availability to end users through a hardware failure scenario or operating system crash.
Additionally, vCenter Server Heartbeat protects the network identity of the production
server, ensuring users are provided with a replica server including server name and IP
address shares on the failure of the production server.

Network Protection – vCenter Server Heartbeat proactively monitors the network by polling up to three nodes to ensure that the active server is visible on the network.

Application Protection – vCenter Server Heartbeat maintains the application environment, ensuring that applications and services stay alive on the network.

Performance Protection – vCenter Server Heartbeat proactively monitors system performance attributes to ensure that the system administrator is notified of problems and can take pre-emptive action to prevent an outage.

Data Protection – vCenter Server Heartbeat intercepts all data written by users and applications, and maintains a copy of this data on the passive server that can be used in the event of a failure.

vCenter Server Heartbeat provides all five protection levels continuously, ensuring all facets of the user environment are maintained at all times, and that the network (Principal (Public) network) continues to operate through as many failure scenarios as possible.

vCenter Server Heartbeat software is installed on a Primary server and a Secondary server. These names refer to the physical hardware (identity) of the servers. The Secondary server has the same domain name, same file and data structure, same network address, and can run all the same applications and services as the Primary server. vCenter Server Heartbeat uses two servers with identical names and IP addresses.

One is an active server that is visible on the Principal (Public) network and the other is a passive server that is hidden from the network but remains as a ready standby server.

Only one server name and IP address can be visible on the Principal (Public) network at any given time.

Determine appropriate access control settings, create roles and assign users to roles.

Covered in objective 2.7.

Based on the logical design, identify and implement asset and configuration management technologies.

I would say that VMware is filling this space with vCAC, now referred to as vRealize Automation; it’s a huge subject, way beyond the scope of my study notes. Other products are VMware GO, VMware Service Manager, and VMware Configuration Manager.

Determine appropriate host and virtual machine deployment options.

Auto Deploy is more suited to larger environments that require a more agile method of host deployment. Full install methods include boot from SAN, boot from iSCSI, and scripted installs using PowerCLI or Linux kickstart (basically what Auto Deploy uses); use Image Builder to customise ESXi images.

Virtual machines can be created from templates, P2V, V2V, or you can PXE boot the VM.

Based on the logical design, identify and implement release management technologies, such as Update Manager.

Taken from the Update Manager performance and best practice document

VMware vCenter™ Update Manager (also known as VUM) provides a patch management framework for VMware vSphere®. IT administrators can use it to patch and upgrade:

  • VMware ESX and VMware ESXi™ hosts
  • VMware Tools and virtual hardware for virtual machines
  • Virtual appliances.

… …

Update Manager Server Host Deployment

There are three Update Manager server host deployment models:

  • Model 1 – vCenter Server and the Update Manager server share both a host and a database instance.
  • Model 2 –  Recommended for data centers with more than 300 virtual machines or 30 ESX/ESXi hosts. In this model, the vCenter server and the Update Manager server still share a host, but use separate database instances.
  • Model 3 – Recommended for data centers with more than 1,000 virtual machines or 100 ESX/ESXi hosts. In this model, the vCenter server and the Update Manager server run on different hosts, each with its own database instance.

… …

Performance Tips

  • Separate the Update Manager database from the vCenter database when there are 300+ virtual machines or 30+ hosts.
  • Separate both the Update Manager server and the Update Manager database from the vCenter Server system and the vCenter Server database when there are 1000+ virtual machines or 100+ hosts.
  • Make sure the Update Manager server host has at least 2GB of RAM to cache frequently used patch files in memory.
  • Allocate separate physical disks for the Update Manager patch store and the Update Manager database.

Based on the logical design identify and implement event, incident and problem management technologies.

 Borrowed from BrownBag notes.

Traditionally, approaches to each have been reactive; being proactive allows for efficiency, agility, and reliability.

Need automation tools, intelligent analytics

Tools – VMware Service Manager; vCenter Orchestrator.

http://www.vmware.com/files/pdf/services/VMware-Proactive-Incident-Whitepaper.pdf

Based on the logical design, identify and implement logging, monitoring and reporting technologies.

The most widely used ‘system’ is Alarms within vCenter; be aware that if vCenter fails you have no alerting, so also use SNMP.

Events – record of user or system actions in vCenter

Alarms – notifications activated in response to events

Monitoring – can be done using SNMP traps, SNMP agent is embedded in ‘hostd’

Logging – best to set up a logging server; a product called “Syslog Collector” can be used

Install with vCenter Server media; point to log server

VCAP DCD Study – Home Lab Design Part 8

Objective 3.5 – Determine Virtual Machine Configuration for a vSphere 5.x Physical Design
Knowledge

 

1. Describe the applicability of using an RDM or a virtual disk for a given VM.

RDMs

Only use RDMs when necessary, i.e. for Microsoft Clustering, SAN agents that require direct access, and migrations; there is very little performance difference between an RDM and VMFS.

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service: Determine the most appropriate virtual machine configuration for the design.

o Implement the service based on the required infrastructure qualities.

  • Always start with only 1 vCPU
  • Enable TPS
  • Always install VMware Tools
  • Only allocate RAM needed
  • Align virtual disks
  • Remove Floppy and any unneeded I/O devices or VM Hardware
  • Paravirtual SCSI for Data disks (not OS); typically use for > 2000 IOPS
  • VMXNET3 Ethernet Adapters
  • If redirecting VM swap files, do so on Shared Storage for better vMotion performance

3. Based on an existing logical design, determine appropriate virtual disk type and placement.

 

  • Thick Provision Lazy Zeroed Creates a virtual disk in a default thick format. Space required for the virtual disk is allocated when the virtual disk is created. Data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time on first write from the virtual machine. Using the default flat virtual disk format does not zero out or eliminate the possibility of recovering deleted files or restoring old data that might be present on this allocated space. You cannot convert a flat disk to a thin disk.
  • Thick Provision Eager Zeroed A type of thick virtual disk that supports clustering features such as Fault Tolerance. Space required for the virtual disk is allocated at creation time. In contrast to the flat format, the data remaining on the physical device is zeroed out when the virtual disk is created. It might take much longer to create disks in this format than to create other types of disks.
  • Thin Provision Use this format to save storage space. For the thin disk, you provision as much datastore space as the disk would require based on the value that you enter for the disk size. However, the thin disk starts small and at first, uses only as much datastore space as the disk needs for its initial operations. NOTE If a virtual disk supports clustering solutions such as Fault Tolerance, do not make the disk thin. If the thin disk needs more space later, it can grow to its maximum capacity and occupy the entire datastore space provisioned to it. Also, you can manually convert the thin disk into a thick disk.

 

4. Size VMs appropriately according to application requirements, incorporating VMware best practices.

BrownBag notes again!

  • Start with 1 vCPU and only allocate the RAM required by ISVs (independent software vendors) for a given application
  • For storage, get the current usage from current state analysis, then add enough for growth (patches/updates), vswp, logging, and other overhead: (avg size of VMs × # of VMs on the datastore) + 20%, rounding up the final number
  • Size VM resources in accordance with NUMA boundaries. So, if you have 4 cores, assign vCPUs by multiple of 4, 6 cores = multiple of 6, etc.
  • If you overallocate RAM, more RAM overhead is used per VM, thus wasting RAM; this is more applicable in larger environments
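The sizing arithmetic above can be sketched in a few lines of Python. The function names and example numbers are mine, not from any VMware tool:

```python
import math

def round_up_vcpus(vcpus_needed, cores_per_numa_node):
    # Size vCPU counts to NUMA boundaries: round up to the next
    # multiple of the host's cores per NUMA node.
    return math.ceil(vcpus_needed / cores_per_numa_node) * cores_per_numa_node

def datastore_capacity_gb(avg_vm_size_gb, vm_count, overhead=0.20):
    # (avg size of VMs * number of VMs) + 20% for growth, vswp,
    # logging and other overhead, rounded up.
    return math.ceil(avg_vm_size_gb * vm_count * (1 + overhead))

print(round_up_vcpus(5, 4))           # 8 on a 4-core-per-node host
print(datastore_capacity_gb(60, 10))  # 720 (600 GB + 20%)
```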

5. Determine appropriate reservations, shares, and limits.

Shares, Reservations, and Limits:

  • Deploy VMs with default setting unless clear reason to do otherwise
  • Use sparingly if at all!
  • Are there Apps that need resources even during contention? Then use Reservations
  • This adds complexity and administration overhead.

6. Based on an existing logical design, determine virtual hardware options.

From the performance best practice doc.

Allocate to each virtual machine only as much virtual hardware as that virtual machine requires.

Provisioning a virtual machine with more resources than it requires can, in some cases, reduce the performance of that virtual machine as well as other virtual machines sharing the same host.

Disconnect or disable any physical hardware devices that you will not be using. These might include devices such as:

  • COM ports
  • LPT ports
  • USB controllers
  • Floppy drives
  • Optical drives (that is, CD or DVD drives)
  • Network interfaces
  • Storage controllers

Disabling hardware devices (typically done in the BIOS) can free interrupt resources. Additionally, some devices, such as USB controllers, operate on a polling scheme that consumes extra CPU resources. Lastly, some PCI devices reserve blocks of memory, making that memory unavailable to ESXi.

Unused or unnecessary virtual hardware devices can impact performance and should be disabled. For example, Windows guest operating systems poll optical drives (that is, CD or DVD drives) quite frequently. When virtual machines are configured to use a physical drive, and multiple guest operating systems simultaneously try to access that drive, performance could suffer. This can be reduced by configuring the virtual machines to use ISO images instead of physical drives, and can be avoided entirely by disabling optical drives in virtual machines when the devices are not needed.

ESXi 5.5 introduces virtual hardware version 10. By creating virtual machines using this hardware version, or upgrading existing virtual machines to this version, a number of additional capabilities become available. This hardware version is not compatible with versions of ESXi prior to 5.5, however, and thus if a cluster of ESXi hosts will contain some hosts running pre-5.5 versions of ESXi, the virtual machines running on hardware version 10 will be constrained to run only on the ESXi 5.5 hosts. This could limit vMotion choices for Distributed Resource Scheduling (DRS) or Distributed Power Management (DPM)

7. Design a vApp catalog of appropriate VM offerings (e.g., templates, OVFs, vCO).

Useful for packaging applications that have dependencies, can be converted to OVF and exported.

8. Describe implications of and apply appropriate use cases for vApps.

Simplified deployment of an application for developers, can be re-packaged and converted to OVF at each stage of the SDLC.

9. Decide on the suitability of using FT or 3rd party clustering products based on application requirements.

Currently limited to 1 vCPU, but… vSphere 6.0 was announced this week, so support for up to 4 vCPUs is here! Awesome! We’ll be seeing a lot more use cases…

From Performance best practice doc.

FT virtual machines that receive large amounts of network traffic or perform lots of disk reads can create significant bandwidth on the NIC specified for the logging traffic. This is true of machines that routinely do these things as well as machines doing them only intermittently, such as during a backup operation. To avoid saturating the network link used for logging traffic, limit the number of FT virtual machines on each host or limit the disk read bandwidth and network receive bandwidth of those virtual machines.

Make sure the FT logging traffic is carried by at least a Gigabit-rated NIC (which should in turn be connected to at least Gigabit-rated network infrastructure).

NOTE: Turning on FT for a powered-on virtual machine will also automatically “Enable FT” for that virtual machine.

Avoid placing more than four FT-enabled virtual machines on a single host. In addition to reducing the possibility of saturating the network link used for logging traffic, this also limits the number of simultaneous live-migrations needed to create new secondary virtual machines in the event of a host failure.
If the secondary virtual machine lags too far behind the primary (which usually happens when the primary virtual machine is CPU bound and the secondary virtual machine is not getting enough CPU cycles), the hypervisor might slow the primary to allow the secondary to catch up. The following recommendations help avoid this situation:
Make sure the hosts on which the primary and secondary virtual machines run are relatively closely matched, with similar CPU make, model, and frequency. Make sure that power management scheme settings (both in the BIOS and in ESXi) that cause CPU frequency scaling are consistent between the hosts on which the primary and secondary virtual machines run.
Enable CPU reservations for the primary virtual machine (which will be duplicated for the secondary virtual machine) to ensure that the secondary gets CPU cycles when it requires them.

 

10. Determine and implement an anti-virus solution

Basically referring to vShield Endpoint; there are many AV products, and choosing one will come down to the requirements.

 

VCAP DCD Study – Home Lab Design Part 7

Objective 3.4 – Determine Appropriate Compute Resources for a vSphere 5.x Physical Design

Knowledge

1. Describe best practices with respect to CPU family choices.

Best practice is to stick to identical hardware across clusters. If this is not possible then EVC mode can be enabled, but remember vCenter will create a baseline which may limit some features if the CPU being added to the cluster is newer than the existing CPUs; see VMware KB 1003212 for more info.

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service:

  • Determine the most appropriate compute technologies for the design.
  • Implement the service based on the required infrastructure qualities.

 

  • AMD-Vi (IOMMU) or Intel VT-d CPUs for DirectPath I/O compatibility
  • Be careful when using CPU affinity on systems with hyper-threading. Because the two logical processors share most of the processor resources, pinning vCPUs, whether from different virtual machines or from a single SMP virtual machine, to both logical processors on one core (CPUs 0 and 1, for example) could cause poor performance.

General BIOS Settings

  • Make sure you are running the latest version of the BIOS available for your system.
  • Make sure the BIOS is set to enable all populated processor sockets and to enable all cores in each socket.
  • Enable “Turbo Boost” in the BIOS if your processors support it.
  • Make sure hyper-threading is enabled in the BIOS for processors that support it.
  • Some NUMA-capable systems provide an option in the BIOS to disable NUMA by enabling node interleaving. In most cases you will get the best performance by disabling node interleaving (in other words, leaving NUMA enabled).
  • Make sure any hardware-assisted virtualization features (VT-x, AMD-V, EPT, RVI, and so on) are enabled in the BIOS.
  • Disable from within the BIOS any devices you won’t be using. This might include, for example, unneeded serial, USB, or network ports.
  • Cache prefetching mechanisms (sometimes called DPL Prefetch, Hardware Prefetcher, L2 Streaming Prefetch, or Adjacent Cache Line Prefetch) usually help performance, especially when memory access patterns are regular. When running applications that access memory randomly, however, disabling these mechanisms might result in improved performance.
  • If the BIOS allows the memory scrubbing rate to be configured, we recommend leaving it at the manufacturer’s default setting.

3. Explain the impact of a technical design on the choice of server density:

  • Scale Up
  • Scale Out
  • Auto Deploy

Scale Up

A few large servers; bigger impact if there is a failure; less management overhead, cooling, and power.

Scale Out

Smaller servers; fewer VMs impacted when there is a failure; scaling is more agile.

Auto Deploy

Suitable for large enterprises, requires extra infrastructure and can be more complex to manage, dependency on vCenter so needs extra consideration when designing vCenter availability.

Blade vs Server

Blades

  • Less space; fewer I/O slots; fewer RAM slots; non-scalable cost; more heating/cooling cost; vendor lock-in; simpler cabling; shared chassis (SPOF?); requires specific expertise

Rack servers

  • More I/O and RAM slots; take up more space

 

4. Determine a consolidation ratio based upon capacity analysis data.

Again, some of the notes borrowed from the BrownBag DCD pdf, they do a nice job of making the notes concise.

  • Cores per CPU: The number of cores per host must match or exceed the number of vCPUs of the Largest VM
  • Depending on load typically you can run 4 to 6 VMs per core on quad core socket processor
  • During current state analysis, determine the total CPU and RAM required, then divide that out to the number of hosts required to meet the CPU/RAM requirement. This will also be based on budget, as well as what the compute ‘density’ requirement is (i.e. a scale up or scale out approach); if redundancy is one of the main requirements, then a scale out approach would be better.
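The consolidation arithmetic above can be sketched as follows. The numbers are hypothetical; real sizing would also weigh peak versus average utilisation and budget:

```python
import math

def hosts_required(total_cpu_ghz, total_ram_gb,
                   host_cpu_ghz, host_ram_gb, spare_hosts=1):
    # Divide aggregate demand from the current state analysis by
    # per-host capacity; the larger of the two drives the count.
    # spare_hosts adds redundancy (N+1 by default).
    cpu_hosts = math.ceil(total_cpu_ghz / host_cpu_ghz)
    ram_hosts = math.ceil(total_ram_gb / host_ram_gb)
    return max(cpu_hosts, ram_hosts) + spare_hosts

# e.g. 90 GHz / 800 GB of demand on 24 GHz / 256 GB hosts:
print(hosts_required(90, 800, 24, 256))  # 5 (4 hosts for capacity + 1 spare)
```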

 

5. Calculate the number of nodes in an HA cluster based upon host failure count and resource guarantees.

Taken from the vSphere Availability PDF.

The following recommendations are best practices for vSphere HA admission control.

  • Select the Percentage of Cluster Resources Reserved admission control policy. This policy offers the most flexibility in terms of host and virtual machine sizing. When configuring this policy, choose a percentage for CPU and memory that reflects the number of host failures you want to support. For example, if you want vSphere HA to set aside resources for two host failures and have ten hosts of equal capacity in the cluster, then specify 20% (2/10).
  • Ensure that you size all cluster hosts equally. For the Host Failures Cluster Tolerates policy, an
    unbalanced cluster results in excess capacity being reserved to handle failures because vSphere HA reserves capacity for the largest hosts. For the Percentage of Cluster Resources Policy, an unbalanced cluster requires that you specify larger percentages than would otherwise be necessary to reserve enough capacity for the anticipated number of host failures.
  • If you plan to use the Host Failures Cluster Tolerates policy, try to keep virtual machine sizing requirements similar across all configured virtual machines. This policy uses slot sizes to calculate the amount of capacity needed to reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. When you mix virtual machines of different CPU and memory requirements, the slot size calculation defaults to the largest possible, which limits consolidation.
  • If you plan to use the Specify Failover Hosts policy, decide how many host failures to support and then specify this number of hosts as failover hosts. If the cluster is unbalanced, the designated failover hosts should be at least the same size as the non-failover hosts in your cluster. This ensures that there is adequate capacity in case of failure.

Example: Admission Control Using Percentage of Cluster Resources Reserved Policy

The way that Current Failover Capacity is calculated and used with this admission control policy is shown with an example. Make the following assumptions about a cluster:

The cluster is comprised of three hosts, each with a different amount of available CPU and memory resources. The first host (H1) has 9GHz of available CPU resources and 9GB of available memory, while Host 2 (H2) has 9GHz and 6GB and Host 3 (H3) has 6GHz and 6GB.
There are five powered-on virtual machines in the cluster with differing CPU and memory requirements. VM1 needs 2GHz of CPU resources and 1GB of memory, while VM2 needs 2GHz and 1GB, VM3 needs 1GHz and 2GB, VM4 needs 1GHz and 1GB, and VM5 needs 1GHz and 1GB.
The Configured Failover Capacity is set to 25%.


The total resource requirements for the powered-on virtual machines are 7GHz and 6GB. The total host resources available for virtual machines are 24GHz and 21GB. Based on this, the Current CPU Failover Capacity is 70% ((24GHz – 7GHz)/24GHz). Similarly, the Current Memory Failover Capacity is 71% ((21GB – 6GB)/21GB). Because the cluster’s Configured Failover Capacity is set to 25%, 45% of the cluster’s total CPU resources and 46% of the cluster’s memory resources are still available to power on additional virtual machines.
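The example’s arithmetic can be checked with a few lines of Python, mirroring the truncation to whole percentages used in the worked example:

```python
def failover_capacity_pct(total, used):
    # Current Failover Capacity = (total - used) / total,
    # truncated to a whole percentage as in the example.
    return int((total - used) / total * 100)

cpu_fc = failover_capacity_pct(24, 7)   # 70 (GHz)
mem_fc = failover_capacity_pct(21, 6)   # 71 (GB)
configured = 25                         # Configured Failover Capacity %
print(cpu_fc - configured)  # 45% of CPU still available for power-ons
print(mem_fc - configured)  # 46% of memory still available
```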

 

6. Explain the implications of using reservations, limits, and shares on the physical design.

Shares

Shares specify the relative importance of a virtual machine (or resource pool). If a virtual machine has twice as many shares of a resource as another virtual machine, it is entitled to consume twice as much of that resource when these two virtual machines are competing for resources.

Shares are typically specified as High, Normal, or Low and these values specify share values with a 4:2:1 ratio, respectively. You can also select Custom to assign a specific number of shares (which expresses a proportional weight) to each virtual machine.

Specifying shares makes sense only with regard to sibling virtual machines or resource pools, that is, virtual machines or resource pools with the same parent in the resource pool hierarchy. Siblings share resources according to their relative share values, bounded by the reservation and limit. When you assign shares to a virtual machine, you always specify the priority for that virtual machine relative to other powered-on virtual machines.

The following table shows the default CPU and memory share values for a virtual machine. For resource pools, the default CPU and memory share values are the same, but must be multiplied as if the resource pool were a virtual machine with four virtual CPUs and 16 GB of memory.

Setting   CPU share values   Memory share values
High      2000 per vCPU      20 per MB of configured VM memory
Normal    1000 per vCPU      10 per MB of configured VM memory
Low       500 per vCPU       5 per MB of configured VM memory

For example, an SMP virtual machine with two virtual CPUs and 1GB RAM with CPU and memory shares set to Normal has 2×1000=2000 shares of CPU and 10×1024=10240 shares of memory.

NOTE Virtual machines with more than one virtual CPU are called SMP (symmetric multiprocessing) virtual machines. ESXi supports up to 32 virtual CPUs per virtual machine.

The relative priority represented by each share changes when a new virtual machine is powered on. This affects all virtual machines in the same resource pool. All of the virtual machines have the same number of virtual CPUs. Consider the following examples.

  • Two CPU-bound virtual machines run on a host with 8GHz of aggregate CPU capacity. Their CPU shares are set to Normal and get 4GHz each.
  • A third CPU-bound virtual machine is powered on. Its CPU shares value is set to High, which means it should have twice as many shares as the machines set to Normal. The new virtual machine receives 4GHz and the two other machines get only 2GHz each. The same result occurs if the user specifies a custom share value of 2000 for the third virtual machine.
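The proportional-share arithmetic in the examples above can be sketched like this; the SHARE_VALUES mapping reflects the default per-vCPU CPU share values, and the function is a simplified model rather than the actual scheduler:

```python
# Default CPU share values per vCPU for each named setting.
SHARE_VALUES = {"Low": 500, "Normal": 1000, "High": 2000}

def cpu_entitlement_mhz(vm_shares, capacity_mhz):
    # Under contention, sibling VMs split capacity in proportion
    # to their share values.
    total = sum(vm_shares.values())
    return {vm: capacity_mhz * s / total for vm, s in vm_shares.items()}

# Two Normal VMs plus a third set to High, on an 8 GHz host:
shares = {"vm1": SHARE_VALUES["Normal"],
          "vm2": SHARE_VALUES["Normal"],
          "vm3": SHARE_VALUES["High"]}
print(cpu_entitlement_mhz(shares, 8000))
# vm1 and vm2 get 2000 MHz each, vm3 gets 4000 MHz
```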

Limits

Limit specifies an upper bound for CPU, memory, or storage I/O resources that can be allocated to a virtual machine.

A server can allocate more than the reservation to a virtual machine, but never allocates more than the limit, even if there are unused resources on the system. The limit is expressed in concrete units (megahertz, megabytes, or I/O operations per second).

CPU, memory, and storage I/O resource limits default to unlimited. When the memory limit is unlimited, the amount of memory configured for the virtual machine when it was created becomes its effective limit.

In most cases, it is not necessary to specify a limit. There are benefits and drawbacks:

  • Benefits – Assigning a limit is useful if you start with a small number of virtual machines and want to manage user expectations. Performance deteriorates as you add more virtual machines. You can simulate having fewer resources available by specifying a limit.
  • Drawbacks – You might waste idle resources if you specify a limit. The system does not allow virtual machines to use more resources than the limit, even when the system is underutilized and idle resources are available. Specify the limit only if you have good reasons for doing so.
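A limit's effect can be illustrated as a clamp on a VM's demand. This is a toy model of my own: the reservation (covered next) forms the lower bound, the limit the upper bound, and the real scheduler also weighs shares:

```python
def entitlement(demand, reservation=0, limit=float("inf")):
    """Clamp a VM's resource demand between its reservation and its limit."""
    return max(reservation, min(demand, limit))

# A VM demanding 3000MHz with a 2000MHz limit is capped at the limit,
# even if the host has idle CPU (the drawback described above).
print(entitlement(3000, reservation=1000, limit=2000))  # 2000
# With no limit set, demand above the reservation is allowed.
print(entitlement(3000, reservation=1000))              # 3000
```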

Reservations

A reservation specifies the guaranteed minimum allocation for a virtual machine.

vCenter Server or ESXi allows you to power on a virtual machine only if there are enough unreserved resources to satisfy the reservation of the virtual machine. The server guarantees that amount even when the physical server is heavily loaded. The reservation is expressed in concrete units (megahertz or megabytes).

For example, assume you have 2GHz available and specify a reservation of 1GHz for VM1 and 1GHz for VM2. Now each virtual machine is guaranteed to get 1GHz if it needs it. However, if VM1 is using only 500MHz, VM2 can use 1.5GHz. Reservation defaults to 0. You can specify a reservation if you need to guarantee that the minimum required amounts of CPU or memory are always available for the virtual machine.
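The 2GHz example can be modelled in Python. This is a simplified sketch of my own, not ESXi's actual algorithm: reservations are granted first, then unused capacity is handed to VMs that still demand more (split evenly here; the real scheduler weighs shares):

```python
def distribute(capacity_mhz, vms):
    """Split capacity among VMs, honouring reservations (simplified model).

    vms maps name -> (reservation, demand).
    """
    # Phase 1: each VM is guaranteed up to its reservation (if it demands it).
    grant = {name: min(demand, reservation)
             for name, (reservation, demand) in vms.items()}
    # Phase 2: hand unused capacity to VMs that still demand more.
    spare = capacity_mhz - sum(grant.values())
    hungry = [n for n, (r, d) in vms.items() if d > grant[n]]
    while spare > 1e-9 and hungry:
        slice_mhz = spare / len(hungry)
        spare = 0
        still_hungry = []
        for name in hungry:
            want = vms[name][1] - grant[name]
            take = min(slice_mhz, want)
            grant[name] += take
            spare += slice_mhz - take
            if want > take:
                still_hungry.append(name)
        hungry = still_hungry
    return grant

# 2GHz host, both VMs reserve 1GHz, but VM1 only uses 500MHz,
# so VM2 can consume 1.5GHz.
print(distribute(2000, {"VM1": (1000, 500), "VM2": (1000, 2000)}))
```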

7. Specify the resource pool and vApp configuration based upon resource requirements.

Resource Pools

Taken from vSphere Resource Management 5.5 PDF

A resource pool is a logical abstraction for flexible management of resources. Resource pools can be grouped into hierarchies and used to hierarchically partition available CPU and memory resources.
Each standalone host and each DRS cluster has an (invisible) root resource pool that groups the resources of that host or cluster. The root resource pool does not appear because the resources of the host (or cluster) and the root resource pool are always the same.
Users can create child resource pools of the root resource pool or of any user-created child resource pool. Each child resource pool owns some of the parent’s resources and can, in turn, have a hierarchy of child resource pools to represent successively smaller units of computational capability.
A resource pool can contain child resource pools, virtual machines, or both. You can create a hierarchy of shared resources. The resource pools at a higher level are called parent resource pools. Resource pools and virtual machines that are at the same level are called siblings. The cluster itself represents the root resource pool. If you do not create child resource pools, only the root resource pools exist.
In the following example, RP-QA is the parent resource pool for RP-QA-UI. RP-Marketing and RP-QA are siblings. The three virtual machines immediately below RP-Marketing are also siblings.

rp

For each resource pool, you specify reservation, limit, shares, and whether the reservation should be expandable. The resource pool resources are then available to child resource pools and virtual machines.
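The parent/child ownership described above can be modelled as a small tree with admission control on reservations. This is a toy sketch with made-up names; real pools also track limits, shares, and expandable reservations:

```python
class ResourcePool:
    """Toy model of the resource pool hierarchy: each pool owns part of
    its parent's capacity and admits children against it."""

    def __init__(self, name, reservation_mhz, parent=None):
        self.name = name
        self.reservation = reservation_mhz
        self.children = []
        if parent is not None:
            parent.admit(self)

    def reserved(self):
        """Capacity already committed to child pools/VMs."""
        return sum(c.reservation for c in self.children)

    def admit(self, child):
        # Admission control: a child may only be added if the pool has
        # unreserved capacity to back the child's reservation
        # (non-expandable reservation semantics).
        if self.reserved() + child.reservation > self.reservation:
            raise ValueError(f"{self.name}: not enough unreserved capacity")
        self.children.append(child)

root = ResourcePool("cluster-root", 16000)            # invisible root pool
rp_qa = ResourcePool("RP-QA", 10000, parent=root)
rp_qa_ui = ResourcePool("RP-QA-UI", 4000, parent=rp_qa)
rp_mkt = ResourcePool("RP-Marketing", 6000, parent=root)
print(root.reserved())   # the root's 16GHz is now fully reserved
```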
vApp
A vSphere vApp allows packaging of multiple interoperating virtual machines and software applications that you can manage as a unit and distribute in OVF format.
A vApp can contain one or more virtual machines, but any operation carried out on the vApp, such as clone or power off, affects all virtual machines in the vApp container.
From the vSphere Web Client, you can access the vApp summary page with the current status of the vApp, and you can manage the vApp.
NOTE: Because the vApp metadata resides in the vCenter Server database, a vApp can be distributed across multiple ESXi hosts. This information can be lost if the vCenter Server database is cleared or if a standalone ESXi host that contains a vApp is removed from vCenter Server. Back up your vApps to an OVF package to avoid losing metadata.
vApp metadata for virtual machines within a vApp does not follow the snapshot semantics for virtual machine configuration. vApp properties that are deleted, modified, or defined after a snapshot is taken remain intact (deleted, modified, or defined) after the virtual machine reverts to that snapshot or any prior snapshots.
You can use VMware Studio to automate the creation of ready-to-deploy vApps with pre-populated application software and operating systems. VMware Studio adds a network agent to the guest so that vApps bootstrap with minimal effort. Configuration parameters that are specified for vApps appear as OVF properties in the vCenter Server deployment wizard. For information about VMware Studio and for download, see the VMware Studio developer page on the VMware Web site.

8. Size compute resources: Memory, CPU, I/O devices, Internal storage

CPU

Plan for 60-80% utilisation. Start with one vCPU and add more as required. A rule of thumb is 4-6 virtual machines per core, but sizing should be driven by the results from the discovery phase, i.e. capacity planner, perfmon, top etc.

Memory

Plan for 70-90% utilisation, taking into account virtual machine memory overhead, which is determined by the amount of RAM and number of vCPUs.

I/O Devices

Minimum of four NICs on your host server. If using iSCSI, use more than four NICs, and note that vMotion has its own bandwidth requirements. Assign multiple NICs for redundancy and increased capacity.

Taken from vSphere 5.5 performance best practices..

  • Make sure that end-to-end Fibre Channel speeds are consistent to help avoid performance problems.
  • Configure maximum queue depth for Fibre Channel HBA cards.
  • For the best networking performance, we recommend the use of network adapters that support the following hardware features:
    • Checksum offload
    • TCP segmentation offload (TSO)
    • Ability to handle high-memory DMA (that is, 64-bit DMA addresses)
    • Ability to handle multiple Scatter Gather elements per Tx frame
    • Jumbo frames (JF)
    • Large receive offload (LRO)
  • On some 10 Gigabit Ethernet hardware network adapters, ESXi supports NetQueue, a technology that significantly improves performance of 10 Gigabit Ethernet network adapters in virtualized environments.
  • In addition to the PCI and PCI-X bus architectures, we now have the PCI Express (PCIe) architecture. Ideally single-port 10 Gigabit Ethernet network adapters should use PCIe x8 (or higher) or PCI-X 266 and dual-port 10 Gigabit Ethernet network adapters should use PCIe x16 (or higher). There should preferably be no “bridge chip” (e.g., PCI-X to PCIe or PCIe to PCI-X) in the path to the actual Ethernet device (including any embedded bridge chip on the device itself), as these chips can reduce performance.

Internal Storage

The boot device should be a minimum of 1GB. When booting from a local disk or SAN/iSCSI LUN, a 5.2GB disk is required to allow for the creation of the VMFS volume and a 4GB scratch partition.

9. Given a constraint to use existing hardware, determine suitability of the hardware for the design.

It has to be on the VMware HCL, otherwise you can run into support issues from VMware; make sure it meets the infrastructure qualities, AMPRS!

 

VCAP DCD Study – Home Lab Design Part 6

Objective 3.3 – Create a vSphere 5.x Physical Storage Design from an Existing Logical Design

Knowledge
1. Describe selection criteria for commonly used RAID types.

The IOMEGA comes configured as RAID5 which I don’t intend to change as it gives a decent balance between performance and redundancy.

  • RAID0 = striping across disks with no redundancy; any single disk failure loses the set
  • RAID1 = mirror (data copied across both disks); can lose only 1 disk
  • RAID3 = dedicated parity disk (min of 3 disks); can lose only 1 disk
  • RAID5 = distributed parity across all RAID disks; data loss potential during RAID rebuilds (min of 3 disks); decent reads, but writes carry a penalty of 4 (effective write IOPS = n*IOPS/4)
  • RAID6 = dual distributed parity; survives 2 disk failures (n+2), protecting during RAID rebuilds; fewer reads due to 2 disks lost to parity; writes carry a penalty of 6 (n*IOPS/6)
  • RAID1+0 = striping across mirrored pairs (min. 4 disks): best performance & most expensive; reads = sum of all disks * IOPS; writes carry a penalty of 2 (n*IOPS/2)
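These write penalties translate directly into the front-end IOPS a RAID set can sustain. A hypothetical sizing helper (my own simplification: uniform I/O size, no array cache):

```python
# Write penalty per RAID level: each logical write costs this many
# back-end I/Os (e.g. RAID5 reads data+parity, then writes data+parity).
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(disks, iops_per_disk, read_pct, level):
    """Front-end IOPS a RAID set can sustain for a given read/write mix."""
    raw = disks * iops_per_disk
    write_pct = 1 - read_pct
    return raw / (read_pct + write_pct * WRITE_PENALTY[level])

# 8 x 150-IOPS disks, 70% read workload:
print(round(effective_iops(8, 150, 0.7, "raid5")))   # 632
print(round(effective_iops(8, 150, 0.7, "raid10")))  # 923
```

The gap between the two results is exactly the RAID10-vs-RAID5 trade-off the list above describes.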

A good diagram from the VMware troubleshooting storage performance blog:

[diagram]

Skills and Abilities

2. Based on the service catalog and given functional requirements, for each service:

  • Determine the most appropriate storage technologies for the design.
  • Implement the service based on the required infrastructure qualities.

I intend to use a combination of VSAN and iSCSI storage; VSAN will come at a later date as I'm restricted by budget. I'll be using an IOMEGA NAS drive to present 2x 1TB datastores to the HP MicroServers; it will provide satisfactory IOPS and comes in at a good price point.

3. Create a physical storage design based on selected storage array capabilities, including but not
limited to:

  • Active/Active, Active/Passive
  • ALUA, VAAI, VASA
  • PSA (including PSPs and SATPs)

Obviously I can’t apply most of this to my design but below are some things to think about if it were a real world deployment.

Multipathing policies are largely driven by the storage vendors and they should always be consulted for recommended configurations.

  • Active-active storage system – Allows access to the LUNs simultaneously through all the storage ports that are available without significant performance degradation. All the paths are active at all times, unless a path fails.
  • Active-passive storage system – A system in which one storage processor is actively providing access to a given LUN. The other processors act as backup for the LUN and can be actively providing access to other LUN I/O. I/O can be successfully sent only to an active port for a given LUN. If access through the active storage port fails, one of the passive storage processors can be activated by the servers accessing it.
  • Asymmetrical storage system – Supports Asymmetric Logical Unit Access (ALUA). ALUA-compliant storage systems provide different levels of access per port. ALUA allows hosts to determine the states of target ports and prioritize paths. The host uses some of the active paths as primary and others as secondary.

Multipathing Policies:

  • Most Recently Used (MRU) — Selects the first working path, discovered at system boot time. If this path becomes unavailable, the ESX/ESXi host switches to an alternative path and continues to use the new path while it is available. This is the default policy for Logical Unit Numbers (LUNs) presented from an Active/Passive array. ESX/ESXi does not return to the previous path if, or when, it comes back online; it remains on the working path until that path fails.

Note: The preferred flag, while sometimes visible, is not applicable to the MRU pathing policy and can be disregarded.

  • Fixed (Fixed) — Uses the designated preferred path flag, if it has been configured. Otherwise, it uses the first working path discovered at system boot time. If the ESX/ESXi host cannot use the preferred path or it becomes unavailable, ESX/ESXi selects an alternative available path. The host automatically returns to the previously-defined preferred path as soon as it becomes available again. This is the default policy for LUNs presented from an Active/Active storage array.
  • Round Robin (RR) — Uses an automatic path selection rotating through all available paths, enabling the distribution of load across the configured paths. For Active/Passive storage arrays, only the paths to the active controller will be used in the Round Robin policy. For Active/Active storage arrays, all paths will be used in the Round Robin policy.

Note: This policy is not currently supported for Logical Units that are part of a Microsoft Cluster Service (MSCS) virtual machine.

  • Fixed path with Array Preference — The VMW_PSP_FIXED_AP policy was introduced in ESX/ESXi 4.1. It works for both Active/Active and Active/Passive storage arrays that support ALUA. This policy queries the storage array for the preferred path based on the array's preference. If no preferred path is specified by the user, the storage array selects the preferred path based on specific criteria.

Note: The VMW_PSP_FIXED_AP policy has been removed from ESXi 5.0. For ALUA arrays in ESXi 5.0 the PSP MRU is normally selected but some storage arrays need to use Fixed.
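The behavioural differences between MRU, Fixed, and Round Robin can be sketched as pseudo-logic. This is purely illustrative, with made-up path names; it is not the PSA plug-in API:

```python
from itertools import cycle

class Paths:
    """Toy illustration of the three core PSP behaviours."""

    def __init__(self, paths, preferred=None):
        self.paths = list(paths)
        self.preferred = preferred
        self.current = self.paths[0]       # first path found at "boot"
        self._rr = cycle(self.paths)

    def select_mru(self, failed=()):
        # MRU: stay on the current path until it fails; never fail back.
        if self.current in failed:
            self.current = next(p for p in self.paths if p not in failed)
        return self.current

    def select_fixed(self, failed=()):
        # Fixed: always return to the preferred path when it is healthy.
        if self.preferred and self.preferred not in failed:
            self.current = self.preferred
        elif self.current in failed:
            self.current = next(p for p in self.paths if p not in failed)
        return self.current

    def select_rr(self, failed=()):
        # Round Robin: rotate I/O across all healthy paths
        # (assumes at least one healthy path remains).
        while True:
            p = next(self._rr)
            if p not in failed:
                return p

paths = Paths(["a", "b"], preferred="a")
print(paths.select_mru(failed={"a"}))  # "b": fails over
print(paths.select_mru())              # still "b": MRU never fails back
print(paths.select_fixed())            # "a": Fixed returns to the preferred path
```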

VAAI:

  • Full copy, also called clone blocks or copy offload. Enables the storage arrays to make full copies of data within the array without having the host read and write the data. This operation reduces the time and network load when cloning virtual machines, provisioning from a template, or migrating with vMotion.
  • Block zeroing, also called write same. Enables storage arrays to zero out a large number of blocks to provide newly allocated storage, free of previously written data. This operation reduces the time and network load when creating virtual machines and formatting virtual disks.
  • Hardware assisted locking, also called atomic test and set (ATS). Supports discrete virtual machine locking without use of SCSI reservations. This operation allows disk locking per sector, instead of the entire LUN as with SCSI reservations.
  • Array thin provisioning, helps to monitor space use on thin-provisioned storage arrays to prevent out-of-space conditions, and to perform space reclamation; space reclamation is a manual process and needs to be run from the ESXi CLI.

VASA

Storage systems that use vStorage APIs for Storage Awareness, also called VASA, are represented by storage providers. Storage providers inform vCenter Server about specific storage devices, and present characteristics of the devices and datastores deployed on the devices as storage capabilities. Such storage capabilities are system-defined and vendor specific.
A storage system can advertise multiple capabilities. The capabilities are grouped into one or more capability profiles. Capabilities outline the quality of service that the storage system can deliver. They guarantee that the storage system can provide a specific set of characteristics for capacity, performance, availability, redundancy, and so on.
Vendor specific capabilities appear in the Storage Policy-Based Management system. When you create a storage policy for your virtual machine, you reference these vendor specific storage capabilities, so that your virtual machine is placed on the datastore with these capabilities.

PSA – collection of APIs to allow 3rd party ISVs to design their own load balance/failover techniques

PSPs – I/O path selection; MRU (default for A/P), Fixed (default for A/A), RR (either)

Storage Array Type Plug-Ins (SATPs) run in conjunction with the VMware NMP and are responsible for
array-specific operations.

ESXi offers a SATP for every type of array that VMware supports. It also provides default SATPs that support non-specific active-active and ALUA storage arrays, and the local SATP for direct-attached devices.
Each SATP accommodates special characteristics of a certain class of storage arrays and can perform the array-specific operations required to detect path state and to activate an inactive path. As a result, the NMP module itself can work with multiple storage arrays without having to be aware of the storage device specifics.

After the NMP determines which SATP to use for a specific storage device and associates the SATP with the physical paths for that storage device, the SATP implements the tasks that include the following:

Monitors the health of each physical path.

Reports changes in the state of each physical path.

Performs array-specific actions necessary for storage fail-over. For example, for active-passive devices, it can activate passive paths.

 

4. Identify proper combination of media and port criteria for given end-to-end performance requirements.

Refers to tiered storage based on performance type, e.g.

Gold = SSD

Silver = 15K FC/SAS

Bronze = 7.2K SATA

 

5. Specify the type of zoning that conforms to best practices and documentation.

With ESXi hosts, use single-initiator zoning or single-initiator-single-target zoning. The latter is the preferred zoning practice. Using the more restrictive zoning prevents problems and misconfigurations that can occur on the SAN.

Zoning not only prevents a host from unauthorized access of storage assets, but it also stops undesired host-to-host communication and fabric-wide Registered State Change Notification (RSCN) disruptions. RSCNs are managed by the fabric Name Server and notify end devices of events in the fabric, such as a storage node or a switch going offline. Brocade isolates these notifications to only the zones that require the update, so nodes that are unaffected by the fabric change do not receive the RSCN. This is important for non-disruptive fabric operations, because RSCNs have the potential to disrupt storage traffic.

There are two types of Zoning identification: port World Wide Name (pWWN) and Domain,Port (D,P). You can assign aliases to both pWWN and D,P identifiers for easier management. The pWWN, the D,P, or a combination of both can be used in a zone configuration or even in a single zone. pWWN identification uses a globally unique identifier built into storage and host interfaces. Interfaces also have node World Wide Names (nWWNs). As their names imply, pWWN refers to the port on the device, while nWWN refers to the overall device. For example, a dual-port HBA has one nWWN and two pWWNs. Always use pWWN identification instead of nWWN, since a pWWN precisely identifies the host or storage that needs to be zoned.
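Single-initiator-single-target zoning scales as one zone per (initiator, target) pair; a trivial generator illustrates the practice. The pWWNs below are hypothetical examples of my own:

```python
def sist_zones(initiator_pwwns, target_pwwns):
    """One zone per (initiator, target) pair: the preferred
    single-initiator-single-target practice."""
    return {
        f"z_{i[-2:]}_{t[-2:]}": {i, t}   # zone name from the last WWN octets
        for i in initiator_pwwns
        for t in target_pwwns
    }

hosts = ["10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02"]  # HBA pWWNs
array = ["50:06:01:60:3b:a0:17:63"]                             # target pWWN
zones = sist_zones(hosts, array)
print(len(zones))   # one zone per initiator-target pair
```

Each generated zone contains exactly two members, so a fabric event on one host never generates RSCN traffic toward another.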

6. Based on service level requirements utilize VMware technologies, including but not limited to:

Storage I/O Control
Storage Policies
Storage vMotion
Storage DRS

Storage I/O Resource Allocation

VMware vSphere provides mechanisms to dynamically allocate storage I/O resources, allowing critical workloads to maintain their performance even during peak load periods when there is contention for I/O resources. This allocation can be performed at the level of the individual host or for an entire datastore. Both methods are described below.

The storage I/O resources available to an ESXi host can be proportionally allocated to the virtual machines running on that host by using the vSphere Client to set disk shares for the virtual machines (select edit, virtual machine settings, choose the Resources tab, select Disk, then change the Shares field).

The maximum storage I/O resources available to each virtual machine can be set using limits. These limits, set in I/O operations per second (IOPS), can be used to provide strict isolation and control on certain workloads. By default, these are set to unlimited. When set to any other value, ESXi enforces the limits even if the underlying datastores are not fully utilized.

An entire datastore’s I/O resources can be proportionally allocated to the virtual machines accessing that datastore using Storage I/O Control (SIOC). When enabled, SIOC evaluates the disk share values set for all virtual machines accessing a datastore and allocates that datastore’s resources accordingly. SIOC can be enabled using the vSphere Client (select a datastore, choose the Configuration tab, click Properties… (at the far right), then under Storage I/O Control add a checkmark to the Enabled box).

With SIOC disabled (the default), all hosts accessing a datastore get an equal portion of that datastore’s resources. Share values determine only how each host’s portion is divided amongst its virtual machines.

Storage Policies
Formerly called virtual machine storage profiles, storage policies ensure that virtual machines are placed on storage that guarantees a specific level of capacity, performance, availability, redundancy, and so on. When you define a storage policy, you specify storage requirements for applications that will run on virtual machines. After you apply this storage policy to a virtual machine, the virtual machine is placed on a specific datastore that can satisfy those storage requirements.

Storage vMotion
Used to avoid downtime during datastore maintenance, when transitioning to a new array, and for datastore load balancing (Storage DRS).

Storage DRS
A feature that provides I/O load balancing across datastores within a datastore cluster.
This load balancing can avoid storage performance bottlenecks or address them if they occur.

7. Determine use case for virtual storage appliances, including the vSphere Storage Appliance.

VSA provides the High Availability and automation capabilities of vSphere to any small environment without shared storage hardware. Get business continuity for all your applications, eliminate planned downtime due to server maintenance, and use policies to prioritize resources for your most important applications. VSA enables you to do all this without shared storage hardware.

I don’t understand why VSAN isn’t mentioned in the Blueprint!

8. Given the functional requirements, size the storage for capacity, availability and performance,
including:

Virtual Storage (Datastores, RDMs, Virtual Disks)
Physical Storage (LUNs, Storage Tiering)

Some of the info below was borrowed from the Brownbag VCAP DCD Study notes PDF.

Take I/O metrics of guests (VDI) and server workloads. Take into account disk type & write penalty for RAID type.

  • Capacity – consider overhead for snapshots, vswp, and logging
  • Availability – multiple HBAs, multipathing, multiple switches
  • Performance – enable read/write cache on SAN; enable CBRC in VDI; *NOTE: disable write cache if not battery backed*
  • Datastores – segregate high I/O traffic on different datastores
  • RDMs – needed for SAN-based replication & tasks; required for MSCS
  • Virtual Disks – recommended; better provisioning capability over RDMs; more portable; functional with all vSphere features
  • LUNs – one VMFS datastore per LUN; can have multiple LUNs on a target or 1 per target
  • Storage Tiering – based on app SLAs (SSD vs SAS vs SATA); thin provisioning

How Large a LUN?
The best way to configure a LUN for a given VMFS volume is to size for throughput first and capacity second.
That is, you should aggregate the total I/O throughput for all applications or virtual machines that might run on a given shared pool of storage; then make sure you have provisioned enough back-end disk spindles (disk array cache) and appropriate storage service to meet the requirements.
This is actually no different from what most system administrators do in a physical environment. It just requires an extra step: considering when to consolidate a number of workloads onto a single vSphere host or onto a collection of vSphere hosts that are addressing a shared pool of storage.
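The "throughput first" sizing step can be turned into a quick back-of-envelope calculation: aggregate the workload IOPS, inflate the write portion by the RAID write penalty, and divide by per-disk IOPS. The workload numbers below are hypothetical examples:

```python
import math

# Write penalty per RAID level (back-end I/Os per logical write).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def disks_needed(target_iops, read_pct, iops_per_disk, level):
    """Back-end spindles needed to satisfy a front-end IOPS target."""
    backend = target_iops * (read_pct + (1 - read_pct) * WRITE_PENALTY[level])
    return math.ceil(backend / iops_per_disk)

# 20 VMs averaging 120 IOPS each, 60% read, on 15K disks (~175 IOPS each):
print(disks_needed(20 * 120, 0.6, 175, "raid5"))  # 31
```

Only after the spindle count is settled does capacity (LUN size) come into the picture, as the text goes on to discuss.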

Each storage vendor likely has its own recommendation for the size of a provisioned LUN, so it is best to check with the vendor. However, if the vendor’s stated optimal LUN capacity is backed with a single disk that has little or no storage array write cache, the configuration might result in low performance in a virtual environment. In this case, a better solution might be a smaller LUN striped within the storage array across many physical disks, with some write cache in the array. The RAID protection level also factors into the I/O throughput performance.
Because there is no single correct answer to the question of how large your LUNs should be for a VMFS volume, the more important question to ask is, “How long would it take one to restore the virtual machines on this datastore if it were to fail?”
The recovery time objective (RTO) is now the major consideration when deciding how large to make a VMFS datastore. This equates to how long it would take an administrator to restore all of the virtual machines residing on a single VMFS volume if there were a failure that caused data loss. With the advent of very powerful storage arrays, including Flash storage arrays, the storage performance has become less of a concern. The main concern now is how long it would take to recover from a catastrophic storage failure.

Another important question to ask is, “How does one determine whether a certain datastore is overprovisioned or underprovisioned?”
There are many performance screens and metrics that can be investigated within vCenter to monitor datastore I/O rates and latency. Monitoring these metrics is the best way to determine whether a LUN is properly sized and loaded. Because workload can vary over time, periodic tracking is an important consideration. vSphere Storage DRS, introduced in vSphere 5.0, can also be a useful feature to leverage for load balancing virtual machines across multiple datastores, from both a capacity and a performance perspective.

9. Based on the logical design, select and incorporate an appropriate storage network into the physical design:
iSCSI
NFS
FC
FCoE

  • Plan for failures
  • Connect the host and storage ports in such a way as to prevent a single point of failure from affecting redundant paths. For example, if you have a dual-attached host and each HBA accesses its storage through a different storage port, do not place both storage ports for the same server on the same Line Card or ASIC.
  • Use two power sources.
  • Host and storage layout – to reduce the possibility of congestion and maximize ease of management, connect host and storage port pairs to the same switch where possible.
  • Use single-initiator zoning – for Open Systems environments, ideally each initiator will be in a zone with a single target. However, due to the significant management overhead that this can impose, single-initiator zones can contain multiple target ports but should never contain more than 16 target ports.

san

 

 

VCAP DCD Study – Home Lab Design Part 5

Section 3 – Create a vSphere Physical Design from an Existing Logical Design

Objective 3.1 – Transition from a Logical Design to a vSphere 5.x Physical Design

Skills and Abilities

1. Determine and explain design decisions and options selected from the logical design.

The main drivers behind my design decisions were cost and space requirements; this is why I opted for the HP MicroServers and the TP-Link smart switch. They are relatively inexpensive, energy efficient and will take up very little space.

2. Build functional requirements into the physical design.

One of my functional requirements is that VLAN tagging needs to be available. I opted for the Asus RT-AC68U wireless router, not only because it's an awesome piece of kit, but because I intend to flash it with Tomato or DD-WRT (research ongoing), which should allow me to enable VLAN tagging.

3. Given a logical design, create a physical design taking into account requirements, assumptions and constraints.

Nothing to add here.

4. Given the operational structure of an organization, identify the appropriate management tools and roles for each staff member.

Management tools were covered in an earlier objective, e.g. vMA, Web Client, PowerCLI etc.

Below are the predefined roles, but new roles can be created to satisfy security requirements.

  • No Access
  • Read Only
  • Administrator
  • Virtual Machine Power User
  • Virtual Machine User
  • Resource Pool Administrator
  • Datastore Consumer
  • Network Consumer

Objective 3.2 – Create a vSphere 5.x Physical Network Design from an Existing Logical Design

1. Describe VLAN options, including Private VLANs, with respect to virtual and physical switches.

I’ve borrowed some of the material below from the excellent BrownBag VCAP DCD Study outline as they’ve already done a great job of covering the major points.

  • VLANs – feature of both vSS & vDS; 3 types = EST, VST (default), VGT
  • PVLANs – vDS capability (virtual); Primary = Promiscuous; Secondary = Community or Isolated
  • pSwitches – need Trunk port(s) configured; if possible, enable Link State tracking and disable Native VLAN mode

2. Describe switch-specific settings for ESXi-facing ports, including but not limited to:

  • STP
  • Jumbo Frames
  • Load-balancing
  • Trunking
  • STP – disable on ESXi-facing physical switch ports
  • Jumbo Frames – enable end to end on the storage/network path
  • Load-balancing – NIC teaming; route based on originating virtual port ID
  • Trunking – VLAN tagging; enable on the physical switch when using VLANs

3. Describe network redundancy considerations at each individual component level.

  • Management network – utilize active/standby vmnics (pNICs)
  • 2 vSwitches & 2 management networks (1 on each vSwitch), or 1 management network with 2 pNICs
  • Dual physical switches
  • Multiple pNICs within hosts
  • Multipathing for storage (HBAs)

4. Cite virtual switch security policies and settings

Settings

  • Failback = yes; mitigates the false positive of a physical switch appearing active when it's still down
  • Notify Switches = yes
  • VM Network traffic – configure pNICs for Port Group in Active/Active

Security Policies:

  • IP Storage: segregate from VM traffic using VLANs; NFS exports (/etc/exports); iSCSI CHAP
  • MAC Address Change – REJECT; if using iSCSI set to ACCEPT
  • Forged Transmits – REJECT; prevents MAC impersonation
  • Promiscuous Mode – REJECT
  • IPSec – authentication & encryption on packets
  • Disable native VLAN use on pSwitches to prevent VLAN hopping

Skills and Abilities

5. Based on the service catalog and given functional requirements, for each service:

  • Determine the most appropriate networking technologies for the design.
  • Implement the service based on the required infrastructure qualities (AMPRS).
  • vSS vs vDS – small or relatively large infrastructure? In my case I will be using a hybrid solution: vSS for management, vMotion and FT, and the vDS for VM and NFS traffic.
  • VLANs or not – meet compliance or SLAs by segregating traffic; I will be using VLANs.
  • IP Storage? – jumbo frames configured; I intend to use NFS-based storage but will not be enabling jumbo frames as my switch does not support it.
  • My networking will be 1GbE.

switch1

6. Determine and explain the selected network teaming and failover solution.

  • Default teaming = route based on originating virtual port ID; alternatives are route based on source MAC hash and route based on IP hash. I'll be using route based on originating virtual port ID.
  • Default Failover = Use Explicit Failover

vswitch0

vswitchport

dvs

 

7. Implement logical Trust Zones using network security/firewall technologies.

This was covered in the security section.

8. Based on service level requirements, determine appropriate network performance characteristics.

Taken from VMware vDS best practices.

[table: types of network traffic]

 

[diagram: NIOC example]

9. Given a current network configuration as well as technical requirements and constraints,  determine the appropriate virtual switch solution:

  • vSphere Standard Switch
  • vSphere Distributed Switch
  • Third-party solutions (ex. Nexus 1000V)
  • Hybrid solution
  • vSS – used for smaller environments
  • vDS – easier management/administration; centralized; larger environments; requires Enterprise Plus licensing
  • 3rd Party (Nexus 1000v) – consideration needed on what is supported (i.e. vShield, iSCSI, Host Profiles, AppSpeed, vDR, multipathing); no support for DPM or SRM
  • Hybrid – used so connectivity can be continued if vCenter goes down (needed for vDS); when mixing ESX (with Service Consoles) & ESXi (Management Networks)
  • Cisco PDF listing a feature comparison of vSS, vDS & Nexus: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9902/solution_overview_c22-526262.pdf; based on business requirements, you can compare the switches and determine which is best (budget may be a constraint for the purchase of a 3rd party switch, as well as the vSphere edition needed)

10. Based on an existing logical design, determine appropriate host networking resources.

Based on requirements, budget, constraints etc etc…

11. Properly apply converged networking considering VMware best practices.

  • Using 10GbE cards, consolidating traffic onto one card and using the second for redundancy
  • Recommended (if licensed for it) to use NIOC (on a vDS) for QoS per traffic type

VCAP DCD Study – Home Lab Design Part 4

Objective 2.5 – Build Performance Requirements into the Logical Design

Knowledge
1. Understand what logical performance services are provided by VMware solutions.

Memory

  • Transparent Page Sharing – Shares identical memory pages across multiple VMs. This is enabled by default. Consideration should be given to try and place similar workloads on the same hosts to gain maximum benefit.
  • Memory Ballooning – Controls a balloon driver running inside each VM. When the physical host runs out of memory it instructs the driver to inflate by allocating inactive physical pages; the ESXi host can use these pages to fulfill demand from other VMs.
  • Memory Compression – Prior to swapping memory pages out to physical disk, the ESXi host starts to compress pages. Compared to swapping, compression can improve overall performance in a memory overcommitment scenario.
  • Swapping – As a last resort, ESXi will start to swap pages out to physical disk.
  • Caching – Allows the use of SSD drives to act as a cache, which is quicker than using spinning disks.

Disk

  • vStorage APIs for Array Integration (VAAI) –  is a feature introduced in ESXi/ESX 4.1 that provides hardware acceleration functionality. It enables your host to offload specific virtual machine and storage management operations to compliant storage hardware. With the storage hardware assistance, your host performs these operations faster and consumes less CPU, memory, and storage fabric bandwidth.
  • Storage I/O Control (SIOC) – was introduced in vSphere 4.1 and allows for cluster wide control of disk resources. The primary aim is to prevent a single VM on a single ESX host from hogging all the I/O bandwidth to a shared datastore. An example could be a low priority VM which runs a data mining type application impacting the performance of other more important business VMs sharing the same datastore.
  • vSphere Storage APIs – Storage Awareness (VASA) – VASA is a set of APIs that permits storage arrays to integrate with vCenter for management functionality.

Network

  • Network IO Control (NIOC) – When network I/O control is enabled, distributed switch traffic is divided into the following predefined network resource pools: Fault Tolerance traffic, iSCSI traffic, vMotion traffic, management traffic, vSphere Replication (VR) traffic, NFS traffic, and virtual machine traffic.  You can control the bandwidth each network resource pool is given by setting the physical adapter shares and host limit for each network resource pool.
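To make the shares idea concrete, here’s a rough sketch (plain Python, not a VMware API) of how share values divide up a contended uplink; the share numbers below are just example figures:

```python
# Illustrative sketch of NIOC-style shares: under contention, each active
# resource pool gets bandwidth in proportion to its share of the total.
# The pool names and share values are hypothetical examples.

def nioc_bandwidth(pools, link_gbps=10.0):
    """Return the bandwidth (Gbps) each active pool gets under contention."""
    total_shares = sum(pools.values())
    return {name: round(link_gbps * shares / total_shares, 2)
            for name, shares in pools.items()}

pools = {"vm": 100, "vmotion": 50, "nfs": 50, "management": 25}
print(nioc_bandwidth(pools))
# "vm" gets 100/225 of the 10GbE link, roughly 4.44 Gbps, and so on
```

Note this only applies during contention; an idle link lets any pool burst above its share (up to any configured host limit).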

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

See Objective 2.3

3. List the key performance indicators for resource utilization.

Performance KPIs will be Processor, Memory, Disk, and Network.

Skills and Abilities
4. Analyze current performance, identify and address gaps when building the logical design.

This should be done during the current state analysis using well-documented tools such as VMware Capacity Planner, as well as OS tools such as perfmon and top.

5. Using a conceptual design, create a logical design that meets performance requirements.

I don’t need any tiered storage or resource pools in my design. This objective asks us to create a logical diagram to depict the performance requirements; for example, if the database needed a high number of IOPS and the dev servers needed lower IOPS, I would draw up a logical diagram showing the different tiers of storage and group VMs on the relevant tiers.

6. Identify performance-related functional requirements based on given non-functional requirements and service dependencies.

My non-functional requirement is that I can only spend £300 on storage. This will limit my choices: depending on what type of disk I buy (SSD (ha ha!) or SATA) and how many, I will be limited to a certain number of IOPS.
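As a rough illustration of that trade-off, here’s a quick sketch estimating the front-end IOPS a small RAID group can deliver. The per-disk IOPS figures and RAID write penalties are rule-of-thumb values, not vendor specs:

```python
# Back-of-the-envelope IOPS estimate for a small RAID group.
# Per-disk IOPS and RAID write penalties are common rule-of-thumb figures.

PER_DISK_IOPS = {"sata_7k": 80, "sas_10k": 130, "ssd": 5000}
RAID_WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 10: 2}

def frontend_iops(disk_type, n_disks, raid_level, read_pct=0.7):
    """Functional (front-end) IOPS after applying the RAID write penalty."""
    raw = PER_DISK_IOPS[disk_type] * n_disks
    penalty = RAID_WRITE_PENALTY[raid_level]
    write_pct = 1 - read_pct
    return raw / (read_pct + write_pct * penalty)

# Four 7.2k SATA disks in RAID 5 with a 70/30 read/write mix:
print(round(frontend_iops("sata_7k", 4, 5)))  # about 168 IOPS
```

Running the same numbers for RAID 10 (write penalty 2 instead of 4) shows why RAID level matters as much as disk count when the budget is tight.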

 

7. Define capacity management practices and create a capacity plan.

Ability to utilize resources efficiently without compromising performance.
Utilizing tools to forecast resource capacity (being proactive instead of reactive).
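As a toy example of being proactive rather than reactive, a simple linear trend fitted to recent utilization samples can forecast when a threshold will be crossed. The sample data here is made up for illustration:

```python
# Minimal capacity-forecast sketch: least-squares linear fit over monthly
# utilisation samples, predicting months until a threshold is crossed.

def months_until_full(samples, threshold=90.0):
    """Fit a straight line to samples; return months until threshold is hit."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # usage flat or falling; no meaningful forecast
    return (threshold - intercept) / slope - (n - 1)  # months from last sample

usage = [40, 44, 47, 52, 55, 60]  # % datastore used over the last six months
print(round(months_until_full(usage), 1))  # roughly 7.7 months of headroom left
```

Real capacity management tools do far more (seasonality, peaks/troughs), but the principle of extrapolating measured trends is the same.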

8. Incorporate scalability requirements into the logical design.

Provision enough headroom for future growth; I’ve over-provisioned against my initial requirements.

9. Determine performance component of SLAs and service level management processes.

Business Capacity Mgmt

  • ensure future business requirements are understood & have sufficient capacity to meet the requirements

Service Capacity Mgmt

  • resource consumption, activity patterns/peaks/troughs of live operational services

Component Capacity Mgmt

  • performance & capacity of underlying IT service components (CPU, RAM, Disks, etc..)

 

Objective 2.6 – Build Recoverability Requirements into the Logical Design

Knowledge
1. Understand what recoverability services are provided by VMware solutions.

FT
HA
SRM
vDR
APIs needed for 3rd party solutions

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

See Objective 2.3

3. Differentiate Business Continuity and Disaster Recovery concepts.

Business continuity is a proactive action focused on avoiding or mitigating the impacts of risks before they happen.

Below points borrowed from the Brownbag VCAP DCD Study notes.

  • The business must continue to operate for weeks, months and years
  • Who, What, Where and When is needed
  • Not just technical, whole of business
  • Very Strategic

Disaster recovery is focused on how to return services after an outage or failure has occurred which is a reactive action.

  • We hoped it would never happen but it has
  • Get the business running again ASAP
  • Tactical, Technical

4. Describe and differentiate between RTO and RPO

RTO – recovery time objective; the acceptable time allowed to recover a critical system.

RPO – recovery point objective; the acceptable recovery point of a system, determining what is ‘acceptable’ data loss.
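A tiny worked example (with hypothetical times) showing how the two objectives differ for a single incident:

```python
# Sketch distinguishing RPO (data loss) from RTO (downtime) for one
# incident. All times are hours on a simple clock; numbers are made up.

def check_slas(last_backup, failure, service_restored, rpo, rto):
    data_loss = failure - last_backup        # work lost since last good copy
    downtime = service_restored - failure    # time the service was unavailable
    return {
        "data_loss_h": data_loss,
        "downtime_h": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

# Backup at 02:00, failure at 09:00, service restored at 11:00,
# against an RPO of 8 hours and an RTO of 1 hour:
print(check_slas(last_backup=2, failure=9, service_restored=11, rpo=8, rto=1))
# 7h of data loss meets the 8h RPO, but 2h of downtime misses the 1h RTO
```

The point of the example: the RPO drives how often you back up or replicate, while the RTO drives how quickly your recovery mechanism (HA restart, SRM failover, restore from backup) can bring the service back.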

Skills and Abilities

5. Given specific RTO and RPO requirements, build these requirements into the logical design.

Taking the RTO and RPO requirements into account, what options do we have to implement a DR solution? Array-based or vSphere Replication? Is there network bandwidth for replication? Is there budget? And so on.

6. Given recoverability requirements, identify the services that will be impacted and provide a recovery plan for impacted services.

Basically come up with a good DR plan with a detailed run book.

7. Given specific regulatory compliance requirements, build these requirements into the logical design.

Backups & retention periods can be defined by regulation.

8. Based on customer requirements, identify applicable site failure / site recovery use cases.

How will the DR site be configured? Will we use cloud-based DR? Maybe use the failover site as a dual-purpose site, e.g. have pre-prod workloads running there as well as DR.

9. Determine recoverability component of SLAs and service level management processes.

Taken from the pdf on the blueprint – Practical Guide to Business Continuity and Disaster Recovery with VMware Infrastructure

In a real-world scenario, there would be an interaction with the business owners to establish SLAs and these would drive design considerations. The implementation outlined in this VMbook was designed to apply generically to as many cases as possible and was based in part on interviews with senior architects within the VMware customer base to determine a “level set” in terms of needs, requirements, and so on. Typical questions asked of these architects include the following:

What type of SLAs do you have with the business?
Recovery Point Objectives
Recovery Time Objectives

BCDR plans have traditionally been documented as runbooks – i.e., what to do if disaster strikes. Increasingly, this runbook is being automated to make the process more predictable and less prone to error. The ability to test this plan is also a key consideration.

10. Based on customer requirements, create a data retention policy.

Retention Policy – Data Recovery backups are preserved for a variable period of time. You can choose to keep more or fewer backups for a longer or shorter period of time. Keeping more backups consumes more disk space, but also provides more points in time to which you can restore virtual machines. As backups age, some are automatically deleted to make room for new backups. You can use a predefined retention policy or create a custom policy.

 

Objective 2.7 – Build Security Requirements into the Logical Design

Knowledge
1. Understand what security services are provided by VMware solutions.

  • VMware compliance checkers
  • vShield
  • Hardening guides for the relevant ESXi version

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security).

Covered in objective 2.3 (I see a pattern here!)

3. Describe layered security considerations, including but not limited to Trust Zones.

Trust zones such as a DMZ, departmental, PCI compliance, or application (3-tier app). There are three trust zone configurations: partially separated physical; partially separated virtual; fully collapsed.

These can be implemented using VLANs, firewalls, anti-virus, endpoint appliances, and IDS.

Skills and Abilities

4. Identify required roles, create a role-based access model and map roles to services.

Use active directory for all access with the exception of a local admin group in case of active directory failure.

  • Where possible, grant permissions to groups rather than individual users.
  • Grant permissions only where needed. Using the minimum number of permissions makes it easier to understand and manage your permissions structure.
  • If you assign a restrictive role to a group, check that the group does not contain the Administrator user or other users with administrative privileges. Otherwise, you could unintentionally restrict administrators’ privileges in parts of the inventory hierarchy where you have assigned that group the restrictive role.
  • Use folders to group objects to correspond to the differing permissions you want to grant for them.
  • Use caution when granting a permission at the root vCenter Server level. Users with permissions at the root level have access to global data on vCenter Server, such as roles, custom attributes, vCenter Server settings, and licenses. Changes to licenses and roles propagate to all vCenter Server systems in a Linked Mode group, even if the user does not have permissions on all of the vCenter Server systems in the group.

5. Create a security policy based on existing security requirements and IT governance practices.

This is talking about security compliance policies, change policies, patching policies, configuration policies and access control.

6. Incorporate customer risk tolerance into the security policy.

I guess risk tolerance can vary depending on the industry; for example, a travel agency’s IT security policy would not be as stringent as that of a company providing IT services for the military.

7. Given security requirements, assess the services that will be impacted and create an access management plan.

I’m not entirely sure what this is asking or referring to, but I assume it’s talking about external access to secure services. I’ll do some more digging on this one; update to follow.

8. Given a regulatory requirement example, determine the proper security solution that would comply with it.

e.g. PCI compliance or IL3 compliance, ensuring the design caters for the specific requirements and everything will come back clean if there was an audit.

9. Based upon a specified security requirement, analyze the current state for areas of compliance/non-compliance.

This is referring to VMware vCenter Configuration Manager, which has a compliance checker integrated into the product.

10. Explain how compliance requirements will impact the logical security design

Compliance could involve purchasing specific software to meet the requirements, such as vShield Endpoint or a Juniper virtual gateway, plus extra firewalls, switches, etc. if physical segregation is essential.

VCAP DCD Study – Home Lab Design Part 3

Objective 2.3 – Build Availability Requirements into the Logical Design

Knowledge
1. Understand what logical availability services are provided by VMware solutions.

I’ll be utilising VMware HA and possibly Fault Tolerance in the design.

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

  • Availability is the ability of a system or service to perform its required function when required. It is usually expressed as a percentage, such as 99.9%.
  • Manageability describes the expense of running the system. If a huge platform can be managed by a tiny team, the operational costs are very low.
  • Performance is the measure of what is delivered by a system. This is usually measured against known standards of speed and completeness.
  • Recoverability describes the ability to return a system or service to a working state, usually after a system failure and repair.
  • Security is the process of ensuring that services are used in an appropriate way.
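As a quick sanity check on what those availability percentages actually mean, here’s the standard conversion to allowed downtime per year:

```python
# Convert an SLA availability percentage into allowed downtime per year.
# This is just the standard arithmetic behind figures like "three nines".

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct):
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% availability -> {downtime_minutes_per_year(pct):.0f} min/year")
# 99.9% allows roughly 526 minutes (about 8.8 hours) of downtime per year
```

This is useful when mapping requirements to features: a VM whose SLA tolerates a few minutes of downtime per incident fits HA, while one that tolerates effectively none points at FT.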

3. Describe the concept of redundancy and the risks associated with single points of failure.

There will be some redundancy built into my design, but not at the level the exam blueprint requires. For study purposes, below is some info from the link provided in the blueprint.

Design Principles for High Availability

The key to architecting a highly available computing environment is to eliminate single points of failure. With the potential of occurring anywhere in the environment, failures can affect both hardware and software. Building redundancy at vulnerable points helps reduce or eliminate downtime caused by hardware failures. These include redundancies at the following layers:

  • Server components such as network adaptors and host bus adaptors (HBAs)
  • Servers, including blades and blade chassis
  • Networking components
  • Storage arrays and storage networking

4. Differentiate Business Continuity and Disaster Recovery concepts.

Business continuity is a proactive action focused on avoiding or mitigating the impacts of risks before they happen.

  • The business must continue to operate for weeks, months and years
  • Who, What, Where and When is needed
  • Not just technical, whole of business
  • Very Strategic

Disaster recovery is focused on how to return services after an outage or failure has occurred which is a reactive action.

  • We hoped it would never happen but it has
  • Get the business running again ASAP
  • Tactical, Technical

Skills and Abilities

5. Determine availability component of service level agreements (SLAs) and service level management processes.

Define an SLA for each service and design a setup that will accommodate it. For example, if your SLA for a certain VM allows zero downtime, configure that VM for FT; if your SLA allows a couple of minutes, VMware HA should be good enough. If there are other services that you commit to (e.g. performance), then create storage tiers as necessary.

6. Explain availability solutions for a logical design based on customer requirements.

As mentioned, I won’t be designing a DR solution, but below is an example of a logical design of an availability solution using SRM.

[Image: DR logical diagram]

7. Define an availability plan, including maintenance processes.

This was taken from the link provided in the blueprint.

VMware vSphere makes it possible to reduce both planned and unplanned downtime without the cost and complexity of alternative solutions. Organizations using VMware can slash planned downtime by eliminating most scheduled downtime for hardware maintenance. VMware VMotion™ technology, VMware Distributed Resource Scheduler (DRS) maintenance mode, and VMware Storage VMotion™ make it possible to move running workloads from one physical server to another without downtime or service interruption, enabling zero-downtime hardware maintenance.

Depending on what type of failure you are defining a plan for, do it properly. For SRM create an appropriate Run book. This will be used during a site failure. For host upgrades, make a plan to vMotion all the VMs and ensure there are available resources for all the VMs with one host down, then update the host. For VM maintenance take a snapshot and then revert back if the VM upgrade didn’t go well.

8. Prioritize each service in the Service Catalog according to availability requirements.

Using VMware HA set the reboot priority depending on the availability requirements. Most important Services/VMs can have the highest priority during an HA failover

VM Restart Priority Setting

VM restart priority determines the relative order in which virtual machines are restarted after a host failure. Such virtual machines are restarted sequentially on new hosts, with the highest-priority virtual machines first, continuing to those with lower priority until all virtual machines are restarted or no more cluster resources are available.

9. Balance availability requirements with other infrastructure qualities

VMware also helps protect against unplanned downtime from common failures, including:

Network and storage interface failures. Support for redundant network and storage interfaces is built into VMware ESX™. Redundant network and storage interface cards can be shared by multiple virtual machines on a server, reducing the cost of implementing redundancy. VMware virtualization also makes it easy to create redundant servers without additional hardware purchases by allowing for the provisioning of virtual machines to existing underutilized servers.

Server failures. VMware High Availability (HA) and VMware Fault Tolerance deliver protection against server failures without the cost and complexity often associated with implementing and maintaining traditional solutions. VMware HA automatically restarts virtual machines affected by server failures on other servers to reduce downtime from such failures to minutes, while VMware Fault Tolerance ensures continuous availability for virtual machines by using VMware vLockstep technology to create a live shadow instance of a virtual machine on another server and allow instantaneous, stateful failover between the two instances.

Overloaded servers. VMware VMotion, VMware Distributed Resource Scheduler (DRS), and VMware Storage VMotion help you to proactively balance workloads across a pool of servers and storage.

Objective 2.4 – Build Manageability Requirements into the Logical Design

Knowledge

1. Understand what management services are provided by VMware solutions.

Not an exhaustive list and I will only be using a few of them in my lab.

vMA, vCenter, PowerCLI, vCLI, vCenter Orchestrator, vSphere API, vSphere HA, vSphere DRS, Auto Deploy, Scheduled Tasks, Host Profiles.

2. Identify and differentiate infrastructure qualities (Availability, Manageability, Performance, Recoverability, Security)

Already covered this in objective 2.3

Skills and Abilities

3. Build interfaces to existing operations practices into the logical design

This is talking about integrating existing services, such as an existing database or Active Directory, into the logical design; obviously I can’t apply this to my design.

4. Address identified operational readiness deficiencies

Again, I can’t apply this to my design, but it’s referring to issues that were picked up during the discovery phase and need to be fixed as part of the new design.

5. Define Event, Incident and Problem Management practices

ITIL Definitions

  • Event – A change of state which might have an influence on the management of a service or system
  • Incident – An event which is not part of standard operation and usually causes a service disruption or degraded functionality
  • Problem – The cause of one or more incidents

6. Define Release Management practices

ITIL Definition

Release Management encompasses the planning, design, build, configuration and testing of hardware and software releases to create a defined set of release components.

The goal of the Release and Deployment Management process is to assemble and position all aspects of services into production and establish effective use of new or changed services.
Effective release and deployment delivers significant business value by delivering changes at optimized speed, risk and cost, and offering a consistent, appropriate and auditable implementation of usable and useful business services.
Release and Deployment Management covers the whole assembly and implementation of new/changed services for operational use, from release planning through to early life support

7. Determine Request Fulfillment processes

More stuff from ITIL

Each catalog item uses a fulfillment process to define how the request is fulfilled when that item is ordered.

Fulfillment processes are used when ordering standard catalog items, but are not used for some extended types of catalog item, such as content items.

8. Design Service Asset and Configuration Management (CMDB) systems

  • SACM supports the business by providing accurate information and control across all assets and relationships that make up an organization’s infrastructure.
  • The purpose of SACM is to identify, control and account for service assets and configuration items (CI), protecting and ensuring their integrity across the service lifecycle.
  • The scope of SACM also extends to non-IT assets and to internal and external service providers, where shared assets need to be controlled.
  • To manage large and complex IT services and infrastructures, SACM requires the use of a supporting system known as the Configuration Management System (CMS).

9. Define Change Management processes

Change management is an IT service management discipline. The objective of change management in this context is to ensure that standardized methods and procedures are used for efficient and prompt handling of all changes to control IT infrastructure, in order to minimize the number and impact of any related incidents upon service. Changes in the IT infrastructure may arise reactively in response to problems or externally imposed requirements, e.g. legislative changes, or proactively from seeking improved efficiency and effectiveness or to enable or reflect business initiatives, or from programs, projects or service improvement initiatives. Change Management can ensure standardized methods, processes and procedures which are used for all changes, facilitate efficient and prompt handling of all changes, and maintain the proper balance between the need for change and the potential detrimental impact of changes.

10. Based on customer requirements, identify required reporting assets and processes

I’m not entirely sure what this is referring to; I need to do some more research!

 

VCAP DCD Study – Home Lab Design Part 2

 

Section 2 – Create a vSphere Logical Design from an Existing Conceptual Design

 

Objective 2.1 –Map Business Requirements to the Logical Design

Knowledge
1.Explain the common components of logical design.

The Logical Design specifies the relationships between all components. The components of my lab design are quite straightforward: two wireless routers, a smart switch, two ESXi hosts, one NAS drive, and some Ethernet cabling.

2.List the detailed steps that go into the makeup of a common logical design.

The steps involve gathering the requirements and creating a logical diagram that visually displays what needs to be built to fulfill those requirements.

3. Differentiate functional and non-functional requirements for the design.

I struggle with these concepts, and the reference material in the blueprint doesn’t really help; it talks about heart rate and blood pressure! I wish they would relate the material to the technology we are studying. It might just be the way my brain works, but I need things spelled out in black and white. Hey VMware… give us some solid examples that we can compare to our everyday working lives, please.

The definition of a functional requirement specifies what the system should do: “A requirement specifies a function that a system or component must be able to perform.”
Functional requirements specify specific behaviour or functions, for example:

Functional Requirements

  • The lab environment must be securely accessible from external sources but only accessible to the administrator.
  • Authentication must be validated through Microsoft Active Directory and VMware SSO.
  • The Lab should be able to support at least a vCenter Server, MS SQL Express database, and active directory server and an Update Manager.
  • VLAN tagging must be available on the switching infrastructure to separate traffic types.

Non-Functional Requirements

  • The vCenter Server and Database need to have some redundancy within the cluster.
  • Costs must be kept below £1500.
  • There should be enough compute resource to cater for vCenter Server, MSSQL Express, AD and Update Manager.
  • The design must use an existing IOMEGA NAS device.
  • Hosts need to be patched on a regular basis and kept in a consistent configuration.
  • Space is at a premium, so the physical footprint of the equipment needs to be as small as possible.
  • Power consumption needs to be low.

 

Skills and Abilities

4. Build non-functional requirements into a specific logical design.

Here’s a non-functional logical diagram depicting the use of the IOMEGA NAS device.

 

5. Translate given business requirements and the current state of a customer environment into a logical design.

Logical diagram to follow…

6. Create a Service Catalog

A service catalog is introduced from ITIL and should contain the items below.

  • Service name

Home lab Support

  • Service description

Maintenance and support will be included for the following devices:

2 x HP Micro Servers with ESXi 5.5 installed

1 x Virgin SuperHub

1 x Asus RT-N66U Wireless Router

1 x TP-Link Smart Switch

1 x IOMEGA NAS

  • Services included

Patch management

Upgrades

Incident support

  • Services not included

Tea and Coffee making

  • Services availability

24x7x365

 

Objective 2.2 – Map Service Dependencies

Knowledge
1. Identify basic service dependencies for infrastructure and application services.

As this is a green-field site there is no need for application discovery and application mapping; if needed, I could do this using VMware vCenter Application Discovery Manager. For the purposes of the exam blueprint, the discovery methods are below.

  • Active – uses common network protocols to remotely query servers to build up an overall picture, it can put a burden on network resources and doesn’t give any relationship data. It doesn’t require agents.
  • Passive – provides more relationship data than the above active discovery, it listens and samples network traffic to see how network hosts and servers talk to each other and on what ports. Does require agents.
  • Analytics – complements the above 2 by performing deep packet analysis of observed traffic.

vCenter Server is dependent on the MSSQL database and SSO.

SSO is dependent on Active Directory

MSSQL is dependent on Active Directory as I’ll be using service accounts for the database.

Skills and Abilities
2. Document service relationships and dependencies (Entity Relationship Diagrams)

An application dependency diagram determines which entities are related to one another. While discovering running services during the current state analysis, you can use this information to draw out the upstream and downstream relationships. Relationships can be defined in the following terms:

  • runs on / runs
  • depends on / used by
  • contains / contained by
  • hosts / hosted by
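To illustrate, here’s a hypothetical sketch of this lab’s “depends on” relationships (taken from the dependencies noted above), walking the graph to find everything impacted when a service fails:

```python
# Hypothetical dependency map for this lab, expressed as "depends on"
# edges, plus a walk to find every upstream service impacted by a failure.

DEPENDS_ON = {
    "vCenter": ["MSSQL", "SSO"],
    "SSO": ["Active Directory"],
    "MSSQL": ["Active Directory"],
    "Update Manager": ["MSSQL"],
}

def impacted_by(failed_service):
    """Return the services that directly or indirectly depend on the failed one."""
    impacted = set()
    changed = True
    while changed:  # keep sweeping until no new impacted services are found
        changed = False
        for svc, deps in DEPENDS_ON.items():
            if svc not in impacted and (failed_service in deps
                                        or impacted & set(deps)):
                impacted.add(svc)
                changed = True
    return impacted

print(sorted(impacted_by("MSSQL")))            # ['Update Manager', 'vCenter']
print(sorted(impacted_by("Active Directory")))  # everything, in this map
```

Even a toy model like this makes the upstream/downstream point obvious: a single low-level failure (Active Directory here) ripples through every service above it.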

 

3. Identify interfaces to existing business processes and define new business processes

This doesn’t really apply to my design, but in the real world, if something was discovered during the discovery process that needed to change as part of the design and would impact the way the environment is managed, then the new process would need to be documented.

4. Given a scenario, identify logical components that have dependencies on certain services.

Covered this above.

5. Include service dependencies in a vSphere 5.x logical design.

Not really sure what to say about this, but I will be including the service dependencies in the logical design.

6. Analyze services to identify upstream and downstream service dependencies.

Everything that happens downstream can have an effect on upstream items. For example, if the SQL database crashes, vCenter will stop.

7. Having navigated logical components and their interdependencies, make decisions based upon all service relationships.

I’m assuming they are talking about grouping virtual machines together using vApps, but I don’t plan to do this in my design.

So that’s the first two objectives of Section 2 in the blueprint addressed; objective 2.3 onwards will be covered in the next post.

 

VCAP DCD Study – Home Lab Design Part 1

My objective is to design and implement a home lab using the VCAP DCD Exam Blueprint as a guide, once the home lab has been deployed I will use it for VCAP DCA study.

I will try to address each of the skills and abilities defined in the objectives. I might not be able to relate my lab design to all of the blueprint objectives, but I’ll cover them as best I can. I’ll be using the blog as my study notes, so it might be a bit rough around the edges; once the exam is done I’ll come back and pretty everything up! I’ve used various sources for my notes, mostly the documents provided in the blueprint; I’ve also leaned heavily on the vBrownbag notes, so a BIG thanks to Shane Williford and Cody Bunch.

Section 1 – Create a vSphere Conceptual Design

Objective 1.1 – Gather and analyze business requirements

1. Associate a stakeholder with the information that needs to be collected.

Before a design can begin, the correct information needs to be collected and associated with particular stakeholders. In this case I am the sole stakeholder.

2. Utilize customer inventory and assessment data from a current environment to define a baseline state.

This is a green field deployment, however, if it were an existing site we would do the following…

    • Perform current state analysis with tools like VMware Capacity Planner
    • Review the current environment documentation
    • Collect information from stakeholders and SMEs

3. Analyze customer interview data to explicitly define customer objectives for a conceptual design.

  • Goals (why are we doing this and what is the time frame)
  • Scope (What’s included, what’s not included)

Requirements:

  • Business (cost savings, work force reductions)
  • Technical ( Uptime, consolidation, DR)
  • Legislative ( compliance, security)
  • Assumptions ( sufficient cooling for hardware in datacenter)
  • Constraints (Must be done – re-use of existing servers or must be HP branded)
  • Risks (could prevent the project from happening, like budget not yet approved or a dependency on another project)

 

4. Identify the need for and apply requirements tracking.

Requirements will be tracked in my design

5. Given results of a requirements gathering survey, identify requirements for a conceptual design.

[Image: network requirements]

[Image: storage requirements]

6. Categorize requirements by infrastructure qualities to prepare for logical design requirements.

Note: There are no DR requirements for this design.

  • Availability requirements
  • Manageability requirements
  • Performance requirements

Objective 1.2 – Gather and analyze application requirements

1. Given a scenario, gather and analyze application requirements

Application requirements are minimal for this deployment; all I require is vCenter Server, MS SQL Express and Update Manager.

2. Given a set of applications within a physical environment, determine the requirements for virtualization.

No P2V required for this project.

3. Gather information needed in order to identify application dependencies.

vCenter Server and Update Manager will be dependent on the MS SQL database.

vCenter Server will also be dependent on SSO and Active Directory, as I will be using service accounts.

4. Given one or more application requirements, determine the impact of the requirements on the design.

Application requirements are nothing out of the ordinary and will not have a major impact on the overall design.

Objective 1.3 – Determine Risks, Constraints, and Assumptions

1. Differentiate between the general concepts of a risk, a requirement, a constraint, and an assumption.

  • Risk – something that could jeopardize the success of the project (e.g. budget not yet approved, or a dependency on another project)

  • Requirement – something the design must deliver (business, technical or legislative, e.g. a defined level of uptime)

  • Constraint – something that limits the design choices (e.g. must re-use existing servers)

  • Assumption – something believed to be true but not yet verified (e.g. sufficient cooling in the datacenter)

2. Given a statement, determine whether it is a risk, requirement, a constraint, or an assumption.

I think the previous point covers this.

3. Analyze impact of VMware best practices to identified risks, constraints, and assumptions

I don’t see VMware best practices having an impact on my risks, constraints and assumptions.

VCAP DCD – My Study Approach

So I’ve decided to re-attempt the VCAP-DCD 5.5 after failing my first attempt at the 5.0/5.1 version of the exam.  It’s been well documented on various blogs that the exam is difficult to prepare for if you don’t do design work on a regular basis.  I’ve been involved in the design and deployment of many projects, from small four-host clusters to large-scale cloud deployments, and I’ve been lucky enough to work with some good guys from VMware PSO as well as some very clever individuals in my current organisation, so I would like to think I’ve had a reasonable amount of design experience.  It soon became apparent in the exam that I was lacking on the process side of things. I’m pretty much OK with the technical side, but when it came to pigeonholing risks, constraints, functional requirements and so on, I struggled, as I found it quite subjective and the answers could easily have fitted into several categories.  In the end I ran out of time and missed a pass by 24 points.

This time around I’ve decided to take a slightly different approach to my study; the aim is to pass the DCD and the DCA in three months!  I currently don’t have a home lab as I have access to equipment at work, so, using the DCD exam blueprint as an architectural blueprint, I intend to design and deploy a home lab for the sole purpose of DCA study – killing two birds with one stone, I suppose.  I’ll blog each stage of my mini project as I work through the phases.


Isolated PVLANs will not work with Cisco UCS and VMware vDS

This is something I came up against several years ago but never got around to posting. The workaround is to deploy a Nexus 1000v; the PVLANs are then defined within the Nexus and never traverse the upstream network.

Here’s the response from Cisco:

“In a nutshell, Fabric Interconnects in End Host mode have no unknown unicast flooding functionality and do not learn MAC addresses on the uplinks.

Because the VMware vDS cannot terminate the PVLANs, they will need to extend into the external LAN switching infrastructure. Therefore, all community/isolated VLANs have to be defined on UCS and on the external LAN switch(es) as well.

This is fine if no communication is required between the isolated PVLAN and any external host on the Primary VLAN.

Where the design requires an external promiscuous port, you need to set the UCS Fabric Interconnects to switch mode. Traffic that enters the promiscuous port is classified into the primary VLAN, but from a UCS perspective there are no server-side MAC-table entries in the primary VLAN, because the servers are in an isolated PVLAN. So no communication is possible.

As such, switch mode is a must for bi-directional communication. Here the Fabric Interconnects will do MAC learning on the uplink ports as well as the server ports.”
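The forwarding rules behind all of this are easy to lose track of, so here is a minimal sketch of the standard PVLAN behaviour in Python. This is purely an illustrative model of the isolated/community/promiscuous rules described above (the function name and structure are my own invention, not Cisco or VMware code):

```python
# Illustrative model of private VLAN (PVLAN) forwarding rules.
# Port types: "promiscuous" (on the primary VLAN, e.g. the gateway),
# "community" (can talk within its own community), and "isolated".

def pvlan_can_forward(src_type, dst_type, src_community=None, dst_community=None):
    """Return True if a frame may be forwarded between two PVLAN ports."""
    # A promiscuous port can reach, and be reached by, every other port type.
    if src_type == "promiscuous" or dst_type == "promiscuous":
        return True
    # Community ports can talk to other ports in the *same* community only.
    if src_type == "community" and dst_type == "community":
        return src_community == dst_community
    # Isolated ports can only ever talk to promiscuous ports.
    return False
```

So an isolated VM can reach the gateway on a promiscuous port but never another isolated VM, which is why, with the promiscuous port sitting outside UCS and End Host mode doing no MAC learning, communication breaks until you either terminate the PVLANs inside a Nexus 1000v or switch the Fabric Interconnects to switch mode.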