Tuesday, June 11, 2013

The Ever-Versatile vCenter Orchestrator


Solving a Simple Backup Problem 

vCenter Orchestrator is a bit of a dark horse in the VMware product portfolio.  Almost every customer has it, because it is licensed along with every vCenter Server, but almost none of them has ever touched it.  It is actually a VERY powerful tool to have in your toolbox.
Of course, I’m a strong opponent of using orchestration-centric approaches to building a self-service and/or cloud environment.  The two-part problem is that:
  1. Orchestration designs, “great” as they might be, need to touch many pieces of technology in the datacenter.  While no single element of integration may be a particular challenge, this up-front implementation of many moving parts is costly and time-consuming – and such solutions will typically take 6 months or more, and cost several hundreds of thousands of dollars in services alone.  
  2. Orchestration designs are hypersensitive to any technology changes over time. That is, you are likely to break the intricate machine whenever you perform a software upgrade, firmware upgrade or hardware model change. That is usually guaranteed to happen yearly (or half-yearly) for software, and every few years for hardware. Multiplied by 10-15 moving parts, or more, that means the solution is never actually stable for any length of time – or else it holds the business back from making necessary changes.
Well, having said all that, orchestration has its place. If it were on the food pyramid, it might be “fats and oils”: rich in energy and necessary as part of a complete diet, but you have to take it easy or else it’ll lead to a heart attack, or perhaps an inability to leave your front door!  OK, so it’s not the best analogy… but hopefully the point sticks.
Last year, a customer was seeking a simple backup method to safeguard their remote offices against VMs being broken by software changes, updates or “tinkering”.  Each site had a standalone ESXi host managed by a central vCenter Server, local storage, and fair to poor WAN links.  We decided to explore using vCenter Orchestrator to create application-consistent, on-site, self-managing backups.  What we ended up with looked pretty useful, so I thought I would share it here.  It also took Peter Marfatia and me only a few days to put together, which I thought was pretty reasonable for a team with limited skills in the tool.

I have included a link to the resulting package at the bottom of this article, and also the automatically generated documentation that vCenter Orchestrator provided for me.

To get started with vCenter Orchestrator, there are some great resources out there – some I’ve included at the bottom of this article.  It is a great learning experience to install the Orchestrator Appliance and Client, and just look around at the various actions, workflows and tools that it makes available.

The Overall Backup Process

Perhaps to start with, a view of the overall process we used for this Branch Backup would help. 
  1. The workflow is pointed to a folder within vCenter
  2. It discovers all virtual machines within that folder, and determines whether they are candidates for backup, or are instances of prior backups.
  3. If requiring backup, it performs a snapshot with quiescing, then clones the snapshot to a new VM, which is converted to a template (to prevent accidental power-ons).
  4. If looking at prior backups, removes those no longer needed.
  5. When all backups have been processed, the workflow emails a report to a nominated address.

Backup Dispatcher

This is the main entry point into the whole workflow. 


You can see that in vCenter Orchestrator, there is a visual layout of the workflow steps, like all other orchestration tools.  Even if this is the first time you’ve seen Orchestrator, you can look at the diagrams and have a fair understanding of what is happening when the workflow is run.

When running the job manually, the workflow asks a small number of questions as shown below.  This is the default user interface presented by the Orchestrator Client, and others are available – check out the VMware Labs site for some options.

Invoking the “Backup Dispatcher” workflow, this interface is asking for the following elements.
  • Email address to send the job report – listing the VMs backed up and the success/failure results.
  • Number of backups to retain – on a per VM basis.  This could be made a global property, but we had fun playing around with different levels here.  I wrote the retention logic to allow for changes to retention, so that during periods of greater change, more backups could be safely kept, and then scaled down later.
  • Folder containing the VMs to be backed up – the workflow would collect ALL VMs from the selected folder.  Managing what to back up is then a simple drag’n’drop of any VMs into or out of this folder.
We provide a few things in the static properties, such as mail server and content settings, but most other things are dynamic.  The static properties are present as “Workflow Attributes” – these are essentially working (read/write) variables that don’t act as workflow input (read only) or output (write only).  Before running the package, you will need to have your vCenter Server registered in vCenter Orchestrator, so that it appears within the vCO inventory and enables communication between the two.

Below is a screenshot of the inventory, as viewed from the vCO Client.  This is one of the excellent aspects of vCO – I can pre-configure what things are present in my environment and not have to deal with connection strings, user credentials, and various properties embedded in scripts.  It is done just once, and then the workflows can talk to your datacenter!  If you can’t see your vCenter inventory like this in vCenter Orchestrator, the workflow won’t be able to do much with your environment!

In my case, I have easily connected up:
  • 2 x vCenter servers
  • UCS Manager
  • Active Directory
  • vCloud Director
  • vCenter Chargeback
  • A mail server

When clicking on the hyperlink to point the workflow to a specific folder (initially has the value “Not set”), the workflow interface presents a view of the vCenter inventory (as seen by vCO) for you to choose from, as in the screenshot below.  Again, this is simpler for the workflow user, because I had already made vCenter available to vCO using a service account (although I could have forced a per-user connection if I wanted).

When scheduling the workflow to run on an automatic daily cycle, you can set this parameter during the scheduling process, in the same way as shown here.  In the case of our customer, they wanted to schedule a collection of these backup jobs daily, each pointing to a slightly different set of VMs to back up.  In the vCO Scheduler, each job entry was given the distinct folder that it would manage, and the jobs just ran thereafter without any real caretaking.

The first action of the Backup Dispatcher is to “Get All Virtual Machines By Folder Including Sub Folders”.  This is pretty self-explanatory, and was an action already available through the vSphere Plugin shipped with vCO.  Conveniently, this requires only the input folder, and returns an array containing all VMs found.
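To give a feel for what that action does, here is a minimal sketch in plain JavaScript.  It is not the plugin's code: the folder objects, with their `vms` and `folders` arrays, are hypothetical stand-ins for the real vCenter inventory objects, and the function name is made up for illustration.

```javascript
// Collect every VM in a folder and in all of its sub-folders.
// ASSUMPTION: a folder is a plain object { name, vms: [...], folders: [...] }.
// The real vSphere plugin objects look different.
function getAllVmsIncludingSubFolders(folder) {
    var found = (folder.vms || []).slice();
    var subFolders = folder.folders || [];
    for (var i = 0; i < subFolders.length; i++) {
        // Recurse into each sub-folder and append whatever it contains.
        found = found.concat(getAllVmsIncludingSubFolders(subFolders[i]));
    }
    return found;
}

// Example inventory with one nested folder.
var branchFolder = {
    name: "Branch-Backups",
    vms: ["Moodle", "Web"],
    folders: [
        { name: "Older", vms: ["Moodle:BACKUP:201306051609"], folders: [] }
    ]
};
var allVms = getAllVmsIncludingSubFolders(branchFolder);
```

The result is a single flat array, which is what the rest of the dispatcher works through.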

The next action is to “Sort VMs By Name”.  I implemented this as a subordinate workflow, while I was toying around with various ways to solve a key problem – which was how to determine whether the current backup is to be retained or not.  I wanted to ensure the workflow didn’t use any hard dates – as it doesn’t know if it is being run weekly or daily, or ad hoc, and a whole bunch of other “if’s” and “maybe’s” that came up while I was thinking about it.  Due to the limited amount of time I wanted to spend on it, and my rudimentary skills, I decided to name VM backups according to a certain naming pattern, which contains:
  • Original VM name
  • A known delimiter, which hopefully won’t pop up in a normal VM name.  I chose a colon “:” in my example, after checking with the customer that they wouldn’t expect a problem.
  • The keyword “BACKUP”
  • Another delimiter “:”
  • A date/time stamp, in the format yyyyMMddhhmm, such as 201306051609 – which is what my clock says as I write this.
This “Sort VMs By Name” action merely contains a Javascript function, made with a little help from web searching, which sorts VMs according to original name first, and then from most recent to least recent backup.  This helps later on, because the retention policy can skip over the ones to be retained, and delete any subsequent backups older than these.  You’ll see that later in the “Process backup and retention logic” workflow.  Anyway, this particular step probably took the longest, as I struggled to remember anything at all about writing script.  It just goes to show how little I needed to know for the rest of the effort!
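A sort along these lines could be sketched as follows.  This is a reconstruction for illustration, not the function from the package: it works on plain name strings rather than VM objects, the helper names are hypothetical, and placing the live VM ahead of its own backups is my assumption about the ordering.

```javascript
// Split a VM name into its original name and backup timestamp.
// The timestamp is null when the name does not follow the backup pattern.
function splitBackupName(name) {
    var parts = name.split(":BACKUP:");
    return { original: parts[0], stamp: parts.length > 1 ? parts[1] : null };
}

// Comparator: original name ascending, then the live VM, then backups newest first.
function compareVmNames(a, b) {
    var left = splitBackupName(a);
    var right = splitBackupName(b);
    if (left.original !== right.original) {
        return left.original < right.original ? -1 : 1;
    }
    if (left.stamp === right.stamp) { return 0; }
    if (left.stamp === null) { return -1; }   // live VM sorts before its backups
    if (right.stamp === null) { return 1; }
    // Fixed-width timestamps compare correctly as plain strings.
    return left.stamp > right.stamp ? -1 : 1;
}

var names = [
    "Web:BACKUP:201306010100",
    "Moodle:BACKUP:201306040100",
    "Web",
    "Moodle:BACKUP:201306050100",
    "Moodle"
];
names.sort(compareVmNames);
```

Because the timestamp always has the same number of digits, no date parsing is needed to order the backups.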

The next stage of the main workflow is essentially a “for each” loop.  I implemented it as an explicit loop, just because the easy vCO “ForEach” logic control worked a little differently than I wanted it to here – but I could probably tackle this again in a better way.  I submit each of the VMs in the (now sorted) array to the “Process backup and retention logic” subordinate workflow.  This subordinate workflow will determine whether or not to back up the VM at hand, and if so, will return an identifier for the backup activity.

Once all VMs have been “processed”, the main workflow then waits for any backup activities that are being performed, using the workflow identifiers kept from each job submission, and then sends a report.  The main framework being used here was derived from some previous examples created by Joerg Lew. (Well, I think it was Joerg!)
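The wait-then-report step can be sketched in plain JavaScript as below.  Everything here is an assumption made for illustration: each job is a simple object with a `vmName` and a `state` of "running", "completed" or "failed", whereas the real workflow tracks the identifiers returned by vCO, and the report wording is invented.

```javascript
// ASSUMPTION: each job is a plain object { vmName, state }.
// The handles returned by a real workflow submission will differ.
function anyRunning(jobs) {
    for (var i = 0; i < jobs.length; i++) {
        if (jobs[i].state === "running") { return true; }
    }
    return false;
}

// Poll until nothing is running, or give up after maxPolls checks.
// `pause` is supplied by the caller, for example a sleep of a few seconds.
function waitForJobs(jobs, pause, maxPolls) {
    var polls = 0;
    while (anyRunning(jobs) && polls < maxPolls) {
        pause();
        polls++;
    }
    return !anyRunning(jobs);
}

// Build the plain-text body of the report email.
function buildReport(jobs) {
    var lines = [];
    var notCompleted = 0;
    for (var i = 0; i < jobs.length; i++) {
        if (jobs[i].state !== "completed") { notCompleted++; }
        lines.push(jobs[i].vmName + ": " + jobs[i].state);
    }
    lines.unshift("Backups submitted: " + jobs.length + ", not completed: " + notCompleted);
    return lines.join("\n");
}
```

Capping the number of polls means a stuck clone delays the report rather than blocking it forever.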

OK, so that’s the main flow, but the cool bit is doing the live backup from a quiesced snapshot, so let’s look at that - in a minute.  First, we need to figure out what needs backing up.

Process Backup and Retention Logic

In hindsight, this is a crappy name, but this subordinate workflow is being called for each and every VM that was discovered, and is trying to determine whether this is a ‘real’ virtual machine needing backup, or if it’s a backup that might need to be removed.  This part of the workflow was where I did most of the thinking, trying different approaches that I could make work using my rudimentary skills.



You can also see the passing of inputs and outputs for this workflow, which is an awesome visualisation of where information is flowing.  It's also very easy to just click'n'drag this info around, as you're building the workflow.



Firstly, the workflow tries to separate the discovered VM name into the three separate elements, according to the “OriginalVMname:BACKUP:201306051409” type of format.  If this is not actually a backup, the last two elements will just be empty, of course.  If we find that the current VM is a new name, then any counters – which were being used to count the number of old backups – need to be reset to zero.  Then, if the VM has the special name “BACKUP” in it, then the workflow only needs to determine whether to keep it. 
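As a rough illustration of that decision, here is a plain JavaScript sketch that walks an already-sorted list of names and labels each one.  It is not the workflow itself, which makes these choices with vCO logic elements.  The function name, the action labels and the exact counting rule (keep the newest `retainCount` backups per VM) are my assumptions.

```javascript
// Decide what to do with each VM name in an already-sorted list.
// Returns one action per name: "backup", "keep" or "delete".
function planActions(sortedNames, retainCount) {
    var actions = [];
    var previousOriginal = null;
    var backupsSeen = 0;
    for (var i = 0; i < sortedNames.length; i++) {
        // Split "OriginalVMname:BACKUP:timestamp" into its elements.
        var parts = sortedNames[i].split(":");
        var original = parts[0];
        var keyword = parts.length > 1 ? parts[1] : "";
        // A new original name means the backup counter starts again.
        if (original !== previousOriginal) {
            backupsSeen = 0;
            previousOriginal = original;
        }
        if (keyword !== "BACKUP") {
            actions.push("backup");   // a real VM: submit it for backup
        } else {
            backupsSeen++;
            // Newest backups come first, so anything past the limit is old.
            actions.push(backupsSeen <= retainCount ? "keep" : "delete");
        }
    }
    return actions;
}
```

Since the list is sorted newest-first within each VM, raising or lowering the retention count simply moves the point at which "keep" turns into "delete".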

These decisions were based on a couple of simple bits of Javascript logic – but you may notice that all the decisions are being made with vCO logic elements.  This is also another easy part of vCenter Orchestrator – you drop in an “IF” logic box, give it an input to determine a “true” or “false” choice, and then you simply drag a connection for each choice to the next part of the workflow.  Too easy.

The only smart bits I used in this whole workflow were:

  • A few Javascript bits of logic, which I could likely replace with more readable vCO logic elements
  • A call to a vCenter action to “Delete Virtual Machine”, if the workflow has found an old backup requiring deletion.
  • A function call to submit a new vCO workflow for any VMs found that need backing up.  This is done in the “Backup This VM” script action, and this is what returns the identifier for the running backup job that is tracked later on.  The workflow that we actually call is “Clone VM For Backup”, which is described next.

Clone VM For Backup

This is a pretty straightforward bit of work, which anybody could put together on their Day 2 exploration of vCenter Orchestrator.  It simply takes a VM as an input, and calls the workflows already available in the vSphere Plugin.


There is one element that is a little ‘special’ here, which is the “Clone From Snapshot” workflow.  After creating a snapshot, which is passed the “Quiesce=True” parameter, the next piece is to clone the still-running VM from that snapshot.  This workflow was taken from Joerg Lew’s years-old blog post on this topic.  This is a native capability of vSphere, and the vSphere API, but just isn’t readily exposed through other means such as the vSphere Client. 

This workflow is also passed the parameter to make the new clone into a template, which helps avoid accidental power-on operations.  The new template is named according to the “OriginalVMname:BACKUP:yyyyMMddhhmm” format mentioned earlier.
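Building a name in that format takes only a few lines of JavaScript.  The sketch below is illustrative rather than the code from the package: the function names are hypothetical, it uses the local clock, and it assumes a 24-hour hour field, since the example stamp 201306051609 shows an hour of 16.

```javascript
// Left-pad a number to two digits, e.g. 6 -> "06".
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}

// Build "OriginalVMname:BACKUP:yyyyMMddhhmm" from a VM name and a Date.
// ASSUMPTION: local time, 24-hour clock.
function buildBackupName(originalName, when) {
    var stamp = when.getFullYear() +
        pad2(when.getMonth() + 1) +   // getMonth() is zero-based
        pad2(when.getDate()) +
        pad2(when.getHours()) +
        pad2(when.getMinutes());
    return originalName + ":BACKUP:" + stamp;
}

var exampleName = buildBackupName("Moodle", new Date(2013, 5, 5, 16, 9));
```

Zero-padding every field keeps the stamp at a fixed width, which is what lets the backups be ordered by a simple text sort.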

The quiescing behaviour called during the snapshot action is the native vSphere capability to invoke VSS for Windows machines, or look for scripting stubs in Linux machines.  It is then up to an application owner to determine if any special actions might be needed to ensure application consistency.  Whatever the owner decides, the workflow doesn’t need to worry about it.

The other “cool” thing I decided would make sense is to “Remove All Snapshots” once the clone has finished.  This was a clear decision that I thought would have the added benefit of ensuring snapshots disappeared on a regular basis.  I have fielded enough urgent calls from customers who have killed their environment because of snapshots filling up datastores. If this was deemed undesirable, however, the workflow could be modified to only remove the snapshot that was created in the prior step, using the available “Remove Snapshot” workflow instead.  The risk here is that something unexpected might prevent this clean up from happening one day, and the snapshot would be effectively “forgotten”.  Hence, my decision to remove all snapshots provides a nice safeguard.

Results

At the end of the process, I can look at the results of running this workflow in the environment.  Below is a view of the job mid-flight in a demo environment.

You can just see that the “Moodle” VM is still being cloned and hasn’t yet been turned into a template.  You can see, however, the other templates from earlier backups having finished – in the current schedule, and also an earlier call.

I put a certain amount of logging into the vCO workflows, using the very handy Server.log() function call, as in the example below.
            Server.log("Submitting backup for: " + vm.name);

I have included the logging output below, to give you an idea of what this creates.

This solution took only a few days of playing, experimenting and learning.  A lot of the vCenter Orchestrator functionality is self-evident, or if not, it is very comprehensively documented.  The customer was very pleased with this simple approach to solving a simple problem, and we trod a fine line between simplicity and complexity, to ensure the customer could easily understand the results and own it without too much hassle.

I am certainly not suggesting that this is a great backup strategy for your organisation, and that isn’t really the point of sharing it here.  I have used this example of a quick and cheap solution to illuminate one way we have used vCenter Orchestrator.  There are many other use cases that I’m sure you will find, once you discover how excellent this tool is, and you probably already possess it.

As I pointed out at the start of this article, technical architects can get carried away with orchestration, and many organisations build very complex systems using this approach.  The temptation is certainly there.  However, it is very sensitive to change.  The simple example here might be robust enough, because it is only talking to one element – vSphere.  But this would quickly become an unmanageable beast if we connected to a server platform, a storage platform, a network manager and a firewall system – just for example.  Each element either becomes frozen in time, or else creates a risk of breaking the orchestration workflows.

The abstraction delivered by virtualization solutions such as vCloud Director, and vCenter itself, introduces standardised, software-based interfaces to the datacenter.  Actions can then be controlled through these software interfaces by the tools’ native functions and policies.  This is the true value of the broader Software Defined Datacenter architecture.  For the large, complex enterprise, orchestration is still useful and necessary “glue” from time to time, and vCenter Orchestrator is a very powerful and friendly tool in this capacity.

Further Areas For Expansion

Thanks to Joerg Lew and Peter Marfatia for their contributions to putting this little solution together.  I also greatly appreciate the community leadership provided by Christophe Decanini and Burke Azbill, who contributed plenty of knowledge and examples on the web for me to follow.

The example given here is a quick run at a solution, and certainly has plenty of opportunities for improvement.  With additional time, I would probably replace some Javascript functions with vCenter Orchestrator logic elements, which would make the workflow easier to understand visually, and make the self-documentation more complete.  I would also re-visit the explicit loop I have used here, and find an elegant way to make use of the “ForEach” construct instead.

Resources

There are a bunch of resources that I have used over time, and that really help with getting an introduction to vCenter Orchestrator.  A couple of them are listed below, to start you on your way.


I have also uploaded my vCenter Orchestrator package and documentation at the links below.  Please feel free to use and abuse - and if you make it bigger and better, please share!

Thanks for reading!