Solving a Simple Backup Problem
vCenter Orchestrator is a bit of a dark horse in the VMware product portfolio. Almost every customer has it, because it is licensed along with every vCenter Server, but almost none of them has ever touched it. It is actually a VERY powerful tool to have in your toolbox.
Of course, I’m a strong opponent of using orchestration-centric approaches to building a self-service and/or cloud environment. The problem is two-fold:
- Orchestration designs, “great” as they might be, need to touch many
pieces of technology in the datacenter.
While no single element of integration may be a particular challenge,
this up-front implementation of many moving parts is costly and time-consuming
– and such solutions will typically take 6 months or more, and cost several
hundred thousand dollars in services alone.
- Orchestration designs are hypersensitive to any technology changes
over time. That is, you are likely to
break the intricate machine whenever you perform a software upgrade, firmware
upgrade or hardware model change. That
is usually guaranteed to happen yearly (or half-yearly) for software, and every
few years for hardware. Multiplied by
10-15 moving parts, or more, that means the solution is never actually stable
for any length of time – or else it holds the business back from making
necessary changes.
Well, having said all that, orchestration
has its place. If it were on the food pyramid,
it might be “fats and oils”. Rich in
energy, necessary as part of a complete diet, but you have to take it easy or
else it’ll lead to a heart attack, or perhaps an inability to leave your front
door! OK, so it’s not the best analogy…
but hopefully the point sticks.
Last year, a customer was seeking a way to
achieve a simple backup method, to safeguard their remote offices from VMs
breaking through software changes, updates or “tinkering”. Each site had an ESXi standalone host managed
by a central vCenter Server, local storage, and fair to poor WAN links. We decided to explore using vCenter
Orchestrator to create application consistent, on-site, self-managing
backups. What we ended up with looked
pretty useful, so I thought I would share it here. It also took Peter Marfatia and me only a few days
to put together, which I thought was pretty reasonable for a
team with limited skills in the tool.
I have included a link to the resulting
package at the bottom of this article, and also the automatically generated documentation that
vCenter Orchestrator provided for me.
To get started with vCenter Orchestrator,
there are some great resources out there – some I’ve included at the bottom of
this article. It is a great learning experience
to install the Orchestrator Appliance and Client, and just look around at the
various actions, workflows and tools that it makes available.
The Overall Backup Process
Perhaps to start with, a view of the
overall process we used for this Branch Backup would help.
- The workflow is pointed to a folder within vCenter
- It discovers all virtual machines within that folder, and determines
whether they are candidates for backup, or are instances of prior backups.
- If requiring backup, it performs a snapshot with quiescing, then
clones the snapshot to a new VM, which is converted to a template (to prevent
accidental power-ons).
- For prior backups, it removes those no longer needed.
- When all backups have been processed, the workflow emails a report
to a nominated address.
Backup Dispatcher
This is the main entry point into the whole
workflow.
You can see that in vCenter Orchestrator,
there is a visual layout of the workflow steps, like all other orchestration
tools. Even if this is the first time
you’ve seen Orchestrator, you can look at the diagrams and have a fair
understanding of what is happening when the workflow is run.
When running the job manually, the workflow
asks a small number of questions as shown below. This is the default user interface presented
by the Orchestrator Client, and others are available – check out the VMware
Labs site for some options.
When the “Backup Dispatcher” workflow is invoked,
this interface asks for the following elements.
- Email address to send the job report – listing
the VMs backed up and the success/failure results.
- Number of backups to retain – on a per
VM basis. This could be made a global
property, but we had fun playing around with different levels here. I wrote the retention logic to allow for
changes to retention, so that during periods of greater change, more backups
could be safely kept, and then scaled down later.
- Folder containing the VMs to be backed up – the workflow would collect ALL VMs from the selected folder. The management of what to back up is then a
simple drag’n’drop of any VMs into or out of this folder.
We provide a few things in the static
properties, such as mail server and content settings, but most other things are
dynamic. The static properties are
present as “Workflow Attributes” – these are essentially working (read/write)
variables that don’t act as workflow input (read only) or output (write
only). Before running the package, you
will need to have your vCenter Server registered in vCenter Orchestrator, so
that it appears within the vCO inventory and enables communication between the
two.
Below is a screenshot of the inventory, as
viewed from the vCO Client. This is one
of the excellent aspects of vCO – I can pre-configure what things are present
in my environment and not have to deal with connection strings, user
credentials, and various properties embedded in scripts. It is done just once, and then the workflows
can talk to your datacenter! If you
can’t see your vCenter inventory like this in vCenter Orchestrator, the
workflow won’t be able to do much with your environment!
In my case, I have easily connected up:
- 2 x vCenter servers
- UCS Manager
- Active Directory
- vCloud Director
- vCenter Chargeback
- A mail server
When clicking on the hyperlink to point the
workflow to a specific folder (initially has the value “Not set”), the workflow interface presents a view of the vCenter
inventory (as seen by vCO) for you to choose from, as in the screenshot below. Again, this is simpler for the workflow user,
because I had already made vCenter available to vCO using a service account
(although I could have forced a per-user connection if I wanted).
When scheduling the workflow to run on an
automatic daily cycle, you can set this parameter during the scheduling
process, in the same way as shown here. In
the case of our customer, they wanted to schedule a collection of these backup
jobs daily, each pointing to slightly different sets of VMs to back up. In the vCO Scheduler, each job entry was
provided with the distinct folders that it would manage, and the jobs just ran
thereafter without any real caretaking.
The first action of the Backup Dispatcher
is to “Get All Virtual Machines By
Folder Including Sub Folders”. This
is pretty self-explanatory, and was an action already available through the
vSphere Plugin shipped with vCO. Conveniently,
this requires only the input folder, and returns an array containing all VMs
found.
The next action is to “Sort VMs By Name”. I
implemented this as a subordinate workflow, while I was toying around with
various ways to solve a key problem – which was how to determine whether the
current backup is to be retained or not.
I wanted to ensure the workflow didn’t use any hard dates – as it
doesn’t know if it is being run weekly or daily, or ad hoc, and a whole bunch
of other “if’s” and “maybe’s” that came up while I was thinking about it. Due to the limited amount of time I wanted to
spend on it, and my rudimentary skills, I decided to name VM backups according
to a certain naming pattern, which contains:
- Original VM name
- A known delimiter, which
hopefully won’t pop up in a normal VM name.
I chose a colon “:” in my example, after checking with the customer that
they wouldn’t expect a problem.
- The keyword “BACKUP”
- Another delimiter “:”
- A date/time stamp, in the
format yyyyMMddhhmm (24-hour clock), such as
201306051609 – which is what my clock says as I write this.
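To make the naming pattern concrete, here is a minimal sketch of how such a backup name could be assembled in plain JavaScript of the kind a vCO scriptable task accepts. The helper names (pad2, formatStamp, buildBackupName) are my own illustrations and are not taken from the package; only the "Name:BACKUP:yyyyMMddhhmm" pattern comes from the description above.

```javascript
// Hypothetical helpers: only the naming pattern itself comes from the article.
var DELIMITER = ":";
var KEYWORD = "BACKUP";

// Left-pad a number below 10 with a zero, so every field is two digits wide.
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}

// Format a Date as a 12-character stamp (year, month, day, 24-hour, minute),
// using the local time of wherever the script runs.
function formatStamp(date) {
    return String(date.getFullYear()) +
        pad2(date.getMonth() + 1) +   // getMonth() is zero-based
        pad2(date.getDate()) +
        pad2(date.getHours()) +
        pad2(date.getMinutes());
}

// Build the name given to the backup template of a VM.
function buildBackupName(originalName, date) {
    return originalName + DELIMITER + KEYWORD + DELIMITER + formatStamp(date);
}
```

For example, buildBackupName("Moodle", new Date(2013, 5, 5, 16, 9)) gives "Moodle:BACKUP:201306051609". Because every field has a fixed width, comparing two stamps as plain strings is the same as comparing them chronologically, which keeps any later sorting simple.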
This “Sort VMs By Name” action merely contains a JavaScript function, made with a
little help from web searching, which sorts VMs according to original name
first, and then from most recent to least recent backup. This helps later on, because the retention
policy can skip over the backups to be retained, and delete any subsequent (older) backups. You’ll see that later
in the “Process backup and retention logic” workflow. Anyway, this particular step probably took the longest amount
of time, as I struggled to remember anything at all about writing
scripts. It just goes to show how little
I needed to know for the rest of the effort!
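The sort order described above can be sketched roughly as follows. This is not the function from the package, just one way to get the same ordering; the function names are hypothetical, the VM objects are assumed to expose a name property (as the vCO VM objects do), and placing the live VM ahead of its own backups is my assumption.

```javascript
// Split "Original:BACKUP:stamp" into its parts; a live VM gets an empty stamp.
function splitName(name) {
    var parts = name.split(":");
    var isBackup = parts.length === 3 && parts[1] === "BACKUP";
    return {
        original: isBackup ? parts[0] : name,
        stamp: isBackup ? parts[2] : ""
    };
}

// Comparator: original name ascending, then the live VM, then newest backup first.
function compareVms(a, b) {
    var pa = splitName(a.name);
    var pb = splitName(b.name);
    if (pa.original !== pb.original) {
        return pa.original < pb.original ? -1 : 1;
    }
    if (pa.stamp === pb.stamp) { return 0; }
    if (pa.stamp === "") { return -1; }   // assumption: live VM sorts first
    if (pb.stamp === "") { return 1; }
    // Fixed-width stamps compare correctly as plain strings.
    return pa.stamp > pb.stamp ? -1 : 1;
}

// Return a sorted copy, leaving the input array untouched.
function sortVmsByName(vms) {
    return vms.slice().sort(compareVms);
}
```

With this ordering, all entries for one original VM sit next to each other, so a single pass over the array is enough to count backups per VM.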
The next stage of the main workflow is
essentially a “for each” loop. I
implemented it as an explicit loop, just because the easy vCO “ForEach” logic control worked a little
differently than I wanted it to here – but I could probably tackle this again
in a better way. For each of the VMs in
the (now sorted) array, I submit it to the “Process backup and retention logic” subordinate workflow. This subordinate workflow will determine
whether or not to back up the VM at hand, and if so, will return an identifier
for the backup activity.
Once all VMs have been “processed”, the
main workflow then waits for any backup activities that are being performed, using
the workflow identifiers kept from each job submission, and then sends a
report. The main framework being used
here was derived from some previous examples created by Joerg Lew. (Well, I
think it was Joerg!)
OK, so that’s the main flow, but the cool
bit is doing the live backup from a quiesced snapshot, so let’s look at that - in
a minute. First, we need to figure out
what needs backing up.
Process Backup and Retention Logic
In hindsight, this is a crappy name, but
this subordinate workflow is being called for each and every VM that was
discovered, and is trying to determine whether this is a ‘real’ virtual machine
needing backup, or if it’s a backup that might need to be removed. This part of the workflow was where I did
most of the thinking, trying different approaches that I could make work using
my rudimentary skills.
You can also see the passing of inputs and outputs for this workflow, which is an awesome visualisation of where information is flowing. It's also very easy to just click'n'drag this info around, as you're building the workflow.
Firstly, the workflow tries to separate the
discovered VM name into the three separate elements, according to the “OriginalVMname:BACKUP:201306051409” type
of format. If this is not actually a
backup, the last two elements will just be empty, of course. If we find that the current VM is a new name,
then any counters – which were being used to count the number of old backups –
need to be reset to zero. Then, if the
VM name contains the keyword “BACKUP”, the workflow only needs to determine
whether to keep it.
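The decision logic just described can be sketched in a few lines of JavaScript. In the real workflow these decisions are spread across vCO logic elements and small scripts, so treat this as an illustration only: the function names are hypothetical, the input is assumed to be already sorted by original name with the newest backup first, and I am assuming the retention count applies to existing backups only.

```javascript
// Split a VM name into its three elements; for a live VM the last two are empty.
function parseBackupName(name) {
    var parts = name.split(":");
    if (parts.length === 3 && parts[1] === "BACKUP") {
        return { original: parts[0], keyword: parts[1], stamp: parts[2] };
    }
    return { original: name, keyword: "", stamp: "" };
}

// Walk the sorted names once and decide what to do with each entry.
function planActions(sortedNames, retain) {
    var actions = [];
    var currentName = null;
    var backupsSeen = 0;
    for (var i = 0; i < sortedNames.length; i++) {
        var parsed = parseBackupName(sortedNames[i]);
        if (parsed.original !== currentName) {
            currentName = parsed.original;   // new VM name: reset the counter
            backupsSeen = 0;
        }
        if (parsed.keyword !== "BACKUP") {
            // A 'real' VM: it needs a fresh backup.
            actions.push({ name: sortedNames[i], action: "backup" });
            continue;
        }
        // A prior backup: keep the newest 'retain' copies, delete the rest.
        backupsSeen++;
        actions.push({
            name: sortedNames[i],
            action: backupsSeen <= retain ? "keep" : "delete"
        });
    }
    return actions;
}
```

Because the counter only depends on the retention value passed in at run time, raising or lowering that value between runs simply moves the keep/delete boundary the next time the job runs.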
These decisions were based on a couple of
simple bits of JavaScript logic – but you may notice that all the decisions are
being made with vCO logic elements. This
is also another easy part of vCenter Orchestrator – you drop in an “IF” logic
box, give it an input to determine a “true” or “false” choice, and then you
simply drag a connection for each choice to the next part of the workflow. Too easy.
The only smart bits I used in this whole
workflow were:
- A few bits of JavaScript logic,
which I could likely replace with more readable vCO logic elements
- A call to a vCenter action to “Delete Virtual Machine”, if the
workflow has found an old backup requiring deletion.
- A function call to submit a new
vCO workflow for any VMs found that need backing up. This is done in the “Backup This VM” script action, and this is what returns the
identifier for the running backup job that is tracked later on. The workflow that we actually call is “Clone VM For Backup”, which is
described next.
Clone VM For Backup
This is a pretty straightforward bit of
work, which anybody could put together on their Day 2 exploration of vCenter
Orchestrator. It simply takes a VM as an
input, and calls the workflows already available in the vSphere Plugin.
There is one element that is a little ‘special’
here, which is the “Clone From Snapshot”
workflow. After creating a snapshot,
which is passed the “Quiesce=True”
parameter, the next piece is to clone the still-running VM from that
snapshot. This workflow was taken from
Joerg Lew’s years-old blog post on this topic. This is a native capability of vSphere, and
the vSphere API, but just isn’t readily exposed through other means such as the
vSphere Client.
This workflow is also passed the parameter
to make the new clone into a template, which helps avoid accidental power-on
operations. The new template is named
according to the “OriginalVMname:BACKUP:yyyyMMddhhmm”
format mentioned earlier.
The quiescing behaviour called during the
snapshot action is the native vSphere capability to invoke VSS for Windows
machines, or look for scripting stubs in Linux machines. It is then up to an application owner to
determine if any special actions might be needed to ensure application
consistency. Whatever the owner decides,
the workflow doesn’t need to worry about it.
The other “cool” thing I decided would make
sense is to “Remove All Snapshots”
once the clone has finished. This was a
clear decision that I thought would have the added benefit of ensuring
snapshots disappeared on a regular basis.
I have fielded enough urgent calls from customers who have killed their
environment because of snapshots filling up datastores. If this was deemed
undesirable, however, the workflow could be modified to only remove the
snapshot that was created in the prior step, using the available “Remove Snapshot” workflow instead. The risk here is that something unexpected
might prevent this clean up from happening one day, and the snapshot would be
effectively “forgotten”. Hence, my
decision to remove all snapshots provides a nice safeguard.
Results
At the end of the process, I can look at
the results of running this workflow in the environment. Below is a view of the job mid-flight in a
demo environment.
You can just see that the “Moodle” VM is
still being cloned and hasn’t yet been turned into a template. You can see, however, the other templates
from earlier backups having finished – in the current schedule, and also an
earlier call.
I put a certain amount of logging into the
vCO workflows, using the very handy Server.log() function call, as in the example below.
Server.log("Submitting backup for: " + vm.name);
I have included the logging output below,
to give you an idea of what this creates.
This solution took only a few days of
playing, experimenting and learning. A
lot of the vCenter Orchestrator functionality is self-evident, or if not, it is
very comprehensively documented. The
customer was very pleased with this simple approach to solving a simple
problem, and we trod a fine line between simplicity and complexity, to ensure
the customer could easily understand the results and own it without too much
hassle.
I am certainly not suggesting that this is
a great backup strategy for your organisation, and that isn’t really the point
of sharing it here. I have used this
example of a quick and cheap solution to illuminate one way we have used
vCenter Orchestrator. There are many
other use cases that I’m sure you will find, once you discover how excellent this
tool is, that you probably already possess.
As I pointed out at the start of this
article, technical architects can get carried away with orchestration, and many
organisations build very complex systems using this approach. The temptation is certainly there. However, such systems are very sensitive to change. The simple example here might be robust
enough, because it is only talking to one element – vSphere. But this would quickly become an unmanageable
beast if we connected to a server platform, a storage platform, a network
manager and a firewall system – just for example. Each element either becomes frozen in time,
or else creates a risk of breaking the orchestration workflows.
The abstraction delivered by virtualization
solutions such as vCloud Director, and vCenter itself, introduces standardised,
software-based interfaces to the datacenter.
Actions can then be controlled through these software interfaces by the
tools’ native functions and policies.
This is the true value of the broader Software Defined Datacenter
architecture. For the large, complex
enterprise, orchestration is still useful and necessary “glue” from time to
time, and vCenter Orchestrator is a very powerful and friendly tool in this
capacity.
Further Areas For Expansion
Thanks to Joerg Lew and Peter Marfatia for
their contributions to putting this little solution together. I also greatly appreciate the community
leadership provided by Christophe Decanini and Burke Azbill, who contributed
plenty of knowledge and examples on the web for me to follow.
The example given here is a quick run at a
solution, and certainly has plenty of opportunities for improvement. With additional time, I would probably
replace some JavaScript functions with vCenter Orchestrator logic elements,
which would make the workflow easier to understand visually, and make the
self-documentation more complete. I
would also re-visit the explicit loop I have used here, and find an elegant way
to make use of the “ForEach” construct instead.
Resources
There are a bunch of resources that I have
used over time, and that really help with getting an introduction to vCenter
Orchestrator. A couple of them are
listed below, to start you on your way.
I have also uploaded my vCenter Orchestrator package and documentation at the links below. Please feel free to use and abuse - and if you make it bigger and better, please share!
Thanks for reading!