
Don MacVittie




Thin Provisioning Plus VMs - Armageddon in a Virtual Box?


“Web Server XYZ is out of disk space,” says the voice on the other end of the line in a tone that, at 2am, could pass for Scotty screaming “Cap’n! I don’t know how much longer I can ‘old her together!”

And that’s just the beginning of your nightmare, for this is exactly the scenario that you implemented thin provisioning to avoid, and if it has come to pass anyway, then you know that…

“And now server ABC has crashed!” the voice shouts, reminiscent of Scotty’s “She’s breakin’ up, Cap’n!” Except in your case, Scotty isn’t there to save the day. Or the night, as the case may be.

Soon you have boxes dropping all over the data center, all with disk space problems. And it’s not even 8am yet.


For those who don’t know, thin provisioning is the ability to tell a system it has a ton of disk, but have it actually use only what it needs right now. In short, a pool of disk is placed behind a collection of servers, and each is told it has a certain amount of space dedicated to it. But it isn’t really dedicated. The point of thin provisioning is to tell two, or ten, or two hundred machines that they each have 500 GB available, even if one terabyte is all that’s available to the lot of them. Then, even without certain knowledge of the growth patterns on each machine, you can still let them all consume more disk resources without problems.
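The arithmetic behind that paragraph can be sketched in a few lines. This is a minimal, hypothetical model (not any vendor’s API): each server is promised a large quota, but physical blocks are consumed only as data is actually written.

```python
class ThinPool:
    """Toy model of a thin-provisioned disk pool (illustrative only)."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb   # what you actually bought
        self.used_gb = 0                 # what has really been written
        self.promised_gb = 0             # sum of all quotas handed out

    def provision(self, quota_gb):
        """Promise a server quota_gb -- no physical space is reserved."""
        self.promised_gb += quota_gb

    def write(self, gb):
        """A server writes data; only now is physical disk consumed."""
        if self.used_gb + gb > self.physical_gb:
            raise IOError("pool exhausted: the 2am call")
        self.used_gb += gb


pool = ThinPool(physical_gb=1024)        # one terabyte behind the pool
for _ in range(10):
    pool.provision(quota_gb=500)         # ten servers, "500 GB" each

print(pool.promised_gb)  # 5000 -- five times the disk that exists
print(pool.used_gb)      # 0 -- nothing written yet, so no disk consumed
```

The gap between `promised_gb` and `physical_gb` is the overcommit, and it is exactly where the midnight phone call lives.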

This is extremely useful when you aren’t certain of storage growth patterns on a group of servers, but have an idea of the upper bounds or rate of storage usage change of those servers as a group. You can give them a chunk of disk that you know will suit their combined needs for a given time period, and not worry about them. Since we in IT are often asked to “throw up” servers whose usage patterns we can’t be certain of, this is a method of cutting risk while keeping costs contained. We won’t buy 10 petabytes for that server unless it actually starts to use that much space. Until then, it can share the disk we can afford with some other servers.

This process absolutely reduces the risk that you will get a call around midnight that Server1 is using 100% of its disk while Server2 sits next to it at 5% disk utilization. It doesn’t eliminate the risk, though; even a completely virtualized infrastructure is going to suffer from physical resource limitations. That’s where cloud providers are supposed to help, because they (theoretically) have enough physical hardware to support more than their current possible workload. I’d check that, though; their focus is on OpEx and CapEx just like yours, and less hardware means more money.

Photo CC-by-SA Michael Moll


Most of us know what VM clones are at this point, but just to be sure we’re all on the same page: you can clone the image of a given VM so that you have an exact copy of it. There are a ton of options for how to set up the clone, but the key is that you have an exact copy. If you’re using DHCP to get the address and hostname for your server, then an exact copy is all you need. No changes required; you can boot the copy and run. Lots of applications don’t do well with DHCP, but changing a static IP/hostname is as easy as booting the clone and making the changes in the operating system, so it’s not a big deal even if your application needs static IPs.

And that’s where our story really gets interesting. There is a lot of cloning going on out there.

Not so long ago, a storage fellow I follow on Twitter sent out “with our product, you can make 10,000 clones in X minutes with zero storage costs!” Which caused me to give him grief about the notion that those clones were actually disk-free. Of course they’re not; clones aren’t made to sit and do nothing, they’re made to use, and in the course of use they will both use disk directly and become modified from their original layout (as with changing the IP). Once the clone is no longer an exact duplicate of the original, it starts using disk indirectly too, to store the differences.
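That “indirect” usage is the copy-on-write mechanism most “zero storage” clones rely on. Here is a hypothetical sketch (class and field names are mine, not from any product): a clone shares the parent’s blocks until one is modified, and only then does the difference have to be stored somewhere.

```python
class CowClone:
    """Toy copy-on-write clone: reads fall through to the parent
    image unless this clone has overwritten the block."""

    def __init__(self, parent_blocks):
        self.parent = parent_blocks  # shared, read-only base image
        self.delta = {}              # blocks this clone has changed

    def read(self, block_id):
        # Serve the clone's own version if it has one, else the parent's.
        return self.delta.get(block_id, self.parent[block_id])

    def write(self, block_id, data):
        self.delta[block_id] = data  # this is the "indirect" disk usage


base = {0: "kernel", 1: "ip=10.0.0.5", 2: "app"}
clone = CowClone(base)
print(len(clone.delta))          # 0 -- truly zero extra storage at first
clone.write(1, "ip=10.0.0.6")    # change the static IP after cloning
print(len(clone.delta))          # 1 -- now the clone consumes disk
```

The clone is free only for as long as it stays identical to its parent, which in practice is about as long as it takes to boot it.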

Now clones are A Great Thing(TM) because they allow you to quickly bring up another instance of a server, or to capture a server in a specific state for future reference… But many clones, well, you saw the movie, right? Many clones can be a problem. They use resources. Sometimes lots of resources. I talked to an IT person who shall remain nameless who said they had their DB in a VM and had clones of it. I’m guessing their CISO hasn’t found out about this little gem yet. Which brings us to another point: if you are making a clone so that you have a backup, or so you can move it across machines or data centers, your clone will have to be a “full copy”. Otherwise it is dependent upon access to the original, which is not something most of us are thinking about when making the average clone. Meaning most clones are anything but free, even though I’ve been discussing the so-called “zero storage” versions.
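The backup point above can be made concrete. A copy-on-write clone is unreadable without its parent, so a clone destined for backup or migration has to be “flattened” into a self-contained image. A sketch, with illustrative block names of my own:

```python
def flatten(parent_blocks, delta):
    """Merge a parent image and a clone's changed blocks into a
    self-contained full copy -- no dependency on the parent remains."""
    full = dict(parent_blocks)   # every parent block is copied...
    full.update(delta)           # ...then overlaid with the clone's changes
    return full


parent = {0: "kernel", 1: "config", 2: "data"}
delta = {1: "config-v2"}         # the clone changed one block

backup = flatten(parent, delta)
print(len(backup))   # 3 -- the full copy stores all blocks, shared or not
```

Note that the flattened image is as large as the whole original, which is why a clone made for backup pays full storage price no matter how “thin” it was while running.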


And as the more astute of you may have guessed, the risk of getting a 2am call of doom from Scotty goes wayyyyy up in a virtualized environment, particularly if that environment uses thin provisioning. In an environment without thin provisioning you might lose a system or two, but total melt-down? Much more likely when several systems share disk.

Let us consider for a moment. Even in a world where you had physical servers for everything, your actual disk usage rate was elusive. Some companies had it down very well and could tell when App X was going to need more disk; others counted on alarms to tell them when they needed to fill a few more trays; most stumbled along, figuring it out as they went.

But now we’re talking about a multitude of machines, each set up as if it has more disk than it needs. Those alarms will still go off and tell you that your big old NetApp is under 10% disk free, but now you’ve got a lot more boxes chewing at that 10%. With thin provisioning you could, even should, have multiple apps with varying usage patterns all hitting that same box, and with virtualization it is so easy to “bring up another instance” that you likely have more virtual machines chewing up disk than you would have had physical ones.

Thus does IT Armageddon start, with a warning. And soon, with a loud crashing sound. But it won’t just be one or two apps, it will be all the apps that are thin provisioned to the same hardware. And in a Virtualized environment, if you’re not careful, that can be a lot.


Thin provisioning is not the problem here, nor are VMs. Control is the problem. You can keep this nightmare scenario at bay simply by knowing what you have and how it is used, and by setting alarm thresholds a bit higher, so they fire while more free space remains, for those items that have erratic disk usage patterns and for all other apps sharing the same physical disks.
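One hedged way to put that into practice: scale the free-space alert threshold with how erratic a pool’s tenants are. The function, names, and numbers below are illustrative assumptions, not from any monitoring product.

```python
import statistics


def alert_threshold_pct(daily_growth_samples_gb):
    """Pick a '% free space remaining' alert level for a pool.

    More volatile daily growth -> alarm while more free space remains,
    buying time before a thin-provisioned pool fills up.
    """
    volatility = statistics.pstdev(daily_growth_samples_gb)
    base = 10                               # steady pools: alert at 10% free
    return min(30, base + int(volatility))  # erratic pools: up to 30% free


steady = [2, 2, 3, 2, 2]       # GB/day, a predictable app
erratic = [0, 40, 1, 55, 3]    # GB/day, a spiky app

print(alert_threshold_pct(steady))   # 10
print(alert_threshold_pct(erratic))  # 30
```

However you compute it, the point is the same as the paragraph above: the apps with unpredictable appetites, and everything sharing spindles with them, should page you earlier.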

In a virtualized world – be it disk virtualized through thin provisioning or applications virtualized in a VM system – you need to know more, not less, about your environment. Keep on top of it: monitoring should move up the stack, and someone should be responsible for reporting the state of the data center upward. In this case the state of data center storage, but it feeds into the state of the data center overall.

In the mad rush to virtualization, just make certain you don’t leave the core function of IT behind – to make sure that systems are running smoothly, with enough resources to do their job.

More Stories By Don MacVittie

Don MacVittie is founder of Ingrained Technology, a technical advocacy and software development consultancy. He has experience in application development, architecture, infrastructure, technical writing, DevOps, and IT management. MacVittie holds a B.S. in Computer Science from Northern Michigan University, and an M.S. in Computer Science from Nova Southeastern University.