There are times when you have a bright idea and think, “I wonder if someone has done that before?” Well, earlier this week I had one of those moments!
Our remote desktop infrastructure was crawling to a halt – it had become so popular that at times over 25% of the total user base was connected. Logon times were slow, sometimes taking upwards of 5 minutes, and at other times failing to load roaming profiles and just giving up. Something had to be done!
Our remote desktop infrastructure is set up as a 6-node Hyper-V cluster, with shared storage presented as an SMB share from a Scale-Out File Server cluster. The storage for the VDI was entirely consumer-grade SSDs, due to a very limited budget – it seemed like a good idea at the time and kept everyone happy.
The new plan was to purchase enough RAM to install in a server and run the entire VDI setup from a ramdrive. The system images were only ~80 GB each and no information is actually stored on the system – all user data and settings are in roaming profiles/redirected folders from other servers.
This sounded great in theory – after moving things around, one of our nodes had enough free RAM to give this a try. Windows Server 2012 R2 has a nice feature in the iSCSI Target Server that lets you create a RAM-backed virtual disk by giving the VHDX path a ramdisk: prefix:
PS C:\> New-IscsiVirtualDisk -Path ramdisk:test.vhdx -Size 80GB
I created the drive, moved one of the VDI node’s disks into it and, boom. Super speedy logons! Success!
Unfortunately, because of how these servers were set up in the first place, they were not based on a ‘golden image’ parent disk – each one needed its own 80 GB of RAM, so 6 x 80 GB = 480 GB. I saw the pound signs! I didn’t have the free time to recreate the infrastructure from scratch using differencing disks, so it was back to the drawing board.
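To put rough numbers on that, here is a minimal Python sketch of the sizing arithmetic. The node count and image size come from the figures above. The 10 GB differencing-disk size is purely a made-up assumption for illustration, since I never built the golden-image version.

```python
# Rough RAM sizing for hosting VDI disks on a ramdrive.
# NODES and IMAGE_GB come from the post; DIFF_GB is a hypothetical figure.
NODES = 6
IMAGE_GB = 80
DIFF_GB = 10  # assumed size of each differencing disk (not measured)


def full_copies_gb(nodes: int, image_gb: int) -> int:
    """Every VM carries its own complete image."""
    return nodes * image_gb


def golden_image_gb(nodes: int, image_gb: int, diff_gb: int) -> int:
    """One shared parent image plus a per-VM differencing disk."""
    return image_gb + nodes * diff_gb


if __name__ == "__main__":
    print("full copies :", full_copies_gb(NODES, IMAGE_GB), "GB")
    print("golden image:", golden_image_gb(NODES, IMAGE_GB, DIFF_GB), "GB")
```

The gap between the two totals is why a shared parent disk would have made the pure-ramdrive plan far cheaper, had there been time to rebuild.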
Doing some research, I came across a product called ‘FancyCache’ – it sounded like exactly what we needed: an application that creates a RAM cache that populates itself based on what is being read from disk most often.
FancyCache has now been replaced by PrimoCache, which you can find at the following link: http://www.romexsoftware.com/en-us/primo-cache/index.html
I created a new VM to host the VHDX files for the VDI, installed PrimoCache and watched the cache populate itself as users logged in and started opening common programs. I logged in myself and saw the time-to-desktop drop to just under a minute – amazing, but not quite as good as from a pure ramdrive (obviously).
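For anyone wondering why a self-populating cache helps so much with logons, here is a small Python sketch of the general idea. It uses a plain least-recently-used policy as a stand-in. I don't know PrimoCache's actual algorithm, and the class name, block numbers and capacity are all hypothetical.

```python
from collections import OrderedDict


class ReadCache:
    """Tiny LRU block cache; a stand-in, not PrimoCache's real algorithm."""

    def __init__(self, capacity_blocks, backing_read):
        self.capacity = capacity_blocks
        self.backing_read = backing_read  # function: block number -> data
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)   # mark as most recently used
            return self.blocks[block]
        self.misses += 1
        data = self.backing_read(block)      # slow path: go to disk
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = ReadCache(capacity_blocks=4, backing_read=lambda b: f"block-{b}")
    # Three "logons" reading the same common blocks plus one unique block each.
    for user_block in (100, 101, 102):
        for block in (1, 2, 3, user_block):
            cache.read(block)
    print(f"hit rate: {cache.hit_rate():.0%}")
```

The first logon pays for every read, but later logons find the shared blocks already in memory, which is the effect I was seeing as more users signed in.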
Looking through the performance counters in Windows, I saw that an awful lot of writes were happening when people logged in. As this is a 2012 R2 VDI, Windows seems to like setting up the Store when people log in, and for whatever reason this writes a LOT of data. Random data, which SSDs normally cope with well, but not consumer drives with this many users writing at once.
Luckily, PrimoCache has another feature that caught my eye: defer-write. As this cluster is battery-backed, and the data being written is mostly throwaway (apart from things like updates and new program installs), I had no problem enabling the ‘Defer-Write’ option in PrimoCache and setting a really long delay (2 hours) before it writes to disk.
This means that PrimoCache holds the written data in memory until that time has elapsed and then writes it to disk, if it is still required. This is a really important point: most user sessions last less than 2 hours, so most of the data that was ‘written’ gets deleted when they log off again. So not only are we writing less data to disk, we are writing it sequentially, in larger chunks.
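Here is a simplified Python model of that deferred-write behaviour, as I understand it from the description above. It is a sketch under assumptions, not PrimoCache's implementation: the class and method names are hypothetical, and treating a deleted file as a discarded block is a deliberate simplification.

```python
class DeferredWriteBuffer:
    """Holds writes in memory and flushes them after a delay.

    A simplified model, not PrimoCache's implementation.
    """

    def __init__(self, delay_seconds, disk_write):
        self.delay = delay_seconds
        self.disk_write = disk_write   # function: (block, data) -> None
        self.pending = {}              # block -> (first_write_time, data)

    def write(self, block, data, now):
        # Overwrites keep the original timestamp and replace the data,
        # so repeated writes to one block cost a single disk write.
        first_seen = self.pending.get(block, (now, None))[0]
        self.pending[block] = (first_seen, data)

    def discard(self, block):
        # Data deleted before the delay elapses never reaches the disk.
        self.pending.pop(block, None)

    def flush_due(self, now):
        due = sorted(b for b, (t, _) in self.pending.items()
                     if now - t >= self.delay)
        for block in due:              # ascending order: sequential-friendly
            self.disk_write(block, self.pending.pop(block)[1])
        return due


if __name__ == "__main__":
    written = []
    buf = DeferredWriteBuffer(7200, lambda b, d: written.append(b))
    for block in (42, 7, 19):
        buf.write(block, b"temp", now=0)
    buf.discard(7)                     # user logged off, temp data deleted
    buf.flush_due(now=7200)
    print(written)
```

Blocks that are discarded before the two-hour mark never touch the disk, and whatever is left goes out in ascending block order, which is the kind of workload spinning disks handle well.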
Enter ‘the hard drive’. Hard drives are excellent at sequential reads and writes. According to one ‘hybrid SAN’ manufacturer – 3x more efficient. To test this theory, I moved the entire data volume onto our normal disk-based SAN (hooray for Hyper-V live storage migration!).
The end result?
Our VDI infrastructure now has sub-30-second logon times, thanks to PrimoCache and some good old-fashioned hard drives!
You can find PrimoCache at the following link: http://www.romexsoftware.com/en-us/primo-cache/index.html
Here’s a screenshot of it running after a couple of hours – a 63% read cache hit rate seems pretty decent considering these are not differencing disks, and given the number of different programs users run on them!