It’s pronounced ee-cubed, not ek-yoo-bed. You know, E to the third power.
Dave gave us a thorough description of the past, current, and proposed production Exchange deployment at Microsoft. I wasn’t at all surprised to find out that Microsoft is using disk-to-disk backup for their Exchange databases, but I was more than a little surprised to find out what they’re using: good old NTBackup. NTBackup performs a streaming backup to a separate set of disk resources in the cluster. Once the backup is done, those disk resources are dismounted and remounted on one of the passive servers in the cluster, and from there the dump files are backed up to tape.
One of the problems they ran into was a throughput limit with NTBackup: 640 MB per minute. That’s a real problem when you have a four-hour backup window and terabytes of data to dump to disk, and the window is strict in order to avoid overlapping with online maintenance and the morning user traffic. They reduced the data size per job by running separate backup jobs to back up each individual mailbox store database, rather than each storage group as a whole, but it wasn’t enough.
As it turns out, there are some undocumented registry parameters that affect NTBackup’s performance (a scripted version follows the list):
- HKCU\Software\Microsoft\Ntbackup\Backup Engine\Logical Disk Buffer Size — the default value is 32; Microsoft changed it to 64.
- HKCU\Software\Microsoft\Ntbackup\Backup Engine\Max Buffer Size — the default value is 512; Microsoft changed it to 1024.
- HKCU\Software\Microsoft\Ntbackup\Backup Engine\Max Num Tape Buffers — the default value is 9; Microsoft changed it to 16.
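If you’d rather script these changes than click through regedit, here’s a minimal sketch using Python’s standard winreg module. One assumption to flag: I believe NTBackup keeps these as REG_SZ string values like the rest of its Backup Engine settings, but check an existing installation before running anything. Also note they live under HKCU, so apply them as the account that actually runs the backup jobs.

```python
# Sketch: apply the NTBackup buffer tweaks via Python's winreg module.
# Assumption: NTBackup stores these as REG_SZ strings, like its other
# "Backup Engine" settings; confirm against an existing install first.
import winreg

KEY_PATH = r"Software\Microsoft\Ntbackup\Backup Engine"

# Value name -> Microsoft's tested setting (defaults: 32, 512, 9).
TWEAKS = {
    "Logical Disk Buffer Size": "64",
    "Max Buffer Size": "1024",
    "Max Num Tape Buffers": "16",
}

# HKCU is per-user: run this as the account that runs NTBackup.
with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
    for name, value in TWEAKS.items():
        winreg.SetValueEx(key, name, 0, winreg.REG_SZ, value)
        print(f"Set {name} = {value}")
```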
With these changes they were able to get ~1.2 GB per minute, enough to meet the window. They arrived at these values through testing, so test in your own environment to find the best values for your configuration.
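To put those throughput numbers in perspective, the back-of-the-envelope math for a single backup stream looks like this:

```python
# Back-of-the-envelope: how much one backup stream moves in the window.
WINDOW_MINUTES = 4 * 60

for label, mb_per_minute in [("default NTBackup", 640),
                             ("tuned NTBackup", 1228)]:  # ~1.2 GB/min
    total_gb = mb_per_minute * WINDOW_MINUTES / 1024
    print(f"{label}: ~{total_gb:.0f} GB per stream in {WINDOW_MINUTES} min")
```

At the stock rate a single stream moves only about 150 GB in four hours; the tuned rate pushes that to roughly 290 GB, and the per-database parallel jobs multiply it across streams. That’s why both changes matter when terabytes are in play.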
There are advantages to backing up the individual databases rather than the whole storage group. The main one is that it produces a set of smaller files that can be moved to tape from the passive server by multiple concurrent streaming jobs, making better use of the tape device pool. Depending on the media used, the smaller files can also eliminate spanning a backup set across multiple tapes.
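To make the parallelism concrete, here’s a rough sketch of fanning those per-database dump files out over a tape device pool. Everything here is illustrative: ship_to_tape is a hypothetical placeholder for whatever actually drives your tape jobs, and the path and drive count are made up.

```python
# Illustration only: fan per-database dump files out to a pool of
# concurrent tape jobs. ship_to_tape() is a hypothetical placeholder
# for whatever command actually writes a dump file to tape.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

TAPE_DRIVES = 4  # size of the tape device pool (made-up number)

def ship_to_tape(dump: Path) -> str:
    # Placeholder: invoke your real tape backup job here.
    return f"queued {dump.name} ({dump.stat().st_size // 2**20} MB)"

dumps = sorted(Path(r"E:\ExchangeDumps").glob("*.bkf"))

# Smaller per-database files keep all the drives busy; one giant
# per-storage-group file would serialize onto a single drive.
with ThreadPoolExecutor(max_workers=TAPE_DRIVES) as pool:
    for result in pool.map(ship_to_tape, dumps):
        print(result)
```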
I’d somehow managed to miss this detail in the article on the Microsoft IT Showcase website, but it is there, along with a wealth of other detail. Give it a look if you haven’t already.
The other tidbit for the night is to check out the new version of Jetstress. The most obvious difference is that it’s now a GUI utility rather than the command-line tool we’ve all come to know and love. There’s also been a change in the recommendations for using Jetstress. Previously, Microsoft recommended a test database one-twentieth to one-tenth of your expected production database size. It turns out that this recommendation is likely to cause a condition known as short-stroking, where the disk head only moves over a small portion of the platter instead of the full distance. Short-stroking artificially reduces head seek time, which inflates your disk throughput benchmarks far past levels you can sustain in production with fully populated databases.
Instead, you should use something much more representative of your final database size — at least 70%, preferably 100% if you have the capacity in your test lab. As always, the best test scenario is as close to production as you can get.
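To see why short-stroking flatters the numbers, here’s a toy model of my own (nothing from the session): approximate seek time as a fixed settle cost plus a term that grows with the square root of seek distance, a common first-order model, and compare test databases spanning different fractions of the platter. The constants are invented.

```python
# Toy model of short-stroking (illustrative constants, not measurements).
# Seek time ~ settle + coeff * sqrt(distance): a common first-order
# approximation for disk arm movement.
from math import sqrt

SETTLE_MS = 1.0  # assumed fixed head-settle time
COEFF_MS = 8.0   # assumed full-stroke seek cost scaling

def avg_seek_ms(stroke_fraction: float) -> float:
    # Mean distance between two uniform random points on [0, f] is f/3.
    return SETTLE_MS + COEFF_MS * sqrt(stroke_fraction / 3)

for f in (0.05, 0.10, 0.70, 1.00):
    print(f"DB spans {f:4.0%} of stroke -> avg seek ~{avg_seek_ms(f):.1f} ms")
```

Even with made-up constants, the shape is what matters: a database covering a twentieth of the stroke sees well under half the average seek of a full-stroke workload, so the benchmark tells you a comfortable lie.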