Another solution for Autodiscover 401 woes in #MSExchange

Earlier tonight, I was helping a customer troubleshoot why users in their mixed Exchange 2013/2007 organization were getting 401 errors when trying to use Autodiscover to set up profiles. Well, more accurately, the Remote Connectivity Analyzer was getting a 401, and users were getting repeating authentication prompts. However, when we tested internally against the Autodiscover endpoints everything worked fine, and manual testing externally against the Autodiscover endpoint also worked.

So why did our manual tests work when the automated tests and Outlook didn’t?

Well, some will tell you it’s because of bad NTFS permissions on the virtual directory, while others will say it’s because the loopback check hasn’t been disabled. And in your case, that might in fact be the cause…but it wasn’t in mine.

In my case, the clue was in the Outlook authentication prompt (users and domains have been changed to protect the innocent):



I’m attempting to authenticate with the user’s UPN, and it’s failing…hey.

Re-run the Exchange Remote Connectivity Analyzer, this time with the Domain\Username syntax, and suddenly I pass the Autodiscover test. Time to go view the user account – and sure enough, the account’s UPN is not set to the primary SMTP address.

Moral of the story: check your UPNs.
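If you have more than a handful of accounts to check, the comparison is easy to script. Here is a minimal sketch in Python, assuming you have already exported the three properties to CSV; the column names, users, and domains below are invented for illustration and are not from the customer environment.

```python
import csv
import io

def find_upn_mismatches(rows):
    """Return (user, upn, smtp) for every row whose UPN differs from the primary SMTP address."""
    mismatches = []
    for row in rows:
        upn = row["UserPrincipalName"].strip()
        smtp = row["PrimarySmtpAddress"].strip()
        # Neither value is case-sensitive, so compare them case-folded.
        if upn.casefold() != smtp.casefold():
            mismatches.append((row["SamAccountName"], upn, smtp))
    return mismatches

# Hypothetical export; every name and domain here is made up.
SAMPLE_EXPORT = """SamAccountName,UserPrincipalName,PrimarySmtpAddress
alice,alice@contoso.com,alice@contoso.com
bob,bob@corp.contoso.local,bob@contoso.com
carol,Carol@Contoso.com,carol@contoso.com
"""

mismatches = find_upn_mismatches(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
for user, upn, smtp in mismatches:
    print(f"{user}: UPN {upn} does not match primary SMTP address {smtp}")
```

Any account the sketch flags is a candidate for the same repeating-prompt behavior when the user signs in with an email address.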

Upgrade Windows 2003 crypto in #MSExchange migrations

Just had this bite me at one of my customers. Situation: Exchange Server 2007 on Windows Server 2003 R2, upgrading to Exchange Server 2013 on Windows Server 2012. We ordered a new SAN certificate from GoDaddy (requesting it from Exchange 2013) and installed it on the Exchange 2013 servers with no problems. When we installed it on the Exchange 2007 servers, however, the certificate would import but the new certificate (and its chain) showed the dreaded red X.

Looking at the certificate, we saw the following error message:



If you look more closely at the certificates in GoDaddy’s G2 root chain, you’ll see the chain is signed with both SHA-1 and SHA-256. And the latter is the problem for Windows Server 2003 – it has an older cryptography library that doesn’t handle the newer SHA-2 hash algorithms.

The solution: Install KB968730 on your Windows Server 2003 machines, reboot, and re-check your certificate. Now you should see the “This certificate is OK” message we all love.

Load Balancing ADFS on Windows 2012 R2

Greetings, everyone! I ran across this issue recently with a customer’s Exchange Server 2007 to Office 365 migration and wanted to pass along the lessons learned.

The Plan

It all started so innocently: the customer was going to deploy two Exchange Server 2013 hybrid servers into their existing Exchange Server 2007 organization for a Hybrid organization using directory synchronization and SSO with ADFS. They’ve been investing a lot of work into upgrading their infrastructure and have been upgrading systems to newer versions of Windows, including some spiffy new Windows Server 2012 Hyper-V servers. We decided that we’d deploy all of the new servers on Windows Server 2012 R2, the better to future-proof them. We were also going to use Windows NLB for the ADFS and ADFS proxy servers instead of using their existing F5 BIG-IP load balancer, as the network team is in the middle of their own projects.

The Problem

There were actually two problems. The first, of course, was the combination of Hyper-V and Windows NLB. Unicast was obviously no good, multicast has its issues, and because we needed to get the servers up and running as fast as possible we didn’t have time to explore using IGMP with multicast. Time to turn to the F5. The BIG-IP platform is pretty complex and full of features, but F5 is usually good about documentation. Sure enough, the F5 ADFS 2.0 deployment guide (Deploying F5 with Microsoft Active Directory Federation Services) got us most of the way there. If we had been deploying ADFS 2.0 on Server 2012 and the ADFS proxy role, I’d have been home free.

In Windows 2012 R2 ADFS, you don’t have the ADFS proxy role any more – you use the Web Application Proxy (WAP) role service component of the Remote Access role. However, that’s not the only change. If you follow this guide with Windows Server 2012 R2, your ADFS and WAP pools will fail their health checks (F5 calls them monitors) and the virtual server will not be brought online because the F5 will mistakenly believe that your pool servers are down. OOPS!

The Resolution

So what’s different and how do we fix it?

ADFS on Windows Server 2012 R2 is still mostly ADFS 2.0, but some things have been changed – out with the ADFS proxy role, in with the WAP role service. That’s the most obvious change, but the real sticker here is under the hood in the guts of the Windows Server 2012 R2 HTTP server. In Windows Server 2012 R2, IIS and the Web server engine have a new architecture that supports the SNI extension to TLS. SNI is insanely cool. The connecting machine tells the server what host name it’s trying to connect to as part of the HTTPS session setup, so that one IP address can be used to host multiple HTTPS sites with different certificates, just like HTTP 1.1 added the Host: header to HTTP.

But the fact that Windows 2012 R2 uses SNI gets in the way of the HTTPS health checks that the F5 ADFS 2.0 deployment guide has you configure. We were able to work around it by replacing the HTTPS health checks with TCP Half Open checks, which send a SYN to the pool servers on the target TCP port and wait for the SYN-ACK. If they receive it, the server is marked up.

For long-term use, the HTTPS health checks are better; they allow you to configure the health check to probe a specific URL and get a specific response back before it declares a server in the pool is healthy. This is better than ICMP or TCP checks which only check for ping responses or TCP port responses. It’s totally possible for a machine to be up on the network and IIS answering connections but something is misconfigured with WAP or ADFS so it’s not actually a viable service. Good health checks save debugging time.
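To make the difference concrete, here is a rough sketch in Python of what an SNI-aware HTTPS health check does. It is not the F5 monitor, just an illustration of the idea: connect to a pool member by IP, present the service host name as SNI, request a specific URL, and only call the server healthy if the expected content comes back. The host name is a placeholder, and the metadata path and the `EntityDescriptor` marker are my assumptions about what a working AD FS server returns.

```python
import socket
import ssl

# Assumed values for illustration; substitute your own federation service name.
FEDERATION_HOST = "sts.example.com"
METADATA_PATH = "/FederationMetadata/2007-06/FederationMetadata.xml"

def build_probe_request(host, path):
    """Build a minimal HTTP/1.1 GET request for the health-check URL."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def is_healthy(response, expected=b"EntityDescriptor"):
    """Healthy means a 200 status line plus the expected marker after it."""
    status_line, _, rest = response.partition(b"\r\n")
    parts = status_line.split()
    return len(parts) >= 2 and parts[1] == b"200" and expected in rest

def probe(server_ip, host=FEDERATION_HOST, path=METADATA_PATH, timeout=5.0):
    """Connect to one pool member by IP, but send the service name as SNI."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((server_ip, 443), timeout=timeout) as raw:
            # server_hostname is what puts the SNI extension in the ClientHello.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(build_probe_request(host, path))
                chunks = []
                while True:
                    chunk = tls.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
    except OSError:
        # Covers refused connections, timeouts, and TLS failures alike.
        return False
    return is_healthy(b"".join(chunks))
```

A TCP check would stop at the `create_connection` call and declare victory. Everything after that line is what tells you the service behind the port is actually answering.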

The Real Fix

As far as I know there’s no easy, supported way to turn SNI off, nor would I really want to; it’s a great standard that really needs to be widely deployed and supported because it will help servers conserve IP addresses and allow them to deploy multiple HTTPS sites on fewer IP/port combinations while using multiple certificates instead of big heavy SAN certificates. Ultimately, load balancer vendors and clients need to get SNI-aware fixes out for their gear.

If you’re an F5 user, the right way is to read and follow this F5 DevCentral blog post Big-IP and ADFS Part 5 – “Working with ADFS 3.0 and SNI” to configure your BIG-IP device with a new SNI-aware monitor; you’re going to want it for all of the Windows Server 2012 R2 Web servers you deploy over the next several years. This process is a little convoluted – you have to upload a script to the F5 and pass in custom parameters, which just seems really wrong (but is a true measure of just how powerful and beastly these machines really are) – but at the end of the day, you have a properly configured monitor that not only supports SNI connections to the correct hostname, but uses the specific URI to ensure that the ADFS federation XML is returned by your servers.

An SNI-aware F5 monitor (from DevCentral)

What do you do if you don’t have an F5 load balancer and your vendor doesn’t support SNI? Remember when I said that there’s no way to turn SNI off? That’s not totally true. You can go mess with the SNI configuration and change the SSL bindings in a way that seems to mimic the old behavior. You run the risk of really messing things up, though. What you can do is follow the process in this TechNet blog post How to support non-SNI capable Clients with Web Application Proxy and AD FS 2012 R2.



As a side note, almost everyone seems to be calling the ADFS flavor on Windows Server 2012 R2 “ADFS 3.0.” Everyone, that is, except for Microsoft. It’s not a 3.0; as I understand it the biggest differences have to do with the underlying server architecture, not the ADFS functionality on top of it per se. So don’t call it that, but recognize most other people will. It’s just AD FS 2012 R2.

Why Virtualization Still Isn’t Mature

As a long-time former advocate for Exchange virtualization (and virtualization in general), it makes me glad to see other pros pointing out the same conclusions I reached a while ago about the merits of Exchange virtualization. In general, it’s not a matter of whether you can solve the technological problems; I’ve spent years proving for customer after customer that you can. Tony does a great job of talking about the specific mismatch between Exchange and virtualization. I agree with everything he said, but I’m going to go one further and say that part of the problem is that virtualization is still an immature technology.

Now when I say that, you have to understand: I believe that virtualization is more than just the technology you use to run virtual machines. It includes the entire stack. And obviously, lots of people agree with me, because the core of private cloud technology is creating an entire stack of technology to wrap around your virtualization solution, such as Microsoft System Center or OpenStack. These solutions include software defined networking, operating system configuration, dynamic resource management, policy-driven allocation, and more. There are APIs, automation technologies, de facto standards, and interoperability technologies. The goal is to reduce or remove the amount of human effort required to deploy virtual solutions by bringing every piece of the virtualization pie under central control. Configure policies and templates and let automation use those to guide the creation and configuration of your specific instances, so that everything is consistent.

But there’s a missing piece – a huge one – one that I’ve been pointing out for years. And that’s the application layer. When you come right down to it, the Exchange community gets into brawls with the virtualization community (and the networking community, and the storage community, but let’s stay focused on one brawl at a time please) because there are two different and incompatible principles at play:

  • Exchange is trying to be as aware of your data as possible and take every measure to keep it safe, secure, and available by making specific assumptions about how the system is deployed and configured.
  • Your virtualization product is trying to treat all applications (including Exchange) as if they are completely unaware of the virtualization stack and provide features and functionality whether they were designed for it or not.

The various stack solutions are using the right approach, but I believe they are doing it in the wrong direction; they work great in the second scenario, but they create exceptions and oddities for Exchange and other programs like Exchange that fit the first scenario. So what’s missing? How do I think virtualization stacks need to fix this problem?

Create a standard by which Exchange and other applications can describe what capabilities they offer and define the dependencies and requirements for those capabilities that must in turn be provided by the stack. Only by doing this can policy-driven private cloud solutions close that gap and make policies extend across the entire stack, continuing to reduce the chance for human error.

With a standard like this, virtualizing Exchange would become a lot easier. As an example, consider VM to host affinity. Instead of admins having to remember to manually configure Exchange virtual DAG members to not be on the same host, Exchange itself would report this requirement to the virtualization solution. DAG Mailbox servers would never be on the same host, and the FSW wouldn’t be on the same host as any of the Mailbox servers. And when host outages resulted in the loss of redundant hosts, the virtualization solution could throw an event caught by the monitoring system that explained the problem before you got into a situation where this constraint was broken. But don’t stop there. This same standard could be applied to network configuration, allowing Exchange and other applications to have load balancing automatically provisioned by the private cloud solution. Or imagine deploying Exchange mailbox servers into a VMware environment that’s currently using NFS. The minute the Mailbox role is deployed, the automation carves off the appropriate disk blocks and presents them as iSCSI to the new VM (either directly or through the hypervisor as an RDM, based on the policy) so that the storage meets Exchange’s requirements.
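No such standard exists, so treat the following as a thought experiment rather than anything a real hypervisor exposes. It is a small Python sketch of the host affinity example: the application declares a group of VMs that must never share a host, and the stack checks a placement against that declaration. The group name, VM names, and host names are all hypothetical.

```python
from collections import defaultdict

def find_affinity_violations(placement, anti_affinity_groups):
    """Return {group: {host: [vms]}} for every host running two or more VMs of one group."""
    violations = {}
    for group, members in anti_affinity_groups.items():
        by_host = defaultdict(list)
        for vm in members:
            by_host[placement[vm]].append(vm)
        crowded = {host: vms for host, vms in by_host.items() if len(vms) > 1}
        if crowded:
            violations[group] = crowded
    return violations

# Hypothetical requirement as the application might declare it:
# no two of these VMs may share a host.
anti_affinity_groups = {"DAG1": ["MBX1", "MBX2", "FSW1"]}

# Hypothetical current placement of VMs on hosts.
placement = {"MBX1": "HV-A", "MBX2": "HV-B", "FSW1": "HV-A"}

print(find_affinity_violations(placement, anti_affinity_groups))
```

Run against a proposed placement before a migration or host evacuation, a check like this is what would let the stack raise the warning event instead of quietly breaking the constraint.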

Imagine the arguments that could solve. Instead of creating problems, applications and virtualization/private cloud stacks would be working together — a very model of maturity.

Windows 2012 R2 and #MSExchange: not so fast

Updated 9/18/2014: As of this writing, Windows Server 2012 R2 domain controllers are supported against all supported Microsoft Exchange environments:

  • Exchange Server 2013 with CU3 or later (remember, CU5 and CU6 are the two versions currently in support; SP1 is effectively CU4)
  • Exchange Server 2010 with SP3 and RU5 or later
  • Exchange Server 2007 with SP3 and RU13 or later

Take particular note that Exchange Server 2010 with SP2 (any rollup) and earlier are NOT supported with Windows Server 2012 R2 domain controllers.

Also note that if you want to enable Windows Server 2012 R2 domain and forest functional level, you must have Exchange Server 2013 SP1 or later OR Exchange Server 2010 + SP3 + RU5 or later. Exchange Server 2013 CU3 and Exchange Server 2007 (any level) are not supported for these levels.


In the past couple of months since Windows Server 2012 R2 has dropped, a few of my customers have asked about rolling out new domain controllers on this version – in part because they’re using it for other services and they want to standardize their new build outs as much as they can.

My answer right now? Not yet.

Whenever I get a compatibility question like this, the first place I go is the Exchange Server Supportability Matrix on TechNet. Now, don’t let the relatively old “last update” time dismay you; the support matrix is generally only updated when major updates to Exchange (a service pack or new version) come out. (In case you haven’t noticed, Update Rollups don’t change the base compatibility requirements.)

Not that kind of matrix…

If we look on the matrix under the Supported Active Directory Environments heading, we’ll see that as of right now Windows Server 2012 R2 isn’t even on the list! So what does this tell us? The same thing I tell my kids instead of the crappy old “No means No” chestnut: only Yes means Yes. Unless the particular combination you’re looking for is listed, then the answer is that it’s not supported at this time.

I’ve confirmed this by talking to a few folks at Microsoft – at this time, the Exchange requirements and pre-requisites have not changed. Are they expected to? No official word, but I suspect if there is a change we’ll see it when Exchange 2013 SP1 is released; that seems a likely time given they’ve already told us that’s when we can install Exchange 2013 on Windows 2012 R2.

In the meantime, if you have Exchange, hold off from putting Windows 2012 R2 domain controllers in place. Will they work? Probably, but you’re talking about untested schema updates and an untested set of domain controllers against a very heavy consumer of Active Directory. I can’t think of any compelling reasons to rush this one.

Finding Differences in Exchange objects (#DoExTip)

Many times, when I’m troubleshooting Exchange issues I need to compare objects (such as user accounts in Active Directory, or mailboxes) to figure out why there is a difference in behavior. Many times, the difference is tiny and hard-to-spot. It may not even be visible through the GUI.

To do this, I first dump the objects to separate text files. How I do this depends on the type of object I need to compare. If I can output the object using Exchange Management Shell, I pipe it through Format-List and dump it to text there:

Get-Mailbox -Identity Devin | fl > Mailbox1.txt

If it’s a raw Active Directory object I need, I use the built-in Windows LDP tool and copy and paste the text dump to separate files in a text editor.

Once the objects are in text file format, I use a text comparison tool, such as the built-in comparison tool in my preferred text editor (UltraEdit) or the standalone tool WinDiff. The key here is to quickly highlight the differences. Many of those differences aren’t important (metadata such as last time updated, etc.) but I can spend my time quickly looking over the properties that are different, rather than brute-force comparing everything about the different objects.

I can hear many of you suggesting other ways of doing this:

  • Why are you using text outputs even in PowerShell? Why not export to XML or CSV?
    If I dump to text, PowerShell displays the values of multi-value properties and other property types that it doesn’t show if I export the object to XML or CSV. This is very annoying, as the missing values are typically the source of the key difference. Also, text files are easy for my customers to generate, bundle, and email to me without any worries that virus scanners or other security policies might intercept them.
  • Why do you run PowerShell cmdlets through Format-List?
    To make sure I have a single property per line of text file. This helps ensure that the text file runs through WinDiff properly.
  • Why do you run Active Directory dumps through LDP?
    Because LDP will dump practically any LDAP property and value as raw text as I access a given object in Active Directory. I can easily walk a customer through using LDP and pasting the results into Notepad while browsing to the objects graphically, as per ADSIedit. There are command line tools that will export in other formats such as LDIF, but those are typically overkill and harder to use while browsing for what you need (you typically have to specify object DNs).
  • PowerShell has a Compare-Object cmdlet. Why don’t you use that for comparisons instead of WinDiff or text editors?
    First, it only works for PowerShell objects, and I want a consistent technique I can use for anything I can dump to text in a regular format. Second, Compare-Object changes its output depending on the object format you’re comparing, potentially making the comparison useless. Third, while Compare-Object is wildly powerful because it can hook into the full PowerShell toolset (sorting, filters, etc.) this complexity can eat up a lot of time fine-tuning your command when the whole point is to save time. Fourth, WinDiff output is easy to show customers. For all of these reasons, WinDiff is good enough.
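If you don’t have WinDiff handy, the same text-first comparison takes only a few lines of script. This is a sketch in Python using the standard difflib module, assuming two dumps with one property per line as Format-List produces; the sample mailbox values and the list of properties to ignore are made up for illustration.

```python
import difflib

# Properties that always differ and rarely matter (assumed list; adjust to taste).
NOISE = ("WhenChanged",)

def diff_dumps(text_a, text_b, noise=NOISE):
    """Return only the -/+ lines that differ between two one-property-per-line dumps."""
    def keep(line):
        name = line.split(":", 1)[0].strip()
        return bool(line.strip()) and name not in noise

    lines_a = [line.rstrip() for line in text_a.splitlines() if keep(line)]
    lines_b = [line.rstrip() for line in text_b.splitlines() if keep(line)]
    diff = list(difflib.unified_diff(lines_a, lines_b, "Mailbox1.txt", "Mailbox2.txt",
                                     n=0, lineterm=""))
    # The first two entries are the file headers; "@@" lines are hunk markers.
    return [entry for entry in diff[2:] if not entry.startswith("@@")]

# Hypothetical dumps standing in for Mailbox1.txt and Mailbox2.txt.
MAILBOX1 = """\
Name           : Devin
Database       : DB01
EmailAddresses : {SMTP:devin@contoso.com, smtp:devin@contoso.local}
WhenChanged    : 11/5/2012 9:00:00 PM
"""
MAILBOX2 = """\
Name           : Pat
Database       : DB01
EmailAddresses : {SMTP:pat@contoso.com}
WhenChanged    : 11/6/2012 8:00:00 AM
"""

for entry in diff_dumps(MAILBOX1, MAILBOX2):
    print(entry)
```

Because it works on plain text, the same function handles LDP dumps pasted into Notepad just as well as Exchange Management Shell output.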

Using Out-GridView (#DoExTip)

My second tip in this series is going to violate the ground rules I laid out for it, because they’re my rules and I want to. This tip isn’t a tool or script. It’s a pointer to an insanely awesome feature of Windows PowerShell that just happens to nicely solve many problems an Exchange administrator runs across on a day-to-day basis.

I only found out about Out-GridView two days ago, the day that Tony Redmond’s Windows IT Pro post about the loss of the Message Tracking tool hit the Internet. A Twitter conversation started up, and UK Exchange MCM Brian Reid quickly chimed in with a link to a post from his blog introducing us to using the Out-GridView control with the message tracking cmdlets in Exchange Management Shell.

This is a feature introduced in PowerShell 2.0, so Exchange 2007 admins won’t have it available. What it does is simple: take a collection of objects (such as message tracking results, mailboxes, public folders — the output of any Get-* cmdlet, really) and display it in a GUI gridview control. You can sort, filter, and otherwise manipulate the data in-place without having to export it to CSV and get it to a machine with Excel. Brian’s post walks you through the basics.

In just two days, I’ve already started changing how I interact with EMS. There are a few things I’ve learned from Get-Help Out-GridView:

  • On PowerShell 2.0 systems, Out-GridView is the endpoint of the pipeline. However, if you’re running it on a system with PowerShell 3.0 installed (Windows Server 2012), Out-GridView can be used to interactively filter down a set of data and then pass it on in the pipeline to other commands. Think about being able to grab a set of mailboxes, fine-tune the selection, and pass them on to make modifications without having to get all the filtering syntax correct in PowerShell.
  • Out-GridView is part of the PowerShell ISE component, so it isn’t present if you don’t have ISE installed or are running on Server Core. Exchange can’t run on Server Core, but if you want to use this make sure the ISE feature is installed.
  • Out-GridView allows you to select and copy data from the gridview control. You can then paste it directly into Excel, a text editor, or some other program.

This is a seriously cool and useful tip. Thanks, Brian!

Exchange Environment Report script (#DoExTip)

My inaugural DoExTip is a script I have been rocking out to and enthusiastically recommending to customers for over a year: the fantastic Exchange Environment Report script by UK Exchange MVP Steve Goodman. Apparently Microsoft agrees, because they highlight it in the TechNet Gallery.

It’s a simple script: run it and you get a single-page HTML report that gives you a thumbnail overview of your servers and databases, whether standalone or DAG. It’s no substitute for monitoring, but as a regular status update posted to a web page or emailed to a group (easily done from within the script) it’s a great touch point for your organization. Run it as a scheduled task and you’ll always have the 50,000 foot view of your Exchange health.

I’ve used it for migrations in a variety of organizations, from Exchange 2003 (it must be run on Exchange 2007 or higher) on up. I now consider this script an essential part of my Exchange toolkit.

Introducing DoExTips

At my house, we try to live our life by a well-known saying attributed to French philosopher Voltaire: “The perfect is the enemy of the good.” This is a translation from the second line of his French poem La Bégueule, which itself is quoting a more ancient Italian proverb. It’s a common idea that perfection is a trap. You may be more used to modern restatements such as the 80/20 rule (the last 20% of the work takes 80% of the effort).

I’ve had an idea for several years to fill what I see as a gap in the Exchange community. I’ve been toying with this idea for a while, trying to figure out the perfect way to do it. Today, I had a Voltaire moment: forget perfect.

So, without further ado, welcome to Devin on Exchange Tips (or #DoExTips for short). These are intended to be small posts that occur frequently, highlighting free scripts and tools that members of the global Exchange community have written and made available. There’s a lot of good stuff out there, and it doesn’t all come from Microsoft, and you don’t have to pay for it.

The tools and scripts I’ll highlight in DoExTips are not going to be finished products or polished. In many cases, they’ll take work to adapt to your environment. I’m going to quickly show you something I found that I’ve used as a starting point or spring board, not solve all your problems.

So, if you’ve got something you think should be highlighted as a DoExTip, let me know. (Don’t like the name? Blame Tom Clancy. I’ve been re-reading his Jack Ryan techno-thrillers and so military naming is on the brain.)

#MSExchange 2010 and .NET 4.0

Oh, Microsoft. By now, one might think that you’d learn not to push updates to systems without testing them thoroughly. One would be wrong. At least this one classifies as a minor annoyance and not outright breakage…

Windows Update offers up .NET 4.0 to Windows 2008 R2 systems as an Important update (and has been for a while). This is fine and good – various versions of the .NET framework can live in parallel. The problem, however, comes when you accept this update on an Exchange 2010 server with the CAS role.

If you do this, you may notice that the /exchange, /exchweb, and /public virtual directories (legacy directories tied to the /owa virtual directory) suddenly aren’t redirecting to /owa like they’re supposed to. Now, people aren’t normally using these directories in their OWA URLs anymore, but if someone does attempt to hit one of these virtual directories it leaves a gnarly error message to spam your event logs.

This is occurring because when .NET 4.0 is installed and the ASP.NET 4.0 components are tied into IIS, the Default Application Pool is reconfigured to use ASP.NET 4.0 instead of ASP.NET 2.0 (the version used by the .NET 3.5 runtime on Windows 2008 R2). What exactly it is about this that breaks these legacy virtual directories, I have no idea, but break them it does.

The fix for this is relatively simple: uninstall .NET 4.0 and hide the update from the machine so it doesn’t come back. If you don’t want to do that, follow this process outlined in TechNet to reset the Default Application Pool back to .NET 2.0. Be sure to run IISRESET afterwards.

Attached To You: Exchange 2010 Storage Essays, part 3

[2100 PST 11/5/2012: Edited to fix some typos and missing words/sentences.]

So, um…I knew it was going to take me a while to write this third part of the Exchange 2010 storage saga…but over two years? Damn, guys. I don’t even know what to say, other than to get to it.

So, we’ve this lovely little 3-dimension storage axis I’ve been talking about in parts 1 (JBOD vs. RAID) and 2 (SATA vs. SAS/FC). Part 3 addresses the third axis: SAN vs. DAS.

Exchange Storage DAS vs. SAN

What’s in a name?

It used to be that everyone agreed on the distinction between DAS, NAS, and SAN:

  • DAS was typically dumb or entry-level storage arrays that connected to a single server (or at most two or three) via SCSI, SATA/SAS, or some other storage-specific cabling/protocol. DAS arrays typically had very little on-board smarts, other than the ability to run RAID configurations and present the RAID volumes to the connected server as if they were a single volume instead.
  • NAS was file-level storage presented over a network connection to servers. The two common protocols used were NFS (for Unix machines) and SMB/CIFS (for Windows machines). NAS solutions often include more functionality, including features such as direct interfaces with backup solutions, snapshots of the data volumes, replication of data to other units, and dynamic addition of storage.
  • SAN was high-end, expensive block-level storage presented over a separate network infrastructure such as FC or iSCSI over Ethernet. SAN systems offer even more features aimed at enterprise markets, including sophisticated disk partitioning and access mechanisms designed to achieve incredibly high levels of concurrency and performance.

As time passed and most vendors figured out that providing support for both file-level and block-level protocols made their systems more attractive by allowing them to be reconfigured and repurposed by their customers, the distinction between NAS and SAN began to blur. DAS, however, was definitely dumb storage. Heck, if you wanted to share it with multiple systems, you had to have multiple physical connections! (Anyone other than me remember those lovely days of using SCSI DAS arrays for poor man’s clustering by connecting two SCSI hosts – one with a non-default host ID – to the same SCSI chain?)

At any rate, it was all good. For Exchange 2003 and early Exchange 2007 deployments, storage vendors were happy because if you had more than a few hundred users, you almost certainly needed a NAS/SAN solution to consolidate the number of spindles required to meet your IOPS targets.

The heck you say!

In the middle of the Exchange 2007 era, Microsoft upset the applecart. It turns out that with the ongoing trend of larger mailboxes, Exchange 2007 SP1, CCR, and SCR, many customers were able to do something pretty cool: decrease the mailbox/database density to the point where (with Exchange 2007’s reduced IOPS) the total IOPS for their databases no longer required a sophisticated storage solution to provide the requisite IOPS. In general, disks for SAN/NAS units have to be of a higher quality and speed than for DAS arrays, so they typically had better performance and lower capacity than consumer-grade drives.

This trend only got more noticeable and deliberate in Exchange 2010, when Microsoft unified CCR and SCR into the DAG and moved replication to the application layer (as we discussed in Part 1). Microsoft specifically designed Exchange 2010 to be deployable on a direct-attached RAID-less 2TB SATA 7200 RPM drive to hold a database and log files, so they could scale hosted Exchange deployments up in an affordable fashion. Suddenly, Exchange no longer needed SAN/NAS units for most deployments – as long as you had sufficiently large mailboxes throughout your databases to reduce the IOPS/database ratio below the required amount.
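The arithmetic behind that claim is simple enough to sketch. The Python below assumes one database filling one disk and uses purely hypothetical figures for usable capacity, per-mailbox IOPS, and what a single 7200 RPM SATA spindle can sustain; none of these numbers come from Microsoft guidance, so plug in your own.

```python
def database_load(usable_gb, mailbox_quota_gb, iops_per_mailbox):
    """Return (mailboxes, total IOPS) for one database that fills one disk."""
    mailboxes = int(usable_gb // mailbox_quota_gb)
    return mailboxes, round(mailboxes * iops_per_mailbox, 1)

USABLE_GB = 1600          # assumed usable space on a 2TB disk after headroom
IOPS_PER_MAILBOX = 0.1    # assumed user profile, not a measured value
DISK_IOPS_BUDGET = 50     # assumed budget for one 7200 RPM SATA spindle

for quota_gb in (0.25, 1, 5):
    mailboxes, iops = database_load(USABLE_GB, quota_gb, IOPS_PER_MAILBOX)
    verdict = "fits" if iops <= DISK_IOPS_BUDGET else "needs more spindles"
    print(f"{quota_gb:>5} GB quota: {mailboxes:>5} mailboxes, {iops:>6} IOPS -> {verdict}")
```

The disk doesn’t change from one row to the next. Only the mailbox size does, and with it the number of users whose IOPS land on that one spindle.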

Needless to say, storage vendors have taken this about as light-heartedly as a coronary.

How many of you have heard in the past couple of years the message that “SAN and DAS are the same thing, just different protocols”?

Taken literally, DAS and SAN are only differences in connectivity.

The previous quote is from EMC, but I’ve heard the same thing from NetApp and other SAN vendors. Ever notice how it’s only the SAN vendors who are saying this?

I call shenanigans.

If they were the same thing, storage vendors wouldn’t be spending so much money on whitepapers and marketing to try to convince Exchange admins (more accurately, their managers) that there was really no difference and that the TCO of a SAN just happens to be a better bet.

What SAN vendors now push are features like replication, thin provisioning, virtualization and DR integration, backup and recovery – not to mention the traditional benefits of storage consolidation and centralized management. Here’s the catch, though. From my own experience, their models only work IF and ONLY IF you continue to deploy Exchange 2010 the same way you deployed Exchange 2003 and Exchange 2007:

  • deploying small mailboxes that concentrate IOPS in the same mailbox database
  • grouping mailboxes based on criteria meant to maximize single instance storage (SIS)
  • planning Exchange deployments around existing SAN features and backup strategies
  • relying on third-party functionality for HA and DR
  • deploying Exchange 2010 DAGs as if they were a shared copy cluster

When it comes right down to it, both SAN and DAS deployments are technically (and financially) feasible solutions for Exchange deployments, as long as you know exactly what your requirements are and let your requirements drive your choice of technology. I’ve had too many customers who started with the technology and insisted that they had to use that specific solution. Inevitably, by designing around technological elements, you either have to compromise requirements or spend unnecessary energy, time, and money solving unexpected complications.

So if both technologies are viable solutions, what factors should you consider to help decide between DAS and SAN?

Storage Complexity

You’ve probably heard a lot of other Exchange architects and pros talk about complexity – especially if they’re also Certified Masters. There’s a reason for this – more complex systems, all else being equal, are more prone to system outages and support calls. So why do so many Exchange “pros” insist on putting complexity into the storage design for their Exchange systems when they don’t even know what that complexity is getting them? Yes, that’s right, Exchange has millennia of man-hours poured into optimizing and testing the storage system so that your critical data is safe under almost all conditions, and then you go and design storage systems that increase the odds the fsck-up fairy[1] will come dance with your data in the pale moonlight.

SANs add complexity. They add more system components and drivers, extra bits of configuration, and additional systems with their own operating system, firmware, and maintenance requirements. I’ll pick on NetApp for a moment because I’m most familiar with their systems, but the rest of the vendors have their own stories that hit most of the same high points:

  • I have to pick either iSCSI or FC and configure the appropriate HBAs/NICs plus infrastructure, plus drivers and firmware. If I’m using FC I get expensive FC HBAs and switches to manage. If I go with iSCSI I get additional 1Gb or 10Gb Ethernet interfaces in my Exchange servers and the joy of managing yet another isolated set of network adapters and making sure Exchange doesn’t perform DAG replication over them.
  • I have to install the NetApp Storage Tools.
  • I have to install the appropriate MPIO driver.
  • I have to install the SnapDrive service, because if I don’t, the NetApp snapshot capability won’t interface with Windows VSS, and if I’m doing software VSS why the hell am I even using a SAN?
  • I *should* install SnapManager for Exchange (although I don’t have to) so that my hardware VSS backups happen and I can use it as an interface to the rest of the NetApp protection products and offerings.
  • I need to make sure my NetApp guy has the storage controllers installed and configured. Did I want redundancy on the NetApp controller? Upgrades get to be fun and I have to coordinate all of that to make sure they don’t cause system outage. I get to have lovely arguments with the NetApp storage guys about why they can’t just treat my LUNs the same way they treat the rest of them, yes I need my own aggregates and volumes and no please don’t give me the really expensive 15KRPM SAS drives that store a thimble because you’re going to make your storage guys pass out when they find out how many you need for all those LUNs and volumes (x2 because of your redundant DAG copies).[2]

Here’s the simple truth: SANs can be very reliable and stable. SANs can also be a single point of failure, because they are wicked expensive and SAN administrators and managers get put out with Exchange administrators who insist on daft restrictions like “give Exchange dedicated spindles” and “don’t put multiple copies of the same database on the same controller” and other party-pooping ways to make their imagined cost savings dwindle away to nothing. The SAN people have their own deployment best practices, just like Exchange people; those practices are designed to consolidate data for applications that don’t manage redundancy or availability on their own.
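
To make the “don’t put multiple copies of the same database on the same controller” rule concrete, here is a minimal Python sketch of the kind of sanity check you could run against a planned layout. The database, server, and controller names are made up for illustration, and the layout is typed in by hand; this is not output from any Exchange or NetApp tool.

```python
from collections import defaultdict

def find_shared_controllers(copy_layout):
    """Return {database: {controller: [servers]}} for every database that has
    two or more copies sitting behind the same storage controller."""
    by_db = defaultdict(lambda: defaultdict(list))
    for database, server, controller in copy_layout:
        by_db[database][controller].append(server)

    problems = {}
    for database, controllers in by_db.items():
        shared = {c: s for c, s in controllers.items() if len(s) > 1}
        if shared:
            problems[database] = shared
    return problems

# Hypothetical layout: (database, mailbox server, SAN controller)
layout = [
    ("DB01", "MBX1", "ctrl-A"),
    ("DB01", "MBX2", "ctrl-B"),
    ("DB02", "MBX1", "ctrl-A"),
    ("DB02", "MBX2", "ctrl-A"),  # both copies behind the same controller
]

problems = find_shared_controllers(layout)
print(problems)
```

Any database that shows up in the result has copies that share a failure domain, which is exactly the situation the dedicated-hardware argument is trying to avoid.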

Every SAN I’ve ever worked with wants to treat all data the same way, so to make it reliable for Exchange you’re going to need to rock boats. This means more complexity (and money) and the SAN people don’t want complexity in their domain any more than you want it in yours. Unless you know exactly what benefits your solution will give you (and I’m not talking general marketing spew, I’m talking specific, realistic, quantified benefits), why in the world would you want to add complexity to your environment, especially if it’s going to start a rumble between the Exchange team and the SAN team that not even Jackie Chan and a hovercraft can fix?

Centralization and Silos

Over the past several years, IT pros and executives have heard a lot of talk about centralization. The argument for centralization is that instead of having “silos” or autonomous groups spread out, all doing the same types of things and repeating effort, you reorganize your operation so that all the storage stuff is handled by a single group, all the network stuff is handled by another group, and so on and so forth. This is another one of those principles and ideas that sounds great in theory, but can fall down in so many ways once you try to put it into practice.

The big flaw I’ve seen in most centralization efforts is that they end up creating artificial dependencies and decrease overall service availability. Exchange already has a number of dependencies that you can’t do anything about, such as Active Directory, networking, and other external systems. It is not wise to create even more dependencies when the Exchange staff doesn’t have the authority to deal with the problems those dependencies create but is still on the hook for them because the new SLAs look just like the old SLAs from the pro-silo regime.

Look, I understand that you need to realign your strategic initiatives to fully realize your operational synergies, but you can’t go do it half-assed, especially when you’re messing with business critical utility systems like corporate email. Deciding that you’re going to arbitrarily rearrange operations patterns without making sure those patterns match your actual business and operational requirements is not a recipe for long-term success.

Again, centralization is not automatically incompatible with Exchange. Doing it correctly, though, requires communication, coordination, and cross-training. It requires careful attention to business requirements, technical limitations, and operational procedures – and making sure all of these elements align. You can’t have a realistic 1-hour SLA for Exchange services when one of the potential causes for failure itself has a 4-hour SLA (and yes, I’ve seen this; holding Exchange metrics hostage to a virtualization group that has incompatible and competing priorities and SLAs makes nobody happy). If Exchange is critical to your organization, pulling the Exchange dependencies out of the central pool and back to where your Exchange team can directly operate on and fix them may be a better answer for your organization’s needs.
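
The SLA mismatch is easy to check on paper. Here is a small Python sketch, assuming you can list each dependency with its own restore SLA in hours. The 1-hour and 4-hour figures come from the example above; the other dependencies and their numbers are placeholders I made up.

```python
def unrealistic_dependencies(service_sla_hours, dependency_sla_hours):
    """Return the dependencies whose own restore SLA is longer than the
    SLA promised for the service that sits on top of them."""
    return {
        name: hours
        for name, hours in dependency_sla_hours.items()
        if hours > service_sla_hours
    }

exchange_sla = 1  # hours, from the example above

# Placeholder values, except the 4-hour virtualization SLA from the example.
dependencies = {
    "Active Directory": 1,
    "Network": 1,
    "Virtualization platform": 4,
}

blockers = unrealistic_dependencies(exchange_sla, dependencies)
worst_case = max(dependencies.values())

print(blockers)
print(worst_case)
```

If that dictionary comes back non-empty, the service SLA is a promise the Exchange team can’t keep on its own, no matter how good they are.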

The centralization/silo debate is really just capitalism vs. socialism; strict capitalism makes nobody happy except hardcore libertarians, and strict socialism pulls the entire system down to the least common denominator[3]. The real answer is a blend and compromise of both principles, each where they make sense. In your organization, DAS and an Exchange silo just may better fit your business needs.

Management and Monitoring

In most Exchange deployments I’ve seen, this is the one area I consistently see neglected, so it doesn’t surprise me that it’s not more of an issue. Exchange 2010 does a lot to make sure the system stays up and operational, but it can’t manage everything. You need to have a good monitoring system in place and you need to have automation or well-written, thorough processes to handle dealing with common warnings and low-level errors.

One of the advantages of a SAN is that (at least on a storage level) much of this will be taken care of for you. Every SAN system I’ve worked with not only has built-in monitoring of the state of the disks and the storage hardware, but also has extensive integration with external monitoring systems. It’s really nice when, at the same time you get notification that you’ve had a disk failure in the SAN, the SAN vendor has also been notified, so you know a spare will show up in the next day via FedEx (or possibly even be brought by a technician who will replace it for you). This kind of service is not normally associated with DAS arrays.

However, even the SAN’s luscious – nay, sybaritic – level of notification luxury only protects you against SAN-level failures. SAN monitoring doesn’t know anything about Exchange 2010 database copy status or DAG cluster issues or Windows networking or RPC latency or CAS arrays or load balancer errors. Whether you deploy Exchange 2010 on a SAN or DAS offering, you need to have a monitoring solution that provides this kind of end-to-end view of your system. Low-end applications that rely on system-agnostic IP pings and protocol endpoint probes are better than nothing, but they aren’t a substitute for an application-aware system such as Microsoft System Center Operations Manager, or some equivalent, that understands all of the components in an Exchange DAG and queries them all for you.

You also need to think about your management software and processes. Many environments don’t like having changes made to centralized, critical dependency systems like a SAN without going through a well-defined (and relatively lengthy) change management process. In these environments, I have found it difficult to get emergency disk allocations pushed through in a timely fashion.

Why would we need emergency disk allocations in an Exchange 2010 system? Let me give you a few real examples:

  • Exchange-integrated applications[4] cause database-level corruption that drives server I/O and RPC latency up to levels that affect other users.
  • Disk-level firmware errors cause disk failure or drops in data transfer rates. Start doing wide-scale disk replacements on a SAN and you’re going to drive system utilization through the roof because of all the RAID group rebuilds going on. Be careful which disks you pull at one time, too – don’t want to pull two or three disks out of the same RAID group and have the entire thing drop offline.
  • Somebody else’s application starts having disk problems. You have to move the upper management’s mailboxes to new databases on unaffected disks until the problems are identified and resolved.
  • A routine maintenance operation on one SAN controller goes awry, taking out half of the database copies. There’s a SAN controller with some spare capacity, but databases need to be temporarily consolidated so there is enough room for two copies of all the databases during the repair on the original controller.

Needless to say, with DAS arrays, you don’t have to tailor your purchasing, management, and operations of Exchange storage around other applications. Yes, DAS arrays have failures too, but managing them can be simpler when the Exchange team is responsible for operations end-to-end.

Backup, Replication, and Resilience

The big question for you is this: what protection and resilience strategy do you want to follow? A lot of organizations are just going on auto-pilot and using backups for Exchange 2010 because that’s how they’ve always done it. But do you really, actually need them?

No, seriously, you need to think about this.

Why do you keep backups for Exchange? If you don’t have a compelling technical reason, find the people who are responsible for the business reason and ask them what they really care about – is it having tapes or a specific technology, or is it the ability to recover information within a specific time window? If it’s the latter, then you need to take a hard look at the Exchange 2010 native data protection regime:

  • At least three database copies
  • Increased deleted item/deleted mailbox recovery limits
  • Recoverable items and hold policies
  • Personal archives and message retention
  • Lagged database copies

If this combination of functionality meets your needs, you need to take a serious look at a DAS solution. A SAN solution is going to be a lot more expensive for the storage options to begin with, and it’s going to be even more expensive for more than two copies. None of my customers deployed more than two copies on a SAN, because not only did they have to budget for the increased per-disk cost, but they would have to deploy additional controllers and shelves to add the appropriate capacity and redundancy. Otherwise, they’d have had multiple copies on the same hardware, which really defeats the purpose. At that point, DAS becomes rather attractive when you start to tally up the true costs of the native data protection solution.
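
To see why the copy count matters so much to the storage bill, here is a back-of-the-envelope Python sketch. Every number in it (mailbox count, mailbox size, the 20% overhead allowance) is a placeholder I picked for the arithmetic, not a sizing recommendation; use the Exchange storage calculator for real designs.

```python
def raw_capacity_gb(mailboxes, mailbox_gb, copies, overhead_pct=20):
    """Rough raw capacity needed to hold every database copy.

    overhead_pct is a flat allowance for logs, indexes, and free space.
    Integer math keeps the result exact.
    """
    base = mailboxes * mailbox_gb * copies
    return base * (100 + overhead_pct) // 100

# Placeholder organization: 5,000 mailboxes at 2 GB each.
two_copies = raw_capacity_gb(5000, 2, copies=2)
three_copies = raw_capacity_gb(5000, 2, copies=3)

print(two_copies, three_copies, three_copies - two_copies)
```

Going from two copies to three adds half again as much raw capacity. On DAS that is more cheap disks; on a SAN it is more expensive disks plus the extra controllers and shelves needed to keep that third copy on separate hardware.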

So what do you do if the native data protection isn’t right for you and you need traditional backups? In my experience, one of the most compelling reasons for deploying Exchange on a SAN is the fantastic backup and recovery experience you get. In particular, NetApp’s snapshot-based architecture and SME backup application top my list. SME includes a specially licensed version of the Ontrack PowerControls utility to permit single mailbox recovery, all tied back into NetApp’s kick-ass snapshots. Plus, the backups happen more quickly because the VSS provider is the NetApp hardware, not a software driver in the NTFS file system stack, and you can run the ESE verification off of a separate SME server to offload CPU from the mailbox servers. Other SAN vendors offer some sort of roughly equivalent integrated backup option.

The only way you’re going to get close to that via DAS is if you deploy Data Protection Manager. And honestly, if you’re still relying on tape (or cloud) backups, I really recommend that you use something like DPM to stage everything to disk first so that backups from your production servers are staging to a fast disk system. Get those VSS locks dealt with as quickly as possible and offload the ESE checks to the DPM system. Then, do your tape backups off of the DPM server and your backup windows are no longer coupled to your user-facing Exchange servers. That doesn’t even mention DPM’s 15-minute log synchronization and use of deltas to minimize storage space on its own storage pool. DPM has a lot going for it.

A lot of SANs do offer synchronous and asynchronous replication options, often at the block level. These sound like good options, especially to enhance site resiliency, and for other applications, they often can be. Don’t get suckered into using them for Exchange, though, unless they are certified to work against Exchange (and if it’s asynchronous replication, it won’t be). A DAS solution doesn’t offer this functionality, but that’s no loss in this column; whether you’re on SAN or DAS, you should be replicating via Exchange. Replicating using the SAN block-level replication means that the replication is happening without Exchange being aware of it, which means depending on when a failure happens, you could in the worst case end up with a corrupted database replica volume. Best case, your SAN-replicated database will not be in a consistent state, so you will have to run ESEUTIL to perform a consistency check and play log files forward before mounting that copy. If you’re going to do that, why are you running Exchange 2010?

Now if you need a synchronous replication option, Exchange 2010 includes an API to allow a third-party provider to replace the native continuous replication capability. As far as I know, only one SAN vendor (EMC) has taken advantage of this option, so your options are pretty clear in this scenario.


We’ve covered a lot of ground in this post, so if you’re looking for a quick take-away, the answer is this:

Determine what your real requirements are, and pick your technology accordingly. Whenever possible, don’t make choices by technology or cost first without having a clear and detailed list of expected benefits in hand. You will typically find some requirement that makes your direction clear.

If anyone tells you that there’s a single right way to do it, they’re probably wrong. Having said that, though, the more I’ve seen over the past couple of years, the more convinced I am that the further people deviate from the Microsoft sweet spot, the more design compromises they make, perhaps when they didn’t have to. Inertia and legacy have their place but need to be balanced with innovation and reinvention.

[1] Not a typo, I’m just showing off my Unix roots. The fsck utility (file system check) helps fix inconsistencies in the Unix file systems. Think chkdsk.

[2] Can you tell I’ve been in this rodeo once or twice? But I’m not bitter. And I do love NetApp because of SME, I just realize it’s not the right answer for everyone.

[3] Yes, I did in fact just go there. Blame it on the nearly two years of political crap we’ve suffered in the U.S. for this election season. November 6th can’t come soon enough.

[4] The application in this instance was an older version of Microsoft Dynamics CRM, very behind on its patches. There was a nasty calendar corruption bug that made my customer’s life hell for a while. The solution was to upgrade CRM to the right patch level, then move all of the affected mailboxes (about 40% of the users) to new databases. We didn’t need a lot of new databases, as we could move mailboxes in a swing fashion, but to finish in a timely fashion we needed to provision enough LUNs for the databases and copies involved. Each swing cycle took about two weeks because of change management, when we could have gotten it done much sooner.

Can You Fix This PF Problem?

Today I got to chat with a colleague who was trying to troubleshoot a weird Exchange public folder replication problem. The environment, which is in the middle of an Exchange 2007 to Exchange 2010 migration, uses public folders heavily – many hundreds of top-level public folders with a lot of sub-folders. Many of these public folders are mail-enabled.

After creating public folder replicas on Exchange 2010 public folder databases and ensuring that the public folders were starting to replicate, my colleague received notice that specific mail-enabled public folders weren’t getting incoming mail content. Lo and behold, the HT queues were full of thousands of public folder replication messages, all queued up.

After looking at the event logs and turning up the logging levels, my colleague noticed that they were seeing a lot of the 4.3.2 STOREDRV.Deliver; recipient thread limit exceeded error message mentioned in the Microsoft Exchange team blog post Store Driver Fault Isolation Improvements in Exchange 2010 SP1. Adding the RecipientThreadLimit key and setting it to a higher level helped temporarily, but soon the queues would begin backing up again.

At that point, my colleague called me for some suggestions. We talked over a bunch of things to check and troubleshooting trees to follow depending on what he found. Earlier tonight, I got an email confirming the root cause was identified. I was not surprised to find out that the cause turned out to be something relatively basic. Instead of just telling you what it was though, I want you to tell me which of the following options YOU think it is. I’ll follow up with the answer on Monday, 10/15.


MEC Day 3

Unfortunately, this is the day that Murphy caught up with me, in the form of a migraine. When these hit, the only thing I can do is try to sleep it off.

I ended up not hitting the conference center until a bit after noon, just in time to brave lunch. What would a Microsoft conference be without the dreaded salmon meal? At that point, my stomach rebelled and my head agreed, so I wandered back to the MVP area and chatted until it was time to head upstairs to my room for my last session at 1pm.

Big thanks to everyone who showed up for the session. I took some of the feedback from Day 2, and combined with my increased mellowness from the migraine, I made some changes to the structure of the session and clarifications to the message I wanted the attendees to walk away with. We had what I thought was a brilliant session. Apparently, I do my best work while in pain.

After that, it was down to the expo floor for a quick round of good-byes, then off to catch my shuttle to Orlando International Airport. I was able to get checked in with more than enough time for a leisurely meal, then on to gate 10 where I met up with various other MEC attendees on their way back home to Seattle.

WHAT AN AMAZING CONFERENCE. I had SO much fun, even with missing essentially all of Day 3 and the wonderful sessions that I’d planned to sit in on. My apologies for the missed Twitter stream that day.

We’ll have to do this again next year. I hope you’ll be there!

MEC Day 2

Today was another fun-filled and informative day at MEC:

  • The day started off with a keynote by Microsoft Distinguished Engineer Perry Clarke, head of Exchange Software Development. Perry does a blog called Ask Perry which regularly includes a video feature, Geek Out With Perry. The keynote was done in this format. The latter half was quite good, but the first half was a little slow and (I thought) lightweight for a deeply technical conference such as MEC. However, that could just have been a gradual wake-up for the people still recovering from last night’s attendee party at Universal’s Islands of Adventure theme park.
  • After a short break, we were off to the interactive sessions! I got caught up in a conversation and made it to my first session a few minutes late – and wasn’t able to enter, as the room was at capacity. So, I missed Jeff Mealiffe’s session on virtualizing Exchange 2013, much to my annoyance. Instead I headed down to the exhibit floor and hung out in the MVP area, talking with a bunch of folks (including one of my homies from MCM R1).
  • At lunch I caught up with some old friends – one of the best reasons for coming.
  • After lunch, I squeezed (and by squeezed I am being literal; we were crammed into the room like sardines) into Bill Thompson’s session on the Exchange 2013 transport architecture. WOW. Some bold changes made, but I think they’re going to be good changes.
  • At 3:00, my time at the front of the room had come and I gave my first session of my Exchange 2010 virtualization lessons learned. Mostly full room and there were some good questions. I received some interesting feedback later, so will be wrapping that into tomorrow’s repeat presentation.
  • My last session of the day was Greg Taylor’s session on Exchange 2013 load balancing. Again, lots of good surprises and changes, and as always watching Greg in action was entertaining and informative. This is, after all, the man who talks about Exchange client access using elephant’s asses.
  • Afterwards, I caught up with former co-workers and enjoyed a couple of beers at MAPI Hour in the lovely central atrium of the Gaylord Palms Hotel, then went out to dinner (fantastic burger at the Wrecker’s Sports Bar). Capped the night off with a sundae.

Two down, one more to go. What a fantastic time I’ve been having!

MEC Day 1

After 10 years of absence, the Microsoft Exchange Conference is back. Yes, that’s right, the last time MEC happened was in 2002. How do I know this? I’ve seen a couple of people today who still had their MEC 2002 badges. HOLY CRAP, DUDES. I’m a serious packrat and not even *I* keep my old conference badges.

I decided to live tweet my sessions. I did a good job too – my Twitter statistics are telling me that I’ve sent 258 tweets! If any of my Facebook friends are still bothering to read my automatic Twitter-to-Facebook updates…shit, sorry. Two more days to go and you know I can’t be nearly as prolific today or Wednesday because I’m presenting a session each day:

  • E14.310 Three Years In: Looking Back At Virtualizing Exchange 2010
    Tuesday, September 25 2012 3:00 PM – 4:15 PM in Tallahassee 1
  • E14.310R1 Three Years In: Looking Back At Virtualizing Exchange 2010
    Wednesday, September 26 2012 1:00 PM – 2:15 PM in Tallahassee 1

Monday was the “all Microsoft, all Exchange 2013” day with typical keynotes and breakouts. Today, we start the “un-conference” – smaller, more interactive sessions, led by members of the community like myself. Today and tomorrow will be a lot more peer-to-peer…which will be fun.

See you out there! Drop me a note or track me down to let me know if you read my blog or have a question you’d like me to answer!

TMG? Yeah, you knew me!

Microsoft today officially announced a piece of news that came as very little surprise to anyone who has been paying attention for the last year. On May 25th of 2011, Gartner broke an unsubstantiated claim that they had been told by Microsoft that there would be no future release of Forefront Threat Management Gateway (TMG).

Microsoft finally confirmed that information. Although the TMG product will receive mainstream support until April 14, 2015 (a little bit more than 2.5 years from time of writing), it will no longer be available for sale come December 1, 2012.

Why do Exchange people care? Because TMG was the simple, no-brainer solution for environments that needed a reverse proxy in a DMZ network. Many organizations can’t allow incoming connections from the Internet to cross into an interior network. TMG provided protocol-level inspection and NAT out of the box, and could be easily configured for service-aware CAS load balancing and pre-authentication. As I said, no-brainer.

TMG had its limitations, though. No IPv6 support, poor NAT support, and an impressively stupid inability to proxy all non-HTTP protocols in a one-armed configuration. The “clustered” enterprise configuration was sometimes a pain in the ass to troubleshoot and work with when the central configuration database broke (and it seemed more fragile than it should be).

The big surprise for me is that TMG shares the chopping block with the on-server Forefront protection products for Exchange, SharePoint, and Lync/OCS. I personally have had more trouble than I care for with the Exchange product — it (as you might expect) eats up CPU like nobody’s business, which made care and feeding of Exchange servers harder than it needed to be. Still, to only offer online service — that’s a telling move.

My Five Favorite Features of Exchange Server 2013 Preview

Exchange Server 2013 Preview was released a few weeks ago to give us a first look at what the future holds in store for Exchange. I got a couple of weeks to dig into it in depth and so here’s my quick impression of the five changes I like the most about Exchange 2013.

  1. Client rendering is moved from the Client Access role to the Mailbox role. (TechNet) Yes, this means some interesting architectural changes to SMTP, HTTP, and RPC, but I think it will help spread load out to where it should be – the servers that host active users’ mailboxes.
  2. The Client Access role is now a stateless proxy. (TechNet) This means we no longer need an expensive L7 load balancer with all sorts of fancy complicated session cookies in our HTTP/HTTPS sessions. It means a simple L4 load balancer is enough to scale the load for thousands of users based solely on source IP and port. No SSL offload required!
  3. The routing logic now recognizes DAG boundaries. (TechNet) This is pretty boss – members of a DAG that are spread across multiple sites will still act as if they were local when routing messages to each other. It’s almost like the concept of routing groups has come back in a very limited way.
  4. No more MAPI-RPC over TCP. (TechNet) Seriously. Outlook Anywhere (aka RPC over HTTPS) is where it’s at. As a result, Autodiscover for clients is mandatory, not just a really damn good idea. Firewall discussions just got MUCH easier. Believe it or not, this simplifies namespace and certificate planning…
  5. Public folders are now mailbox content. (TechNet) Instead of having a completely separate back-end mechanism for public folders, they’re now put in special mailboxes. Yes, this means they are no longer multi-master…but honestly, that causes more angst than it solves in most environments. And now SharePoint and other third-party apps can get to public folder content more easily…
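
To illustrate item 2, here is a minimal Python sketch of what “L4 load balancing based solely on source IP and port” boils down to. This is a toy, not how any particular load balancer product works: the hashing scheme, the pool names, and the addresses are all assumptions for illustration.

```python
import hashlib

def pick_server(source_ip, source_port, servers):
    """Choose a server using only layer 4 information about the connection.

    No cookies and no inspection of the HTTP session: the same IP and port
    always land on the same server, and because the Client Access role is a
    stateless proxy, it doesn't matter which server that is.
    """
    key = f"{source_ip}:{source_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

cas_pool = ["cas1", "cas2", "cas3"]  # hypothetical Client Access servers

first = pick_server("203.0.113.10", 50123, cas_pool)
again = pick_server("203.0.113.10", 50123, cas_pool)

print(first, again)
```

The point of the sketch is what’s missing: there is no session table and no SSL termination on the load balancer, which is where most of the cost and complexity of an L7 device comes from.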

There are a few things I’m not as wild about, but this is a preview and there’s no point kvetching about a moving target. We’ll see how things shake down.

I’m looking forward to getting a deeper dive at MEC in a couple of weeks, where I’ll be presenting a session on lessons learned in virtualizing Exchange 2010. Are you planning on attending?

Have you had a chance to play with Exchange 2013 yet, or at least read the preview documentation? What features are your favorite? What changes have you wondering about the implications? Send me an email or comment and I’ll see if I can’t answer you in a future blog post!

Beating Verisign certificate woes in Exchange

I’ve seen this problem in several customers over the last two years, and now I’m seeing signs of it in other places. I want to document what I found so that you can avoid the pain we had to go through.

The Problem: Verisign certificates cause Exchange publishing problems

So here’s the scenario: you’re deploying Exchange 2010 (or some other version, this is not a version-dependent issue with Exchange) and you’re using a Verisign certificate to publish your client access servers. You may be using a load balancer with SSL offload or pass-through, a reverse proxy like TMG 2010, some combination of the above, or you may even be publishing your CAS roles directly. However you publish Exchange, though, you’re running into a multitude of problems:

  • You can’t completely pass ExRCA’s validation checks. You get an error something like:  The certificate is not trusted on any version of Windows Phone device. Root = CN=VeriSign Class 3 Public Primary Certification Authority – G5, OU=”(c) 2006 VeriSign, Inc. – For authorized use only”, OU=VeriSign Trust Network, O=”VeriSign, Inc.”, C=US
  • You have random certificate validation errors across a multitude of clients, typically mobile clients such as smartphones and tablets. However, some desktop clients and browsers may show issues as well.
  • When you view the validation chain for your site certificate on multiple devices, they are not consistent.

These can be very hard problems to diagnose and fix; the first time I ran across it, I had to get additional high-level Trace3 engineers on the call along with the customer and a Microsoft support representative to help figure out what the problem was and how to fix it.

The Diagnosis: Cross-chained certificates with an invalid root

So what’s causing this difficult problem? It’s your basic case of a cross-chained certificate with an invalid root certificate. “Oh, sure,” I hear you saying now. “That clears it right up then.” The cause sounds esoteric, but it’s actually not hard to understand when you remember how certificates work: through a chain of validation. Your Exchange server certificate is just one link in an entire chain. Each link is represented by an X.509v3 digital certificate that is the footprint of the underlying server it represents.

At the base of this chain (aka the root) is the root certificate authority (CA) server. Its digital certificate differs from the others because it’s self-signed – no other CA server has signed this server’s certificate. Now, you can use a root CA server to issue certificates to customers, but that’s actually a bad idea for a lot of reasons. So instead, you have one or more intermediate CA servers added into the chain, and if you have multiple layers, then the outermost layer is made up of the CA servers that process customer requests. So a typical commercially generated certificate has a validation chain of 3-4 layers: the root CA, one or two intermediate CAs, and your server certificate.

Remember how I said there were reasons to not use root CAs to generate customer certificates? You can probably read up on the security rationales behind this design, but some of the practical reasons include:

  • The ability to offer different classes of service, signed by separate root servers. Instead of having to maintain separate farms of intermediate servers, you can have one pool of intermediate servers that issue certificates for different tiers of service.
  • The ability to retire root and intermediate CA servers without invalidating all of the certificates issued through that root chain, if the intermediate CA servers cross-chain from multiple roots. That is, the first layer intermediate CA servers’ certificates are signed by multiple root CA servers, and the second layer intermediate CA servers’ certificates are signed by multiple intermediate CA servers from the first layer.

So, cross-chaining is a valid practice that helps provide redundancy for certificate authorities and helps protect your investment in certificates. Imagine what a pain it would be if one of your intermediate CAs got revoked and nuked all of your certificates. I’m not terribly fond of having to redeploy certificates for my whole infrastructure without warning.

However, sometimes cross-chained certificates can cause problems, especially when they interact with another feature of the X.509v3 specification: the Authority Information Access (AIA) certificate extension. Imagine a situation where a client (such as a web browser trying to connect to OWA), presented with an X.509v3 certificate for an Exchange server, cannot validate the certificate chain because it doesn’t have the upstream intermediate CA certificate.

If the Exchange server certificate has the AIA extension, the client has the information it needs to try to retrieve the missing intermediate CA certificate – either retrieving it from the HTTPS server, or by contacting a URI from the CA to download it directly. This only works for intermediate CA certificates; you can’t retrieve the root CA certificate this way. So, if you are missing the entire certificate chain, AIA won’t allow you to validate it, but as long as you have the signing root CA certificate, you can fill in any missing intermediate CA certificates this way.

Here’s the catch: some client devices can only request missing certificates from the HTTPS server. This doesn’t sound so bad…but what happens if the server’s certificate is cross-chained, and the certificate chain on the server goes to a root certificate that the device doesn’t have…even if it does have another valid root to another possible chain? What happens is certificate validation failure, on a certificate that tested as validated when you installed it on the Exchange server.
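To make the failure mode concrete, here is a toy model in Python. It is only a sketch: certificates are plain names, signatures are a lookup table, and every name in it (the host, the intermediates, "Old-Root" and "New-Root") is made up for illustration. It contrasts a device that can only follow the chain the server hands it with a client that can search for an alternate path.

```python
# Toy model: "ISSUERS" maps each certificate to the set of CAs that have
# signed it. Cross-chaining means a certificate has more than one issuer.
# All names are hypothetical.
ISSUERS = {
    "mail.example.com": {"Intermediate-B"},
    "Intermediate-B": {"Intermediate-A"},
    "Intermediate-A": {"Old-Root", "New-Root"},  # cross-chained
}


def validates_served_chain_only(served_chain, client_roots):
    """Device that can only use the chain the server presents (leaf first)."""
    # Each certificate must be signed by the next one in the served list.
    for child, parent in zip(served_chain, served_chain[1:]):
        if parent not in ISSUERS.get(child, set()):
            return False
    # The chain is only good if it ends at a root this device trusts.
    return served_chain[-1] in client_roots


def validates_with_path_search(cert, client_roots):
    """Client that may follow any issuer path to a trusted root."""
    if cert in client_roots:
        return True
    return any(
        validates_with_path_search(issuer, client_roots)
        for issuer in ISSUERS.get(cert, set())
    )


device_roots = {"New-Root"}  # the device does not have Old-Root
bad_chain = ["mail.example.com", "Intermediate-B", "Intermediate-A", "Old-Root"]
good_chain = ["mail.example.com", "Intermediate-B", "Intermediate-A", "New-Root"]

print(validates_served_chain_only(bad_chain, device_roots))
print(validates_served_chain_only(good_chain, device_roots))
print(validates_with_path_search("mail.example.com", device_roots))
```

In this model the served chain that ends at the root the device lacks fails, even though another valid path exists. Serving the other chain is the equivalent of the fix described below.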

I want to note here that I’ve only personally seen this problem with Verisign certificates, but it’s a potential problem for any certificate authority.

The Fix: Disable the invalid root

We know the problem and we know why it happens. Now it’s time to fix it by disabling the invalid root.

Step #1 is find the root. Fire up the Certificates MMC snap-in, find your Exchange server certificate, and view the certificate chain properties. This is what the incorrect chain has looked like on the servers I’ve seen it on:


The invalid root CA server circled in red

That’s a not very helpful friendly name on that certificate, so let’s take a look at the detailed properties:


Meet “VeriSign Class 3 Public Primary Certification Authority – G5”

Step #2 is also performed in the Certificates MMC snap-in. Navigate to the Third-Party Root Certification Authorities node and find your certificate. Match the attributes above to the certificate below:


Root CA certificate hide and seek

Right-click the certificate and select Properties (don’t just open the certificate) to get the following dialog, where you will want to select the option to disable the certificate for all purposes:


C’mon…you know you want to

Go back to the server certificate and view the validation chain again. This time, you should see the sweet, sweet sign of victory (if not, close down the MMC and open it up again):


Working on the chain gang

It’s a relatively easy process…so where do you need to do it? Great question!

The process I outlined obviously is for Windows servers, so you would think that you can fix this just on the Exchange CAS roles in your Internet-facing sites. However, you may have additional work to do depending on how you’re publishing Exchange:

  • If you’re using a hardware load balancer with the SSL certificate loaded, you may not have the ability to disable the invalid root CA certificate on the load balancer. You may simply need to remove the invalid chain, re-export the correct chain from your Exchange server, and reinstall the valid root and intermediate CA certificates.
  • If you’re publishing through ISA/TMG, perform the same process on the ISA/TMG servers. You may also want to re-export the correct chain from your Exchange server onto your reverse proxy servers to ensure they have all the intermediate CA certificates loaded locally.

The general rule is that the outermost server device needs to have the valid, complete certificate chain loaded locally to ensure AIA does its job for the various client devices.

Let me know if this helps you out.

Exchange 2010 virtualization storage gotchas

There’s a lot of momentum for Exchange virtualization. At Trace3, we do a lot of work with VMware, so the majority of the customers I work with already have VMware deployed strategically into their production operation model. As a result, we see a lot of Exchange 2010 under VMware. With Exchange 2010 SP1 and lots of customer feedback, the Exchange product team has really stepped up to provide better support for virtual environments as well as more detailed guidance on planning for and deploying Exchange 2007 and 2010 in virtualization.

Last week, I was talking with a co-worker about Exchange’s design requirements in a virtual environment. I casually mentioned the “no file-level storage protocols” restriction for the underlying storage and suddenly, the conversation turned a bit more serious. Many people who deploy VMware create large data stores on their SAN and share them to the ESX cluster via the NFS protocol. There are a lot of advantages to doing it this way, and it’s a very flexible and relatively easy way to deploy VMs. However, it’s not supported for Exchange VMs.

The Heck You Say?

“But Devin,” I can hear some of you say, “what do you mean it’s not supported to run Exchange VMs on NFS-mounted data stores? I deploy all of my virtual machines using VMDKs on NFS-mounted data stores. I have my Exchange servers there. It all works.”

It probably does work. Whether or not it works, though, it’s not a supported configuration, and one thing Masters are trained to hate with a passion is letting people deploy Exchange in a way that gives them no safety net. It is an essential tool in your toolkit to have the benefit of Microsoft product support to walk you through the times when you get into a strange or deep problem.

Let’s take a look at Microsoft’s actual support statements. For Exchange 2010, Microsoft has the following to say under virtualization (emphasis added):

The storage used by the Exchange guest machine for storage of Exchange data (for example, mailbox databases or Hub transport queues) can be virtual storage of a fixed size (for example, fixed virtual hard disks (VHDs) in a Hyper-V environment), SCSI pass-through storage, or Internet SCSI (iSCSI) storage. Pass-through storage is storage that’s configured at the host level and dedicated to one guest machine. All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2010 doesn’t support the use of network attached storage (NAS) volumes. Also, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.

Exchange 2007 has pretty much the same restrictions as shown in the TechNet topic. What about Exchange 2003? Well, that’s trickier; Exchange 2003 was never officially supported under any virtualization environment other than Microsoft Virtual Server 2005 R2.

The gist of the message is this: it is not supported by Microsoft for Exchange virtual machines to use disk volumes that are on file-level storage such as NFS or CIFS/SMB, if those disk volumes hold Exchange data. I realize this is a huge statement, so let me unpack this a bit. I’m going to assume a VMware environment here, but these statements are equally true for Hyper-V or any other hypervisor supported under the Microsoft SVVP.

While the rest of the discussion will focus on VMware and NFS, all of the points made are equally valid for SMB/CIFS and other virtualization systems. (From a performance standpoint, I would not personally want to use SMB for backing virtual data stores; NFS, in my experience, is much better optimized for the kind of large-scale operations that virtualization clusters require. I know Microsoft is making great strides in improving the performance of SMB, but I don’t know if it’s there yet.)

It’s Just Microsoft, Right?

So is there any way to design around this? Could I, in theory, deploy Exchange this way and still get support from my virtualization vendor? A lot of people I talk to point to a whitepaper that VMware published in 2009 that showed the relative performance of Exchange 2007 over iSCSI, FC, and NFS. They use this paper as “proof” that Exchange over NFS is supported.

Not so much, at least not with VMware. The original restriction may come from the Exchange product group (other Microsoft workloads are supported in this configuration), but the other vendors certainly know the limitation and honor it in their guidance. Look at VMware’s Exchange 2010 best practices on page 13:

It is important to note that there are several different shared-storage options available to ESX (iSCSI, Fibre Channel, NAS, etc.); however, Microsoft does not currently support NFS for the Mailbox Server role (clustered or standalone). For Mailbox servers that belong to a Database Availability Group, only Fibre Channel is currently supported; iSCSI can be used for standalone mailbox servers. To see the most recent list of compatibilities please consult the latest VMware Compatibility Guides.

According to this document, VMware is even slightly more restrictive! If you’re going to use RDMs (this section is talking about RDMs, so don’t take the iSCSI/FC statement as a limit on guest-level volume mounts), VMware is saying that you can’t use iSCSI RDMs, only FC RDMs.

Now, I believe – and there is good evidence to support me – that this guidance as written is actually slightly wrong:

  • The HT queue database is also an ESE database and is subject to the same limitations; this is pretty clear on a thorough read-through of the Exchange 2010 requirements in TechNet. Many people leave the HT queue database on the same volume they install Exchange to, which means that volume also cannot be presented via NFS. If you follow best practices, you move this queue database to a separate volume (which should be an RDM or guest-mounted iSCSI/FC LUN).
  • NetApp, one of the big storage vendors that supports the NFS-mounted VMware data store configuration, only supports Exchange databases mounted via FC/iSCSI LUNs using SnapManager for Exchange (SME) as shown in NetApp TR-3845. Additionally, in the joint NetApp-VMware-Cisco performance whitepaper on virtualizing Microsoft workloads, the only configuration tested for Exchange 2010 is FC LUNs (TR-3785).
  • It is my understanding that the product group’s definition of Exchange files doesn’t just extend to ESE files and transaction logs, but to all of the Exchange binaries and associated files. I have not yet been able to find a published source to document this interpretation, but I am working on it.
  • I am not aware of any Microsoft-related restriction about iSCSI + DAG. This VMware Exchange 2010 best practices document (published in 2010) is the only source I’ve seen mention this restriction, and in fact, the latest VMware Microsoft clustering support matrix (published in June 2011) lists no such restriction. Microsoft’s guidelines seem to imply that block storage is block storage is block storage when it comes to “SCSI pass-through storage”. I have queries in to nail this one down because I’ve been asking in various communities for well over a year with no clear resolution other than, “That’s the way VMware is doing it.”

Okay, So Now What?

When I’m designing layouts for customers who are used to deploying Windows VMs via NFS-mounted VMDKs, I have a couple of options. My preferred option, if they’re also using RDMs, is to just have them provision one more RDM for the system drive and avoid NFS entirely for Exchange servers. That way, if my customer does have to call Microsoft support, we don’t have to worry about the issue at all.

However, that’s not always possible. My customer may have strict VM provisioning processes in place, have limited non-NFS storage to provision, or have some other reason why they need to use NFS-based VMDKs. In this case, I have found the following base layout to work well:

| Volume | Type | Notes |
|---|---|---|
| C: | VMDK or RDM | Can be on any type of supported data store. Should be sized to include static page file of size PhysicalRAM + 10 MB. |
| E: | RDM or guest iSCSI/FC | All Exchange binaries installed here. Move IIS files here (scripts out on Internet to do this for you). Create an E:\Exchdata directory and use NTFS mount points to mount each of the data volumes the guest will mount. |
| Data volumes | RDM or guest iSCSI/FC | Any volume holding mailbox/PF database EDB or logs, or HT queue EDB or logs. Should mount these separately, NTFS mount points recommended. Format these NTFS volumes with 64K block size, not default. |

Note that we have several implicit best practices in use here:

  • Static page file, properly sized for a 64-bit operating system with a large amount of physical RAM. Doing this ensures that you have enough virtual memory for the Exchange memory profile AND that you can write a kernel memory crash dump to disk in the event of a blue screen. (If the page file is not sized properly, or is not on C:, the full dump cannot be written to disk.)
  • Exchange binaries not installed on the system drive. This makes restores much easier. Since Exchange uses IIS heavily, I recommend moving the IIS data files (the inetpub and children folders) off of the system drive and onto the Exchange volume. This helps reduce the rate of change on the system drive and offers other benefits such as making it easier to properly configure anti-virus exclusions.
  • The use of NTFS mount points (which mount the volume to a directory) instead of separate drive letters. For large DAGs, you can easily have a large number of volumes per MB role, making the use of drive letters a limitation on scalability. NTFS mount points work just like Unix mount points and work terribly well – they’ve been supported since Exchange 2003 and recommended since the late Exchange 2003 era for larger clusters. In Exchange 2007 and 2010 continuous replication environments (CCR, SCR, DAG), all copies must have the same pathnames.
  • Using NTFS 64K block allocations for any volumes that hold ESE databases. While not technically necessary for log partitions, doing so does not hurt performance.
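The sizing and naming rules in this layout are simple enough to sketch in a few lines of Python. This is a planning aid under assumptions, not a provisioning script: the function names and the volume names ("DB01", "DB01-Logs") are made up, and the only rules encoded are the ones stated above (page file = physical RAM + 10 MB, data volumes mounted under E:\Exchdata).

```python
from pathlib import PureWindowsPath


def static_page_file_mb(physical_ram_gb):
    """Static page file size in MB: physical RAM plus 10 MB."""
    return physical_ram_gb * 1024 + 10


def mount_point_paths(volume_names, root=r"E:\Exchdata"):
    """Map each data volume to a directory under the mount-point root.

    Using the same root and names on every server keeps the pathnames
    identical across database copies.
    """
    return {name: str(PureWindowsPath(root) / name) for name in volume_names}


# Hypothetical server with 32 GB of RAM and two data volumes.
print(static_page_file_mb(32))
for volume, path in mount_point_paths(["DB01", "DB01-Logs"]).items():
    print(volume, "->", path)
```

Because the paths come from one function, every DAG member ends up with the same directory names, which is what continuous replication requires.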

So Why Is This Even A Problem?

This is the money question, isn’t it? Windows itself is supported under this configuration. Even SQL Server is. Why not Exchange?

At heart, it comes down to this: the Exchange ESE database engine is a very finely-tuned piece of software, honed for over 15 years. During that time, with only one exception (the Windows Storage Server 2003 Feature Pack 1, which allowed storage solutions running WSS 2003 + FP1 to host Exchange database files over NAS protocols), Exchange has never supported putting Exchange database files over file-level storage. I’m not enough of an expert on ESE to whip up a true detailed answer, but here is what I understand about it.

Unlike SQL Server, ESE is not a general purpose database engine. SQL is optimized to run relational databases of all types. The Exchange flavor of ESE is optimized for just one type of data: Exchange. As a result, ESE has far more intimate knowledge about the data than any SQL Server instance can. ESE provides a lot of performance boosts for I/O hungry Exchange databases and it can do so precisely because it can make certain assumptions. One of those assumptions is that it’s talking to block-level storage.

When a host process commits writes to storage, there’s a very real difference in the semantics of the write operation between block-level protocols and file-level protocols. Exchange, in particular, depends dramatically on precise control over block-level writes – which file protocols like NFS and SMB can mask. The cases under which this can cause data corruption for Exchange are admittedly corner cases, but they do exist and they can cause impressive damage.

Cleaning Up

What should we do about it if we have an Exchange deployment that is in violation of these support guidelines?

Ideally, we fix it. Microsoft’s support stance is very clear on this point, and in the unlikely event that data loss occurs in this configuration, Microsoft support is going to point at the virtualization/storage vendors and say, “Get them to fix it.” I am not personally aware of any cases of a configuration like this causing data loss or corruption, but I am not the Exchange Product Group – they get access to an amazing amount of data.

At the very least, you need to understand and document that you are in an unsupported configuration so that you can make appropriate plans to get into support as you roll out new servers or upgrade to future versions of Exchange. This is where getting a good Exchange consultant to do an Exchange health check can help you get what you need and provide the support you need with your management – we will document this in black and white and help provide the outside validation you might need to get things put right.

One request for the commenters: if all you’re going to do is say, “Well we run this way and have no problems,” don’t bother. I know and stipulate that there are many environments out there running in violation of this support boundary that have not (yet) run into issues. I’ve never said it won’t work. There are a lot of things we can do, but that doesn’t mean we should do them. At the same time, at the end of the day – if you know the issues and potential risks, you have to make the design decision that’s right for your organization. Just make sure it’s an informed (and documented, and signed-off!) decision.

Devin’s Load Balancer for Exchange 2010


One of the biggest differences I’m seeing when deploying Exchange 2010 compared to previous versions is that for just about all of my customers, load balancing is becoming a critical part of the process. In Exchange 2003 FE/BE, load balancing was a luxury unheard of for all but the largest organizations with the deepest pockets. Only a handful of outfits offered load balancing products, and they were expensive. For Exchange 2007 and the dedicated CAS role, it started becoming more common.

For Exchange 2003 and 2007, you could get all the same benefits of load balancing (as far as Exchange was concerned) by deploying an ISA server or ISA server cluster using Windows Network Load Balancing (WNLB). ISA included the concept of a “web farm” so it would round-robin incoming HTTP connections to your available FE servers (and Exchange 2007 CAS servers). Generally, your internal clients would directly talk to their mailbox servers, so this worked well. Hardware load balancers were typically used as a replacement for publishing with an ISA reverse proxy (and more rarely to load balance the ISA array instead of WNLB). Load balancers could perform SSL offloading, pre-authentication, and many of the same tasks people were formerly using ISA for. Some small shops deployed WNLB for Exchange 2003 FEs and Exchange 2007 CAS roles.

In Exchange 2010, everything changes. Outlook RPC connections now go to the CAS servers in the site, not the MB server that hosts the active copy of the database. Mailbox databases now have an affiliation with either a specific CAS server or a site-specific RPC client access array, which you can see in the RpcClientAccessServer property returned by the Get-MailboxDatabase cmdlet. If you have two or more servers, I recommend you set up the RPC client access array as part of the initial deployment and get some sort of load balancer in place.

Load Balancing Options

At Trace3, we’re an F5 reseller, and F5 is one of the few load balancer companies out there that has really made an effort to understand and optimize Exchange 2010 deployments. However, I’m not on the sales side; I have customers using a variety of load balancing solutions for their Exchange deployments. At the end of the day, we want the customer to do what’s right for them. For some customers, that’s an F5. Others require a different solution. In those cases, we have to get creative – sometimes they don’t have budget, sometimes the networking team has their own plans, and on some rare occasions, the plans we made going in turned out not to be a good fit after all and now we have to come up with something on the fly.

If you’re not in a position to use a high-end hardware load balancer like an F5 BIG-IP or a Cisco ACE solution, and can’t look at some of the lower-cost (and correspondingly lower-feature) solutions that are now on the market, there are a few alternatives:

  • WNLB. To be honest, I have attempted to use this in several environments now and even when I spent time going over the pros and cons, it failed to meet expectations. If you’re virtualizing Exchange (like many of my customers) and are trying to avoid single points of failure, WNLB is so clearly not the way to go. I no longer recommend this to my customers.
  • DNS round robin. This method at least has the advantage of in theory driving traffic to all of the CAS instances. However, in practice it gets in the way of quickly resolving problems when they come up. It’s better than nothing, but not by much.
  • DAG cluster IP. Some clever people came up with this option for instances where you are deploying multi-role servers with MB+HT+CAS on all servers and configuring them in a DAG. DAG = cluster, these smart people think, and clusters have a cluster IP address. Why can’t we just use that as the IP address of the RPC client access array? Sure enough, this works, but it’s not tested or supported by Microsoft and it isn’t a perfect solution. It’s not load balancing at all; the server holding the cluster IP address gets all the CAS traffic. Server sizing is important!

The fact of the matter is, there are no great alternatives if you’re not going to use hardware load balancing. You’re going to have to compromise something.

Introducing Devin’s Load Balancer

For many of my customers, we end up looking something like this:

  • The CAS/HT roles are co-located on one set of servers, while MB (and the DAG) is on another. This rules out the DAG cluster IP option.
  • They don’t want users to complain excessively when something goes wrong with one of the CAS/HT servers. This rules out DNS round robin.
  • They don’t have the budget for a hardware solution yet, or one is already in the works but not ready because of schedule. They need a temporary, low-impact solution. This effectively rules out WNLB.

I’ve come up with a quick and dirty fix I call Devin’s Load Balancer or, as I commonly call it, the DLB. It looks like this:

  1. Pick one CAS server that can handle all the traffic for the site. This is our target server.
  2. Pick an IP address for the RPC client access array for the site. Create the DNS A record for the RPC client access array FQDN, pointing to the IP address.
  3. Create the RPC client access array in EMS, setting the name, FQDN, and site.
  4. On the main network interface of the target server, add the IP address. If this IP address is on the same subnet as the main IP address, there is no need to create a secondary interface! Just add it as a secondary IP address/subnet mask.
  5. Make sure the appropriate mailbox databases are associated with the RPC client access array.
  6. Optionally, point the internal HTTP load balance array DNS A record to this IP address as well (or publish this IP address using ISA).

You may have noticed that this sends all traffic to the target server; it doesn’t really load balance. DLB also stands for Doesn’t Load Balance!
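If it helps to see the moving parts, here is a toy Python model of the DLB. It is a sketch only: the class, the FQDN, the server names, and the IP addresses are all hypothetical, and nothing here touches a real network. The point it illustrates is that the DNS record for the array never changes; only the server holding the array IP as a secondary address does.

```python
class Site:
    """Toy model of one site running the DLB."""

    def __init__(self, array_fqdn, array_ip, cas_servers):
        self.array_fqdn = array_fqdn
        # The A record for the RPC client access array. It is set once.
        self.dns = {array_fqdn: array_ip}
        # Each CAS server starts with only its own primary address.
        self.addresses = {name: {ip} for name, ip in cas_servers.items()}

    def assign_array_ip(self, server):
        """Move the array IP to `server` as a secondary address."""
        array_ip = self.dns[self.array_fqdn]
        for ips in self.addresses.values():
            ips.discard(array_ip)
        self.addresses[server].add(array_ip)

    def resolve_target(self):
        """Return the server currently answering for the array FQDN."""
        array_ip = self.dns[self.array_fqdn]
        for server, ips in self.addresses.items():
            if array_ip in ips:
                return server
        return None


site = Site(
    "outlook.example.com",
    "10.0.0.50",
    {"CAS1": "10.0.0.11", "CAS2": "10.0.0.12"},
)
site.assign_array_ip("CAS1")   # initial target server
print(site.resolve_target())
site.assign_array_ip("CAS2")   # switchover: move the IP, leave DNS alone
print(site.resolve_target())
```

After the switchover, clients resolving the same FQDN land on the new target without any DNS change.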

This configuration, despite its flaws, gives me what I believe are several important benefits:

  • It’s extremely easy to switchover/failover. If something happens to my target server, I simply add the RPC client access array IP address as a secondary IP address to my next CAS instance. There are no DNS cache entries to wait to expire. There are no switch configurations to modify. There are no DNS records I have to update. If this is a planned switchover, clients get disrupted but can immediately connect. I can make the update as soon as I get warning that something happened and my clients can reconnect without any further action on their part.
  • It isolates what I do with the other CAS instances. Windows and Exchange no longer have any clue they’re in a load balanced pseudo-configuration. With WNLB, if I make any changes to the LB cluster (like add or remove a member), all connections to the cluster IP addresses are dropped!
  • It makes it very easy to upgrade to a true load balancing solution. I set the true solution up in parallel with an alternate, temporary IP address. I use local HOSTS file entries on my test machines while I’m getting everything tested and validated. And then I simply take the RPC client access array IP address off the target server and put it on the load balancer. Existing connections are dropped, but new ones immediately connect with no timeouts – and now we’re really load balancing.

Note that you do not need the CAS SSL certificate to contain the FQDN of the RPC client access array as a SAN entry. RPC doesn’t use SSL for encryption (it’s not based on HTTP).

Even in a deployment where the customer is putting all roles into a single-server configuration, if there’s any thought at all that they might want to expand to an HA configuration in the future, I now am in the habit of configuring this. The RPC client access array is now configured and somewhat isolated from the CAS configuration, so now my future upgrades are easier and less disruptive.

Moving to Exchange Server 2010 Service Pack 1

Microsoft recently announced that Service Pack 1 (SP1) for Exchange Server 2010 had been released to web, prompting an immediate upgrade rush for all of us Exchange professionals. Most of us maintain at least one home/personal lab environment, the better to pre-break things before setting foot on a customer site. Before you go charging out to do this for production (especially if you’re one of my customers, or don’t want to run the risk of suddenly becoming one of my customers), take a few minutes to learn about some of the current issues with SP1.

Easy Installation and Upgrade Slipstreaming

One thing that I love about Exchange service packs is that from Exchange 2007 on, they’re full installations in their own right. Ready to deploy a brand new Exchange 2010 SP1 server? Just run setup from the SP1 binaries – no more fiddling around with the original binaries, then applying your service packs. Of course, the Update Rollups now take the place of that, but there’s a mechanism to slipstream them into the installer (and here is the Exchange 2007 version of this article).

Note: If you do make use of the slipstream capabilities, remember that Update Rollups are both version-dependent (tied to the particular RTM/SP release level) and are cumulative. SP1 UR4 is not the same thing as RTM UR4! However, RTM UR4 will include RTM UR3, RTM UR2, and RTM UR1…just as SP1 UR4 will contain SP1 UR3, SP1 UR2, and SP1 UR1.

The articles I linked to say not to slipstream the Update Rollups with a service pack, and I’ve heard some confusion about what this means. It’s simple: you can use the Updates folder mechanism to slipstream the Update Rollups when you are performing a clean install. You cannot use the slipstream mechanism when you are applying a service pack to an existing Exchange installation. In the latter situation, apply the service pack, then the latest Update Rollup.

It’s too early for any Update Rollups for Exchange 2010 SP1 to exist at the time of writing, but if there were (for the sake of illustration, let’s say that SP1 UR X just came out), consider these two scenarios:

  • You have an existing Exchange 2010 RTM UR4 environment and want to upgrade directly to SP1 UR X. You would do this in two steps on each machine: run the SP1 installer, then run the latest SP1 UR X installer.
  • You now want to add a new Exchange 2010 server into your environment and want it to be at the same patch level. You could perform the installation in a single step from the SP1 binaries by making sure the latest SP1 UR X installer was in the Updates folder.

If these scenarios seem overly complicated, just remember back to the Exchange 2003 days…and before.
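The rules above boil down to a small decision, sketched here in Python. The function names and step wording are mine, not anything from Microsoft’s installer; the logic is only what this section states: rollups are cumulative within one release level, slipstreaming is for clean installs, and upgrades apply the service pack first and the rollup second.

```python
def rollups_included(base, number):
    """Rollups are cumulative, but only within one release level."""
    return [f"{base} UR{n}" for n in range(1, number + 1)]


def install_plan(existing_install, service_pack, latest_rollup=None):
    """Return the ordered steps for one server."""
    if existing_install:
        # Upgrading in place: no slipstreaming, two separate steps.
        steps = [f"run the {service_pack} installer"]
        if latest_rollup:
            steps.append(f"run the {latest_rollup} installer")
        return steps
    # Clean install: the rollup can ride along via the Updates folder.
    steps = []
    if latest_rollup:
        steps.append(f"place {latest_rollup} in the Updates folder")
    steps.append(f"run setup from the {service_pack} binaries")
    return steps


print(rollups_included("SP1", 4))
print(install_plan(True, "SP1", "SP1 UR X"))
print(install_plan(False, "SP1", "SP1 UR X"))
```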

Third Party Applications

This might surprise you, but in all of the current Exchange 2010 projects I’m working on, I’ve not even raised the question of upgrading to SP1 yet. Why would I not do that? Simple – all of these environments have dependencies on third-party software that is not yet certified for Exchange 2010 SP1. In some cases, the software has barely just been certified for Exchange 2010 RTM! If the customer brings it up, I always encourage them to start examining SP1 in the lab, but for most production environments, supportability is a key requirement.

Make sure you’re not going to break any applications you care about before you go applying service packs! Exchange service packs always make changes – some easy to see, some harder to spot. You may need to upgrade your third-party applications, or you may simply need to make configuration changes ahead of time – but if you blindly apply service packs, you’ll find these things out the hard way. If you have a critical issue or lack of functionality that the Exchange 2010 SP1 will address, get it tested in your lab and make sure things will work.

Key applications I encourage my customers to test include:

Applications that use SMTP submission are typically pretty safe, and there are other applications that you might be okay living without if something does break. Figure out what you can live with, test them (or wait for certifications), and go from there.

Complications and Gotchas

Unfortunately, not every service pack goes smoothly. For Exchange 2010 SP1, one of the big gotchas that early adopters are giving strong feedback about is the number of hotfixes you must download and apply to Windows and the .NET Framework before applying SP1 (a variable number, depending on which base OS your Exchange 2010 server is running).

Having to install hotfixes wouldn’t be that bad if the installer told you, “Hey, click here and here and here to download and install the missing hotfixes.” Exchange has historically not done that (citing boundaries between Microsoft product groups) even though other Microsoft applications don’t seem to be quite as hobbled. However, this instance of (lack of) integration is particularly egregious because of two factors.

Factor #1: hotfix naming conventions. Back in the days of Windows 2000, you knew whether a hotfix was meant for your system, because whether you were running Workstation or Server, it was Windows 2000. Windows XP and Windows 2003 broke that naming link between desktop and server operating systems, often confusingly so once 64-bit versions of each were introduced (32-bit XP and 32-bit 2003 had their own patch versions, but 64-bit XP applied 64-bit 2003 hotfixes).

Then we got a few more twists to deal with. For example, did you know that Windows Vista and Windows Server 2008 are the same codebase under the hood? Or that Windows 7 and Windows Server 2008 R2, likewise, are BFFs? It’s true. Likewise, the logic behind the naming of Windows Server 2003 R2 and Windows Server 2008 R2 was very different; Windows Server 2003 R2 was basically Windows Server 2003 with a SP and a few additional components, while Windows Server 2008 R2 has some substantially different code under the hood than Windows Server 2008 with SP. (I would guess that Windows Server 2008 R2 got the R2 moniker to capitalize on Windows 2008’s success, while Windows 7 got a new name to differentiate itself from the perceived train wreck that Vista had become, but that’s speculation on my part.)

At any rate, figuring out which hotfixes you need – and which versions of those hotfixes – is less than easy. Just remember that you’re always downloading the 64-bit patch, and that Windows 2008=Vista while Windows 2008 R2=Windows 7 and you should be fine.
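That rule of thumb fits in a lookup table. The Python sketch below makes two assumptions worth calling out: the KB number is a placeholder, and the Windows<version>-KB<number>-x64.msu file name pattern is the one many Windows update packages use, not something every hotfix (the .NET Framework ones, for instance) follows.

```python
# Server OS -> (client OS sharing its codebase, NT version used in
# package names). Vista/2008 are NT 6.0; Windows 7/2008 R2 are NT 6.1.
CLIENT_SIBLING = {
    "Windows Server 2008": ("Windows Vista", "6.0"),
    "Windows Server 2008 R2": ("Windows 7", "6.1"),
}


def hotfix_package_name(server_os, kb_number):
    """Guess the package file name to look for; always the 64-bit build."""
    _sibling, nt_version = CLIENT_SIBLING[server_os]
    return f"Windows{nt_version}-KB{kb_number}-x64.msu"


# 123456 is a placeholder KB number, not a real hotfix.
for os_name, (sibling, _version) in CLIENT_SIBLING.items():
    print(os_name, "=", sibling, "->", hotfix_package_name(os_name, 123456))
```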

Factor #2: hotfix release channels. None of these hotfixes show up under Windows Update. There’s no easy installer or tool to run that gets them for you. In fact, at least two of the hotfixes must be obtained directly from Microsoft Customer Support Services. All of these hotfixes include scary legal boilerplate about not being fully regression tested and thereby not supported unless you were directly told to install them by CSS. This has caused quite a bit of angst out in the Exchange community, enough so that various people are collecting the various hotfixes and making them available off their own websites in one easy package to download[1].

I know that these people mean well and are trying to save others from a frustrating experience, but in this case, the help offered is a bad idea. That same hotfix boilerplate means that everyone who downloads those hotfixes agrees not to redistribute them. There’s no exception for good intentions. If you think this is bogus, let me give you two things to think about:

  • You need to be able to verify that your hotfixes are legitimate and haven’t been tampered with. Do you really want to trust production mission-critical systems to hotfixes you scrounged from some random Exchange pro you only know through blog postings? Even if the pro is trustworthy, is their web site? Quite frankly, I trust Microsoft’s web security team to prevent, detect, and mitigate hotfix-affecting intrusions far more quickly and efficiently than some random Exchange professional’s web host. I’m not disparaging any of my colleagues out there, but let’s face it – we have a lot more things to stay focused on. Few of us (if any) have the time and resources the Microsoft security guys do.
  • Hotfixes in bundles grow stale. When you link to a KB article or Microsoft Download offering to get a hotfix, you’re getting the most recent version of that hotfix. Yes, hotfixes may be updated behind the scenes as issues are uncovered and testing results come in. In the case of the direct-from-CSS hotfixes, you can get them for free through a relatively simple process. As part of that process, Microsoft collects your contact info so they can alert you if any issues later come up with the hotfix that may affect you. Downloading a stale hotfix from a random bundle increases the chances of getting an old hotfix version that may cause issues in your environment, costing you a support incident. How many of these people are going to update their bundles as new hotfix versions become available? How quickly will they do it – and how will you know?

The Exchange product team has gotten an overwhelming amount of feedback on this issue, and they’ve responded on their blog. Not only do they give you a handy table rounding up links to get the hotfixes, they also collect a number of other potential gotchas and advice to learn from before beginning your SP1 deployment. Go check it out, then start deploying SP1 in your lab.

Good luck, and have fun! SP1 includes some killer new functionality, so take a look and enjoy!

[1] If you’re about to deploy a number of servers in a short period of time, of course you should cache these downloaded hotfixes for your team’s own use. Just make sure that you check back occasionally for updated versions of the hotfixes. The rule of thumb I’d use is about a week – if I’m hitting my own hotfix cache and it’s older than a week, it’s worth a couple of minutes to make sure it’s still current.
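If you want to automate that one-week rule of thumb, here is a minimal Python sketch. It assumes your cache is simply a folder of downloaded files and that each file’s modification time reflects when you downloaded it; the function name and the seven-day default are my own illustration, not anything Microsoft ships.

```python
import time
from pathlib import Path


def stale_hotfixes(cache_dir, max_age_days=7.0, now=None):
    """Return the files in cache_dir whose modification time is older than max_age_days.

    Assumption: the modification time is a reasonable stand-in for the download date.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # seconds in a day
    return sorted(
        path
        for path in Path(cache_dir).iterdir()
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

Point it at your own cache folder and anything it returns is worth a quick check against the KB article before you use it again.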

Manually creating a DAG FSW for Exchange 2010

I just had a comment from Chris on my Busting the Exchange Trusted Subsystem Myth post that boiled down to asking what you do when you have to create the FSW for an Exchange 2010 DAG manually?

For you to be in this situation, all of the following conditions have to be true:

  1. You have no other Exchange 2010 servers in the AD site. This implies that at least one of your DAG nodes is multi-role — remember that you need to have a CAS role and an HT role in the same site as your MB roles, preferably two or more of each for redundancy and load. If you have another Exchange 2010 server, then it’s already got the correct permissions — let Exchange manage the FSW automatically.
  2. If the site in question is part of a DAG that stretches sites, there are more DAG nodes in this site than in the second site. If you’re trying to place the FSW in the site with fewer members, you’re asking for trouble[1].
  3. You have no other Windows 2003 or 2008 servers in the site that you consider suitable for Exchange’s automatic FSW provisioning[2]. By this, I mean you’re not willing to add the Exchange Trusted Subsystem security group to the server’s local Administrators group so that Exchange can create, manage, and repair the FSW on its own. If your only other server in the site is a DC, I can understand not wanting to add the group to the Domain Admins group.

If that’s the case, and you’re dead set on doing it this way, you will have to manually create the FSW yourself. A FSW consists of two pieces: the directory, and the file share. The process for doing this is not documented anywhere on TechNet that I could find with a quick search, but happily, one Rune Bakkens blogs the following process:

To pre-create the FSW share you need the following:
– Create a folder, e.g. D:\Filewitness\DAGNAME
– Give the owner permission to Exchange Trusted Subsystem
– Give the Exchange Trusted Subsystem Full Control (NTFS)
– Share the folder with the following DAGNAME.FQDN (If you try a different share name,
it won’t work. This is somehow required)
– Give the DAGNAME$ computeraccount Full Control (Share)

When you’ve done this, you can run the set-databaseavailabilitygroup -witnessserver CLUSTERSERVER -witnessdirectory D:\Filewitness\DAGNAME

You’ll get the following warning message:

WARNING: Specified witness server Cluster.fqdn is not an Exchange server, or part of the Exchange Servers security group.
WARNING: Insufficient permission to access file shares on witness server Cluster.fqdn. Until this problem is corrected, the database availability group may be more vulnerable to failures. You can use the set-databaseavailabilitygroup cmdlet to try the operation again. Error: Access is denied

This is expected, since the cmdlet tries to create the folder and share, but don’t have the permissions to do this.

When this is done, the FSW should be configured correct. To verify this, the following files should be created:

– VerifyShareWriteAccess
– Witness

Just for the record, I have not tested this process yet. However, I’ve had to do some FSW troubleshooting lately and this matches what I’ve seen for naming conventions and permissions, so I’m fairly confident this should get you most of the way there. Thank you, Rune!

Don’t worry, I haven’t forgotten the next installment of my Exchange 2010 storage series. It’s coming, honest!

[1] Consider the following two-site DAG scenarios:

  • If there’s an odd number of MB nodes, Exchange won’t use the FSW.
  • An even number (n) of nodes in each site. The FSW is necessary for there to even be a quorum (you have 2n+1 nodes so a simple majority is n+1). If you lose the FSW and one other node — no matter where that node is — you’ll lose quorum. If you lose the link between sites, you lose quorum no matter where the FSW is.
  • A number (n) of nodes in site A, with at least one fewer node (m) in site B. If n+m is odd, you have an odd number of nodes – our first case. Even if m is only 1 fewer than n, putting the FSW in site B is meaningless – if you lose site A, B will never have quorum (in this case, m+1 = n, and n is only half – one less than quorum).

I am confident in this case that if I’ve stuffed up the math here, someone will come along to correct me. I’m pretty sure I’m right, though, and now I’ll have to write up another post to show why. Yay for you!
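In the meantime, the vote counting for the lopsided case can be sketched in a few lines of Python. This is a simplified model and not how the cluster service is implemented: it assumes one vote per DAG node, one extra vote for the FSW only when the node count is even, a simple majority for quorum, and no manual intervention after a site is lost. The function names and site labels are mine.

```python
def total_votes(nodes_a: int, nodes_b: int) -> int:
    """One vote per node, plus the FSW vote only when the node count is even."""
    nodes = nodes_a + nodes_b
    return nodes + (1 if nodes % 2 == 0 else 0)


def survives_site_loss(nodes_a: int, nodes_b: int, fsw_site: str, lost_site: str) -> bool:
    """True if the surviving site still holds a simple majority of all votes."""
    majority = total_votes(nodes_a, nodes_b) // 2 + 1
    surviving = nodes_b if lost_site == "A" else nodes_a
    fsw_in_use = (nodes_a + nodes_b) % 2 == 0
    if fsw_in_use and fsw_site != lost_site:
        surviving += 1  # the FSW is still reachable from the surviving site
    return surviving >= majority


# Three nodes in site A, one in site B, FSW placed in the smaller site B.
print(survives_site_loss(3, 1, fsw_site="B", lost_site="A"))
print(survives_site_loss(3, 1, fsw_site="B", lost_site="B"))
```

With three nodes in A and one in B, losing site A leaves B with two of five votes, which is short of the three it needs even with the FSW sitting right next to it. Losing site B leaves A with three votes, so A never needed the FSW in the first place.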

[2] You do have at least one other Windows server in that site, though, right — like your DC? Exchange doesn’t like not having a DC in the local site — and that DC should also be a GC.

The Disk’s The Thing! Exchange 2010 Storage Essays, part 2

Greetings, readers! When I first posted From Whence Redundancy? (part 1 of this series of essays on Exchange 2010 storage) I’d intended to follow up with other posts a bit faster than I have been. So much for intentions; let us carry on.

In part 1, I began the process of talking about how I think the new Exchange 2010 storage options will play out in live Exchange deployments over the next several years. The first essay in this series discussed what I believe is the fundamental question at the heart of an Exchange 2010 storage design: at what level will you ensure the redundancy of your Exchange mailbox databases? The traditional approach has used RAID at the disk level, but Exchange 2010 DAGs allow you to deploy mailbox databases in JBOD configurations. While I firmly believe that’s the central question, answering it requires us to dig under the hood of storage.

With Exchange 2010, Microsoft specifically designed Exchange mailbox servers to be capable of using the lowest common denominator of server storage: a directly attached storage (DAS) array of 7200 RPM SATA disks in a Just a Bunch of Disks (JBOD) configuration (what I call DJS). Understanding why they’ve made this shift requires us to understand more about disk drive technology. In this essay, part 2 of this series, let’s talk about disk technology and find out how Fibre Channel (FC), Serial Attached SCSI (SAS), and Serial Advanced Technology Attachment (SATA) disk drives are the same – and more importantly, what slight differences they have and what that means for your Exchange systems.

Exchange Storage SATA vs SAS

So here’s the first dirty little secret: for the most part, all disks are the same. Regardless of what type of bus they use, what form factor they are, what capacity they are, and what speed they rotate at, all modern disks use the same construction and principles:

  • They all have one or more thin rotating platters coated with magnetic media; the exact number varies by form factor and capacity. Platters look like mini CD-ROM disks, but unlike CDs, platters are typically double-sided. Platters have a rotational speed measured in revolutions per minute (RPMs).
  • Each side of a platter has an associated read-write head. These heads are on a single-track arm that moves in toward the hub of the platter or out towards the rim. The heads do not touch the platter, but float very close to the surface. It takes a measurable fraction of a second for the head to relocate from one position to another; this is called its seek time.
  • The circle described by the head’s position on the platter is called a track. In a multi-platter disk, the heads move in synchronization (there’s no independent tracking per platter or side). As a result, each head is on the same track at the same time, describing a cylinder.
  • Each drive unit has embedded electronics that implement the bus protocol, control the rotational speed of the platters, and translate I/O requests into the appropriate commands to the heads. Even though there are different flavors, they all perform the same basic functions.

If you would like a more in-depth primer on how disks work, I recommend starting with this article. I’ll wait for you.

Good? Great! So that’s how all drives are the same. It’s time to dig into the differences. They’re relatively small, but small differences have a way of piling up. Take a look at Table 1 which summarizes the differences between various FC, SATA, and SAS disks, compared with legacy PATA 133 (commonly but mistakenly referred to as IDE) and SCSI Ultra 320 disks:

Table 1: Disk parameter differences by disk bus type

| Type | Max wire bandwidth (Mbit/s) | Max data transfer (MB/s) |
|---|---|---|
| PATA 133 | 1,064 | 133.5 |
| SCSI Ultra 320 | 2,560 | 320 |
| SATA-I | 1,500 | 150 |
| SATA-II | 3,000 | 300 |
| SATA 6 Gb/s | 6,000 | 600 |
| SAS 150 | 1,500 | 150 |
| SAS 300 | 3,000 | 300 |
| FC (copper) | 4,000 | 400 |
| FC (optic) | 10,520 | 2,000 |


As of this writing, the most common drive types you’ll see for servers are SATA-II, SAS 300, and FC over copper. Note that while SCSI Ultra 320 drives in theory have a maximum data transfer higher than either SATA-II or SAS 300, in reality that bandwidth is shared among all the devices connected to the SCSI bus; both SATA and SAS have a one-to-one connection between disk and controller, removing contention. Also remember that SATA is only a half-duplex protocol, while SAS is a full-duplex protocol. SAS and FC disks use the full SCSI command set to allow better performance when multiple I/O requests are queued for the drive, whereas SATA uses the ATA command set. Both SAS and SATA implement tagged queuing, although they use two different standards (each of which has its pros and cons).

The second big difference is the average access time of the drive, which is the sum of multiple factors:

  • The average seek time of the heads. The actuator motors that move the heads from track to track are largely the same from drive to drive and thus the time contributed to the drive’s average seek time by just the head movements is roughly the same from drive to drive. What varies is the length of the head move; is it moving to a neighboring track, or is it moving across the entire surface? We can average out small track changes with large track changes to come up with idealized numbers.
  • The average latency of the platter. How fast the platters are spinning determines how quickly a given sector containing the data to be read (or where new data will be written) will move into position under the head once it’s in the proper track. This is a simple calculation based on the RPM of the platter. We can assume that a given sector will move into position, on average, in half a rotation. At one revolution per minute, half a rotation would take 30 seconds, or 30,000 ms; dividing 30,000 ms by the drive’s actual RPM gives us its average rotational latency.
  • The overhead caused by the various electronics and queuing mechanisms of the drive electronics, including any power saving measures such as reducing the spin rate of the drive platters. Although electricity is pretty fast and on-board electronics are relatively small circuits, there may be other factors (depending on the drive type) that may introduce delays into the process of fulfilling the I/O request received from the host server.

What has the biggest impact is how fast the platter is spinning, as shown in Table 2:

Table 2: Average latency caused by rotation speed

| Platter RPM | Average latency (ms) |
|---|---|
| 7,200 | 4.17 |
| 10,000 | 3 |
| 12,000 | 2.5 |
| 15,000 | 2 |


(As an exercise, do the same math on the disk speeds for the average laptop drives. This helps explain why laptop drives are so much slower than even low-end 7,200 RPM SATA desktop drives.)
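If you’d rather let a script do that exercise, here is a minimal Python sketch of the half-rotation math behind Table 2. The 4,200 and 5,400 RPM entries are my assumption of typical laptop drive speeds; the rest are the speeds from the table.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector: half a rotation, i.e. 30,000 ms divided by the RPM."""
    return 30_000 / rpm


# 4,200 and 5,400 RPM are assumed laptop speeds; the others come from Table 2.
for rpm in (4_200, 5_400, 7_200, 10_000, 12_000, 15_000):
    print(f"{rpm:>6,} RPM: {average_rotational_latency_ms(rpm):.2f} ms")
```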

Rather than painfully take you through the result of all of these tables and calculations step by step, I’m simply going to refer you to work that’s already been done. Once we know the various averages and performance metrics, we can figure out how many I/O operations per second (IOPS) a given drive can sustain on average, according to the type, RPMs, and nature of the I/O (sequential or random). Since Microsoft has already done that work for us as part of the Exchange 2010 Mailbox Role Calculator (version 6.3 as of this writing), I’m going to simply use the values there. Let’s take a look at how all this plays out in Table 3 by selecting some representative values.

Table 3: Drive IOPS by type and RPM

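| Size | Type | RPM | Average Random IOPS |
|---|---|---|---|
| 3.5” | SATA | 5,400 | 50 |
| 2.5” | SATA | 5,400 | 55 |
| 3.5” | SAS | 5,400 | 52.5 |
| 3.5” | SAS | 5,900 | 52.5 |
| 3.5” | SATA | 7,200 | 55 |
| 2.5” | SATA | 7,200 | 60 |
| 3.5” | SAS | 7,200 | 57.5 |
| 2.5” | SAS | 7,200 | 62.5 |
| 3.5” | FC/SCSI/SAS | 10,000 | 130 |
| 2.5” | SAS | 10,000 | 165 |
| 3.5” | FC/SCSI/SAS | 15,000 | 180 |
| 2.5” | SAS | 15,000 | 230 |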


There are three things to note about Table 3.

  1. These numbers come from Microsoft’s Exchange 2010 Mailbox Sizing Calculator and are validated across vendors through extensive testing in an Exchange environment. While there may be minor variances between drive models and manufacturers, and these numbers may seem pessimistic compared to the calculated IOPS numbers published for individual drives, these are good figures to use in the real world. Using calculated IOPS numbers can lead both to a range of figures, depending on the specific drive model and manufacturer, as well as to overestimating the amount of IOPS the drive will actually provide to Exchange.
  2. For the most part, SAS and FC are indistinguishable from the IOPS point of view. Regardless of the difference between the electrical interfaces, the drive mechanisms and I/O behaviors are comparable.
  3. Sequential IOPS are not listed; they will be quite a bit higher than the random IOPS (that same 7,200RPM SATA drive can provide 300+ IOPS for sequential operations). The reason is simple; although a lot of Exchange 2010 I/O has been converted from random to sequential, there’s still some random I/O going on. That’s going to be the limiting factor.

The IOPS listed are per-drive IOPS. When you’re measuring your drive system, remember that the various RAID configurations have their own IOPS overhead factor that will consume a certain number of those IOPS.

There are of course some other factors that we need to consider, such as form factor and storage capacity. We can address these according to some generalizations:

  • Since SAS and FC tend to have the same performance characteristics, the storage enclosure tends to differentiate between which technology is used. SAS enclosures can often be used for SATA drives as well, giving more flexibility to the operator. SAN vendors are increasingly offering SAS/SATA disk shelves for their systems because paying the FC toll can be a deal-breaker for new storage systems.
  • SATA disks tend to have a larger storage capacity than SAS or FC disks. There are reasons for this, but the easiest one to understand is that SATA, being traditionally a consumer technology, has a lower duty cycle and therefore lower quality control specifications that must be met.
  • SATA disks tend to be offered with lower RPMs than SAS and FC disks. Again, we can acknowledge that quality control plays a part here – the faster a platter spins, the more stringently the drive components need to meet their specifications for a longer period of time.
  • 2.5” drives tend to have lower capacity than their 3.5” counterparts. This makes sense – they have smaller platters (and may have fewer platters in the drive).
  • 2.5” drives tend to use less power and generate less heat than equivalent 3.5” drives. This too makes sense – the smaller platters have less mass, requiring less energy to sustain rotation.
  • 2.5” drives tend to permit a higher drive density in a given storage chassis while using only fractionally more power. Again, this makes sense based on the previous two points; I can physically fit more drives into a given space, sometimes dramatically so.

Let’s look at an example. A Supermicro SC826 chassis holds 12 3.5” drives with a minimum of 800W power while the equivalent Supermicro SC216 chassis holds 24 2.5” drives with a minimum of 900W of power in the same 2Us of rack space. Doubling the number of drives makes up for the capacity difference between the 2.5” and 3.5” drives, provides twice as many spindles and allows a greater aggregate IOPS for the array, and only requires 12.5% more power.
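To put rough numbers on that comparison, here is a small Python sketch. It assumes both chassis are filled with 7,200 RPM SATA drives, takes the per-drive random IOPS from Table 3 (55 for 3.5” and 60 for 2.5”), and ignores any RAID overhead, as in a JBOD layout. The class and field names are mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    name: str
    drive_bays: int
    min_power_watts: int
    iops_per_drive: float  # random IOPS per drive, taken from Table 3

    @property
    def aggregate_iops(self) -> float:
        """Raw spindle IOPS for a fully populated chassis, before any RAID overhead."""
        return self.drive_bays * self.iops_per_drive


# Assumption: 7,200 RPM SATA drives in every bay of both chassis.
sc826 = Chassis("SC826 (3.5-inch)", drive_bays=12, min_power_watts=800, iops_per_drive=55.0)
sc216 = Chassis("SC216 (2.5-inch)", drive_bays=24, min_power_watts=900, iops_per_drive=60.0)

extra_power_pct = (sc216.min_power_watts / sc826.min_power_watts - 1) * 100

for chassis in (sc826, sc216):
    print(f"{chassis.name}: {chassis.aggregate_iops:.0f} aggregate random IOPS")
print(f"Extra power for the 2.5-inch chassis: {extra_power_pct:.1f}%")
```

Under those assumptions the 2.5” chassis delivers a bit more than twice the aggregate random IOPS (1,440 versus 660) for 12.5% more power.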

The careful reader has noted that I’ve had very little to say about capacity in this essay, other than the observation above that SATA drives tend to have larger capacities, and that 3.5” drives tend to be larger than 2.5” drives. From what I’ve seen in the field, the majority of shops are just now looking at 2.5” drive shelves, so it’s safe to assume 3.5” is the norm. As a result, the 3.5” 7,200 RPM SATA drive represents the lowest common denominator for server storage, and that’s why the Exchange product team chose that drive as the performance bar for DJS configurations.

Exchange has been limited by performance (IOPS) requirements for most of its lifetime; by going after DJS, the product team has been able to take advantage of the fact that the capacity of these drives is the first to grow. This is why I think that Microsoft is betting that you’re going to want to simplify your deployment, aim for big, cheap, slow disks, and let Exchange DAGs do the work of replicating your data.

Now that we’ve talked about RAID vs. JBOD and SATA vs. SAS/FC, we’ll need to examine the final topic: SAN vs. DAS. Look for that discussion in Part 3, which will be forthcoming.

More Exchange blogging with Trace3!

I just wanted to drop a quick note to let you all know that I’ll be cross-posting all of my Exchange related material both here and at the Trace3 blog. The Trace3 blog is a multi-author blog, so you’ll get not only all my Exchange-related content, but you’ll get a variety of other interesting discussions from a number of my co-workers.

To kick it off, I’ve updated my From Whence Redundancy? Exchange 2010 Storage Essays, Part 1 post with some new material on database reseed times and reposted it there in its entirety. Don’t worry, I’ve also updated it here.

What Exchange 2010 on Windows Datacenter Means

For many versions now, Exchange Server has come in two flavors: Standard Edition and Enterprise Edition. The main practical difference between the two editions is the maximum number of supported mailbox databases, as shown in Table 1:

| Version | Standard Edition | Enterprise Edition |
|---|---|---|
| Exchange 2003 | 1 (75GB max) | 20 |
| Exchange 2007 | 5 | 50 |
| Exchange 2010 | 5 | 100 |

Table 1: Maximum databases per Exchange editions

However, the Exchange Server edition is not directly tied to the Windows Server edition:

  • For Exchange 2003 failover cluster mailbox servers, Exchange 2007 SCC/CCR environments [1], and Exchange 2010 DAG environments, you need Windows Server Enterprise Edition in order to get the MSCS cluster component framework.
  • For Exchange 2003 servers running purely as bridgeheads or front-end servers, or Exchange 2007/2010 HT, CAS, ET, and UM servers, you only need Windows Server Standard Edition.

I’ve seen some discussion around the fact that Exchange 2010 will install on Windows Server 2008 Datacenter Edition and Windows Server 2008 R2 Datacenter Edition, even though it’s not supported there and is not listed in the Operating System requirements section of the TechNet documentation.

HOWEVER…if we look at the Prerequisites for Exchange 2010 Server section of the Exchange Server 2010 Licensing site, we now see that Datacenter edition is, in fact, listed, as shown in Figure 1:

Figure 1: Exchange 2010 server license comparison

This is pretty cool, and the appropriate TechNet documentation is in the process of being updated to reflect this. What this means is that you can deploy Exchange 2010 on Windows Server Datacenter Edition; the differences between editions of Windows Server 2008 R2 are found here.[2] If you take a quick scan through the various feature comparison charts in the sidebar, you might wonder why anyone would want to install Exchange 2010 on Windows Server Datacenter Edition; it’s more costly and seems to provide the same benefits. However, take a look at the technical specifications comparison; this is, I believe, the meat of the matter:

  • Both editions give you a maximum of 2 TB of RAM – more than you can realistically throw at Exchange 2010.
  • Enterprise Edition gives you support for a maximum eight (8) x64 CPU sockets, while Datacenter Edition gives you sixty-four (64). With quad-core CPUs, this means a total of 32 cores under Enterprise vs. 256 cores under Datacenter.
  • With the appropriate hardware, you can hot-add memory in Enterprise Edition. However, you can’t perform a hot-replace, nor can you hot-add or hot-replace CPUs under Enterprise. With Datacenter, you can hot-add and hot-replace both memory and CPUs.

These seem to be compelling in many scenarios at first glance, unless you’re familiar with the recommended maximum configurations for Exchange 2010 server sizing. IIRC, the maximum CPUs that are recommended for most Exchange 2010 server configurations (including multirole servers) would be 24 cores – which fits into the 8 socket limitation of Enterprise Edition while using quad core CPUs.

With both Intel and AMD now offering hexa-core (6 core) CPUs, you can move up to 48 cores in Enterprise Edition. This is more than enough for any practical deployment of Exchange Server 2010 I can think of at this time, unless future service packs drastically change the CPU performance factors. Both Enterprise and Datacenter give you a ceiling of 2TB of RAM, which is far greater than required by even the most aggressively gigantic mailbox load I’d want to place on a single server. I’m having a difficult time seeing how anyone could realistically build out an Exchange 2010 server that goes beyond the performance and scalability limits of Enterprise Edition in any meaningful way.
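The socket math is simple enough to check in a few lines of Python. The socket limits are the ones quoted above; the 24-core figure is my from-memory recommendation, so treat it as an assumption rather than an official number.

```python
SOCKET_LIMITS = {"Enterprise": 8, "Datacenter": 64}  # x64 sockets per edition, as quoted above
RECOMMENDED_MAX_CORES = 24  # assumption: the from-memory sizing figure, not an official limit


def max_cores(edition: str, cores_per_socket: int) -> int:
    """Maximum core count an edition can address with a given CPU type."""
    return SOCKET_LIMITS[edition] * cores_per_socket


for cores_per_socket in (4, 6):
    for edition in SOCKET_LIMITS:
        cores = max_cores(edition, cores_per_socket)
        headroom = cores - RECOMMENDED_MAX_CORES
        print(f"{edition}, {cores_per_socket}-core CPUs: {cores} cores ({headroom} beyond the recommendation)")
```

Even with quad-core CPUs, Enterprise Edition leaves 8 cores of headroom over that 24-core recommendation; with hexa-core CPUs the headroom grows to 24.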

In fact, I can think of only three reasons someone would want to run Exchange 2010 on Windows Server Datacenter Edition:

  • You have spare Datacenter Edition licenses, aren’t going to use them, and don’t want to buy more Enterprise Edition licenses. This must be a tough place to be in, but it can happen under certain scenarios.
  • You have very high server availability requirements and require the hot-add/hot-replace capabilities. This will get costly – the server hardware that supports this isn’t cheap – but if you need it, you need it.
  • You’re already running a big beefy box with Datacenter and virtualization[3]. The box has spare capacity, so you want to make use of it.

The first two make sense. The last one, though, I’d be somewhat leery of doing. Seriously, think about this – I’m spending money on monstrous hardware with awesome fault tolerance capabilities, I’ve forked over for an OS license[4] that gives me the right to unlimited virtual machines, and now I’m going to clutter up my disaster recovery operations by mixing Exchange and other applications (including virtualization) in the same host OS instance? That may be great for a lab environment, but I’d have a long conversation with any customer who wanted to do this in production. Seriously, just spin up a new VM, use Windows Server Enterprise Edition, and go to town. The hardware configuration flexibility I lose by going virtual is less than what I gain by compartmentalizing my Exchange server to its own machine, along with the ability to move that virtual machine to any virtualization host I have.

So, there you have it: Exchange 2010 can now be run on Windows Server Datacenter Edition, which means yay! for options. But in the end, I don’t expect this to make a difference for any of the deployments I’m likely to be working on. This is a great move for a small handful of customers who really need this.

[1] MSCS is not required for Exchange 2007 SCR, although manual target activation can be easier in some scenarios if your target is configured as a single passive node cluster.

[2] From what I can tell, the same specs seem to be valid for Windows Server 2008, with the caveat that Windows Server 2008 R2 doesn’t offer a 32-bit version so the chart doesn’t give that information. However, since Exchange 2010 is x64 only, this is a moot point.

[3] This is often an attractive option, since you can host an unlimited number of Windows Server virtual machines without having to buy further Windows Server licenses for them.

[4] Remember that Datacenter is not licensed at a flat cost per server like Enterprise is; it’s licensed per socket. The beefier the machine you run it on, the more you pay.

From Whence Redundancy? Exchange 2010 Storage Essays, part 1

Updated 4/13 with improved reseed time data provided by item #4 in the Top 10 Exchange Storage Myths blog post from the Exchange team.

Over the next couple of months, I’d like to slowly sketch out some of the thoughts and impressions that I’ve been gathering about Exchange 2010 storage over the last year or so and combine them with the specific insights that I’m gaining at my new job. In this inaugural post, I want to tackle what I have come to view as the fundamental question that will drive the heart of your Exchange 2010 storage strategy: will you use a RAID configuration or will you use a JBOD configuration?

In the interests of full disclosure, the company I work for now is a strong NetApp reseller, so of course my work environment is conducive to designing Exchange in ways that make it easy to sell the strengths of NetApp kit. However, part of the reason I picked this job is precisely because I agree with how they address Exchange storage and how I think the Exchange storage paradigm is going to shake out in the next 3-5 years as more people start deploying Exchange 2010.

In Exchange 2010, Microsoft re-designed the Exchange storage system to target what we can now consider to be the lowest common denominator of server storage: a directly attached storage (DAS) array of 7200 RPM SATA disks in a Just a Bunch of Disks (JBOD) configuration. This DAS/JBOD/SATA (what I will now call DJS) configuration has been an unworkable configuration for Exchange for almost its entire lifetime:

  • The DAS piece certainly worked for the initial versions of Exchange; that’s what almost all storage was back then. Big centralized SANs weren’t part of the commodity IT server world, reserved instead for the mainframe world. Server administrators managed server storage. The question was what kind of bus you used to attach the array to the server. However, as Exchange moved to clustering, it required some sort of shared storage. While a shared SCSI bus was possible, it not only felt like a hack, but also didn’t scale well beyond two nodes.
  • SATA, of course, wasn’t around back in 1996; you had either IDE or SCSI. SCSI was the serious server administrator’s choice, providing better I/O performance for server applications, as well as faster bus speeds. SATA, and its big brother SAS, both are derived from the lessons that years of SCSI deployments have provided. Even for Exchange 2007, though, SATA’s poor random I/O performance made it unsuitable for Exchange storage. You had to use either SAS or FC drives.
  • RAID has been a requirement for Exchange deployments, historically, for two reasons: to combine enough drive spindles together for acceptable I/O performance (back when disks were smaller than mailbox databases), and to ensure basic data redundancy. Redundancy was especially important once Exchange began supporting shared storage clustering and required both aggregate I/O performance only achievable with expensive disks and interfaces as well as the reduced chance of a storage failure being a single point of failure.

If you look at the marketing material for Exchange 2010, you would certainly be forgiven for thinking that DJS is the only smart way to deploy Exchange 2010, with SAN, RAID, and non-SATA systems supported only for those companies caught in the mire of legacy deployments. However, this isn’t at all true. There are a growing number of Exchange experts (and not just those of us who either work for storage vendors or resell their products) who think that while DJS is certainly an interesting option, it’s not one that’s a good match for every customer.

In order to understand why DJS is truly possible in Exchange 2010, and more importantly begin to understand where DJS configurations are a good fit and what underlying conditions and assumptions you need to meet in order to get the most value from DJS, we need to separate these three dimensions and discuss them separately.


While I will go into more detail on all three dimensions at a later date, I want to focus on the JBOD vs. RAID question now. If you need some summaries, then check out fellow Exchange MVP (and NetApp consultant) John Fullbright’s post on the economics of DAS vs. SAN as well as Microsoft’s Matt Gossage and his TechEd 2009 session on Exchange 2010 storage. Although there are good arguments for diving into drive technology or storage connection debates, I’ve come to believe that the central philosophy question you must answer in your Exchange 2010 design is at what level you will keep your data redundant. Until Exchange 2007, you had only one option: keeping your data redundant at the disk controller level. Using RAID technologies, you had two copies of your data[1]. Because you had a second copy of the data, shared storage clustering solutions could be used to provide availability for the mailbox service.

With Exchange 2007’s continuous replication features, you could add in data redundancy at the application level and avoid the dependency of shared storage; CCR creates two copies, and SCR can be used to create one or more additional copies off-site. However, given the realities of Exchange storage, for all but the smallest deployments, you had to use RAID to provide the required number of disk spindles for performance. With CCR, this really meant you were creating four copies; with SCR, you were creating an additional two copies for each target replica you created.

This is where Exchange 2010 throws a wrench into the works. By virtue of a re-architected storage engine, it’s possible under specific circumstances to design a mailbox database that will fit on a single drive while still providing acceptable performance. The reworked continuous replication options, now simplified into the DAG functionality, create additional copies at the application level. If you hit that sweet spot of the 1:1 database to disk ratio, then you only have a single copy of the data per replica and can get an n-1 level of redundancy, where n is the number of replicas you have. This is clearly far more efficient for disk usage…or is it? The full answer is complex; the simple answer is, “In some cases.”
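To put numbers on the copy counts discussed so far, here is a minimal Python sketch. It assumes each RAID-backed replica holds two physical copies (mirroring, per footnote 1) and each JBOD replica holds one. The function name and parameters are mine, purely for illustration; they are not anything Exchange exposes.

```python
def physical_copies(replicas: int, copies_per_replica: int) -> int:
    """Total physical copies of one database across all of its replicas."""
    if replicas < 1 or copies_per_replica < 1:
        raise ValueError("replicas and copies_per_replica must be at least 1")
    return replicas * copies_per_replica


# Exchange 2007 CCR on mirrored RAID: 2 replicas, 2 physical copies each.
ccr_on_raid = physical_copies(replicas=2, copies_per_replica=2)

# CCR plus one SCR target, all on mirrored RAID.
ccr_scr_on_raid = physical_copies(replicas=3, copies_per_replica=2)

# Exchange 2010 DAG with four replicas on JBOD: one physical copy each.
dag_on_jbod = physical_copies(replicas=4, copies_per_replica=1)

print(ccr_on_raid, ccr_scr_on_raid, dag_on_jbod)
```

With JBOD the physical copy count equals the replica count, which is why losing a single disk costs you a whole replica rather than half of a mirror.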

In order to get the 1:1 database to disk ratio, you have to follow several guidelines:

  1. Have at least three replicas of the database in the DAG, regardless of which sites they are in. Doing so allows you to place both the EDB and transaction log files on the same physical drive, rather than separating them as you did in previous versions of Exchange.
  2. Ensure that you have at least two replicas per site. The reason for this is that unlike Exchange 2007, you can reseed a failed replica from another passive copy. This allows you to avoid reseeding over your WAN, which is something you do not want to do.
  3. Size your mailbox databases to include no more users than will fit in the drive’s performance envelope. Although Exchange 2010 converts many of the random I/O patterns to sequential, giving better performance, not all has been converted, so you still have to plan against the random I/O specs.
  4. Ensure that write transactions can get written successfully to disk. Use a battery-backed caching controller for your storage array to ensure the best possible performance from the disks. Use write caching for the physical disks, which means ensuring each server hosting a replica has a UPS.
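Guideline 3 boils down to taking the smaller of two limits: how many mailboxes the drive’s random I/O can carry, and how many fit in its usable capacity. The sketch below is only an illustration of that reasoning. Every number in it is invented for the example, and the function is hypothetical; use your own measured I/O profile and your drive’s real specs.

```python
import math


def max_mailboxes_per_disk(disk_iops: float, iops_per_mailbox: float,
                           usable_gb: float, gb_per_mailbox: float) -> int:
    """Return how many mailboxes fit on one drive, limited by I/O or capacity."""
    if min(disk_iops, iops_per_mailbox, usable_gb, gb_per_mailbox) <= 0:
        raise ValueError("all inputs must be positive")
    by_iops = math.floor(disk_iops / iops_per_mailbox)
    by_capacity = math.floor(usable_gb / gb_per_mailbox)
    # Whichever limit is hit first decides the database size.
    return min(by_iops, by_capacity)


# Illustrative values only: a drive good for 75 random IOPS with 1000 GB usable,
# mailboxes averaging 0.125 IOPS and 2 GB each.
capacity_bound = max_mailboxes_per_disk(75, 0.125, 1000, 2)

# Same drive, heavier users at 0.25 IOPS each: now I/O is the limit.
iops_bound = max_mailboxes_per_disk(75, 0.25, 1000, 2)

print(capacity_bound, iops_bound)
```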

At this point, you probably have disk capacity to spare, which is why Exchange 2010 allows the creation of archive mailboxes in the same mailbox database. All of the user’s data is kept at the same level of redundancy, and the archived data – which is less frequently accessed than the mainline data – is stored without additional significant disk or I/O penalty. This all seems to indicate that JBOD is the way to go, yes? Two copies in the main site, two off-site DR copies, and I’m using cheaper storage with larger mailboxes and only four copies of my data instead of the minimum of six I’d have with CCR+SCR (or the equivalent DAG setup) on RAID configurations.

Not so fast. Microsoft’s claims around DJS configurations usually talk about the up-front capital expenditures. There’s more to a solid design than just the up-front storage price tag, and even if the DJS solution does provide savings in your situation, that is only the start. You also need to think about the lifetime of your storage and all the operational costs. For instance, what happens when one of those 1:1 drives fails?

Well, if you bought a really cheap DAS array, your first indication will be when Exchange starts throwing errors and the active copy moves to one of the other replicas. (You are monitoring your Exchange servers, right?) More expensive DAS arrays usually directly let you know that a disk failed. Either way, you have to replace the disk. Again, with a cheap white-box array, you’re on your own to buy replacement disks, while a good DAS vendor will provide replacements within the warranty/maintenance period. Once the disk is replaced, you have to re-establish the database replica. This brings us to the wonderful manual process known as database reseeding, which is not only a manual task, but can take quite a significant amount of time – especially if you made use of archival mailboxes and stuffed that DJS configuration full of data. Let’s take a closer look at what this means to you.

[Begin 4/13 update]

There’s a dearth of hard information out there about what types of reseed throughputs we can achieve in the real world, and my initial version of this post where I assumed 20GB/hour as an “educated guess” earned me a bit of ribbing in some quarters. In my initial example, I said that if we can reseed 20GB of data per hour (from a local passive copy to avoid the I/O hit to the active copy), that’s 10 hours for a 200GB database, 30 hours for a 600GB database, or 60 hours – two and a half days! – for a 1.2TB database[2].

According to the Top 10 Exchange Storage Myths post on the Exchange team blog, 20GB/hour is way too low; in their internal deployments, they’re seeing between 35-70GB per hour. How would these speeds affect reseed times in my examples above? Well, let’s look at Table 1:

Table 1: Example Exchange 2010 Mailbox Database reseed times

Database Size | Reseed Throughput | Reseed Time
200GB | 20GB/hr | 10 hours
200GB | 35GB/hr | 6 hours
200GB | 50GB/hr | 4 hours
200GB | 70GB/hr | 3 hours
600GB | 20GB/hr | 30 hours
600GB | 35GB/hr | 18 hours
600GB | 50GB/hr | 12 hours
600GB | 70GB/hr | 9 hours
1.2TB | 20GB/hr | 60 hours
1.2TB | 35GB/hr | 35 hours
1.2TB | 50GB/hr | 24 hours
1.2TB | 70GB/hr | 18 hours

As you can see, reseed time can be a key variable in a DJS design. In some cases, depending on your business needs, these times could make or break whether this is a good design. I’ve done some talking around and found out that reseed times in the field are all over the charts. I had several people talk to me at the MVP Summit and ask me under what conditions I’d seen 20GB/hour, as that was too high. Astrid McClean and Matt Gossage of Microsoft had a great discussion with me and obviously felt that 20GB/hour is way too low.
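The arithmetic behind estimates like these is simple division. Here is a minimal sketch, assuming the throughput stays constant for the whole reseed (it won’t in practice) and rounding up to whole hours; the function name is mine.

```python
import math


def reseed_hours(database_gb: float, throughput_gb_per_hour: float) -> int:
    """Estimate reseed time in whole hours, rounded up."""
    if database_gb < 0:
        raise ValueError("database size cannot be negative")
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return math.ceil(database_gb / throughput_gb_per_hour)


# Treat 1.2TB as 1200GB to keep the numbers round.
for size_gb in (200, 600, 1200):
    for rate in (20, 35, 50, 70):
        print(f"{size_gb:>5} GB at {rate} GB/hr: {reseed_hours(size_gb, rate)} hours")
```

Plug in the throughput you actually measure on your replication network; the whole point of this section is that nobody agrees on what that number should be.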

Since then, I’ve received a lot of feedback and like I said, it’s all over the map. However, I’ve yet to hear anyone outside of Microsoft publicly state a reseed throughput higher than 20GB/hour. What this says to me is that getting the proper network design in place to support a good reseed rate hasn’t been a big point in deployments so far, and that in order to make a DJS design work, this may need to be an additional consideration.

If your replication network is designed to handle only the amount of traffic required for normal DAG replication and doesn’t have sufficient throughput to handle reseed operations as well, you may be hurting yourself in the unlikely event of suffering multiple simultaneous replica failures on the same mailbox database.

This is a bigger concern for shops that have a small tolerance for any given drive failure. In most environments, one of the unspoken effects of a DJS DAG design is that you are trading number of replicas – and database-level failover – for replica rebuild time. If you’re reduced from four replicas down to three, or three down to two during the time it takes to detect the disk failure, replace the disk, and complete the reseed, you’ll probably be okay with that taking a longer period of time as long as you have sufficient replicas.

All during the reseed time, you have one fewer replica of that database to protect you. If your business processes and requirements don’t give you that amount of leeway, you either have to design smaller databases (and waste the disk capacity, which brings us right back to the good old bad days of Exchange 2000/2003 storage design) or use RAID.

[End 4/13 update]

Now, with a RAID solution, we don’t have that same problem. We still have a RAID volume rebuild penalty, but that’s happening inside the disk shelf at the controller, not across our network between Exchange servers. And with a well-designed RAID solution such as generic RAID 10 (1+0) or NetApp’s RAID-DP, you can actually survive the loss of more disks at the same time. Plus, a RAID solution gives me the flexibility to populate my databases with smaller or larger mailboxes as I need, and aggregate out the capacity and performance across my disks and databases. Sure, I don’t get that nice 1:1 disk to database ratio, but I have a lot more administrative flexibility and can survive disk loss without automatically having to begin the reseed dance.

Don’t get me wrong – I’m wildly enthusiastic that I as an Exchange architect have the option of designing to JBOD configurations. I like having choices, because that helps me make the right decisions to meet my customers’ needs. And that, in the end, is the point of a well-designed Exchange deployment – to meet your needs. Not the needs of Microsoft, and not the needs of your storage or server vendors. While I’m fairly confident that starting with a default NetApp storage solution is the right choice for many of the environments I’ll be facing, I also know how to ask the questions that lead me to consider DJS instead. There’s still a place for RAID at the Exchange storage table.

In further installments over the next few months, I’ll begin to address the SATA vs. SAS/FC and DAS vs. SAN arguments as well. I’ll then try to wrap it up with a practical and realistic set of design examples that pull all the pieces together.

[1] RAID-1 (mirroring) and RAID-10 (striping and mirroring) both create two physical copies of the data. RAID-5 does not, but it tolerates the loss of a single drive, effectively giving you a virtual second copy of the data.

[2] Curious why I picked these database sizes? 200GB is the recommended maximum size for Exchange 2007 (due to backup limitations), and 600GB/1.2TB are the realistic recommended maximums you can get from 1TB and 2TB disks today in a DJS replica-per-disk deployment; you need to leave room for the content index, transaction logs, and free space.

Busting the Exchange Trusted Subsystem Myth

It’s amazing what kind of disruption leaving your job, looking for a new job, and starting to get settled in to a new job can have on your routines. Like blogging. Who knew?

At any rate, I’m back with some cool Exchange blogging. I’ve been getting a chance to dive into an “All-Devin, All-Exchange, All The Time” groove and it’s been a lot of fun, some of the details of which I hope to be able to share with you in upcoming months. In the process, I’ve been building a brand new Exchange 2010 lab environment and ran smack into a myth that seems to be making the rounds among people who are deploying Exchange 2010. This myth gives bum advice for those of you who are deploying an Exchange 2010 DAG and not using an Exchange 2010 Hub Transport as your File Share Witness (FSW). I call it the Exchange Trusted Subsystem Myth, and the first hint of it I see seems to be on this blog post. However, that same advice seems to have gotten around the net, as evidenced by this almost word-for-word copy or this posting that links to the first one. Like many myths, this one is pernicious not because it’s completely wrong, but because it works even though it’s wrong.

If you follow the Exchange product group’s deployment assumptions, you’ll never run into the circumstance this myth addresses; the FSW is placed on an Exchange 2010 HT role in the organization. Although you can specify the FSW location (server and directory) or let Exchange pick a server and directory for you, the FSW share isn’t created during the configuration of the DAG (as documented by fellow Exchange MVP Elan Shudnow and the “Witness Server Requirements” section of the Planning for High Availability and Site Resilience TechNet topic). Since it’s being created on an Exchange server as the second member of the DAG is joined, Exchange has all the permissions it needs on the system to create the share. If you elect to put the share on a non-Exchange server, then Exchange doesn’t have permissions to do it. Hence the myth:

  1. Add the FSW server’s machine account to the Exchange Trusted Subsystem group.
  2. Add the Exchange Trusted Subsystem group to the FSW server’s local Administrators group.

The sad part is, only the second action is necessary. True, doing the above will make the FSW work, but it will also open a much wider hole in your security than you need or want. Let me show you from my shiny new lab! In this configuration, I have three Exchange systems: EX10MB01, EX10MB02, and EX10MB03. All three systems have the Mailbox, Client Access, and Hub Transport roles. Because of this, I want to put the FSW on a separate machine. I could have used a generic member server, but I specifically wanted to debunk the myth, so I picked my DC EX10DC01 with malice aforethought.

  • In Figure 1, I show adding the Exchange Trusted Subsystem group to the Builtin/Administrators group on EX10DC01. If this weren’t a domain controller, I could add it to the local Administrators group instead, but DCs require tinkering. [1]

Figure 1: Membership of the Builtin/Administrators group on EX10DC01

  • In Figure 2, I show the membership of the Exchange Trusted Subsystem group. No funny business up my sleeve!

Figure 2: Membership of the Exchange Trusted Subsystem group

  • I now create the DAG object, specifying EX10DC01 as my FSW server and the C:\EX10DAG01 directory so we can see if it ever gets created (and when).
  • In Figure 3, I show the root of the C:\ drive on EX10DC01 after adding the second Exchange 2010 server to the DAG. Now, the directory and share are created, without requiring the server’s machine account to be added to the Exchange Trusted Subsystem group.

Figure 3: The FSW created on EX10DC01

I suspect that this bad advice came about through a combination of circumstances, including an improper understanding of Exchange caching of Active Directory information and when the FSW is actually created. However it came about, though, it needs to be stopped, because any administrator who configures their Exchange organization this way is opening a big fat hole in the Exchange security model.

So, why is adding the machine account to the Exchange Trusted Subsystem group a security hole? The answer lies in Exchange 2010’s shift to Role Based Access Control (RBAC). In previous versions of Exchange, you delegated permissions directly to Active Directory and Exchange objects, allowing users to perform actions directly from their security context. If they had the appropriate permissions, their actions succeeded.

In Exchange 2010 RBAC, this model goes away; you now delegate permissions by telling RBAC what actions given groups, policies, or users can perform, then assigning group memberships or policies as needed. When the EMS cmdlets run, they do so as the local machine account; since the local machine is an Exchange 2010 server, this account has been added to the Exchange Trusted Subsystem group. This group has been delegated the appropriate access entries in Active Directory and Exchange database objects, as described in the Understanding Split Permissions TechNet topic. For a comprehensive overview of RBAC and how all the pieces fit together, read the Understanding Role Based Access Control TechNet topic.

By improperly adding a non-Exchange server to this group, you’re now giving that server account the ability to read and change any Exchange-related object or property in Active Directory or Exchange databases. Obviously, this is a hole, especially given the relative ease with which one local administrator can get a command line prompt running as one of the local system accounts.

So please, do us all a favor: if you ever hear or see someone passing around this myth, please, link them here.


[1] Yes, it is also granting much broader permissions than necessary to make a DC the FSW node. Because a domain controller’s Builtin/Administrators group applies to the whole domain, the Exchange Trusted Subsystem group now effectively holds domain administrator rights. This is probably not what you want to do, so really, don’t do this outside of a demo lab.

Why Aren’t My Exchange Certificates Validating?

Updated 10/13: Updated the link to the blog article on configuring Squid for Exchange per the request of the author Owen Campbell. Thank you, Owen, for letting me know the location had changed!

By now you should be aware that Microsoft strongly recommends that you publish Exchange 2010/2007 client access servers (and Exchange 2003/2000 front-end servers) to the Internet through a reverse proxy like Microsoft’s Internet Security and Acceleration Server 2006 SP1 (ISA) or the still-in-beta Microsoft Forefront Threat Management Gateway (TMG). There are other reverse proxy products out there, such as the open source Squid (with some helpful guides on how to configure it for EAS, OWA, and Outlook Anywhere), but many of them can only be used to proxy the HTTP-based protocols (for example, the reverse proxy module for the Apache web server) and won’t handle the RPC component of Outlook Anywhere.

When you’re following this recommendation, you keep your Exchange CAS/HT/front-end servers in your private network and place the ISA Server (or other reverse proxy solution) in your perimeter (DMZ) network. In addition to ensuring that your reverse proxy is scrubbing incoming traffic for you, you can also gain another benefit: SSL bridging. SSL bridging is where there are two SSL connections – one between the client machine and the reverse proxy, and a separate connection (often using a different SSL certificate) between the reverse proxy and the Exchange CAS/front-end server. SSL bridging is awesome because it allows you to radically reduce the number of commercial SSL certificates you need to buy. You can use Windows Certificate Services to generate and issue certificates to all of your internal Exchange servers, creating them with all of the Subject Alternative Names that you need and desire, and still have a commercial certificate deployed on your Internet-facing system (nice to avoid certificate issues when you’re dealing with home systems, public kiosks, and mobile devices, no?) that has just the public common namespaces like autodiscover.yourdomain.tld and mail.yourdomain.tld (or whatever you actually use).

In the rest of this article, I’ll be focusing on ISA because, well, I don’t know Squid that well and haven’t actually seen it in use to publish Exchange in a customer environment. Write what you know, right?

One of the most irritating experiences I’ve consistently had when using ISA to publish Exchange securely is getting the certificate configuration on ISA correct. If you all want, I can cover certificate namespaces in another post, because that’s not what I’m talking about – I actually find that relatively easy to deal with these days. No, what I find annoying about ISA and certificates is getting all of the proper root CA certificates and intermediate CA certificates in place. The process you have to go through varies depending on who you buy your certificates from. There are a couple, like GoDaddy, that offer inexpensive certificates that do exactly what Exchange needs – but they require an extra bit of configuration to get everything working.

The problem you’ll see is two-fold:

  1. External clients will not be able to connect to Exchange services. This will be inconsistent; some browsers and some Outlook installations (especially those on new Windows installs or well-updated Windows installs) will work fine, while others won’t. You may have big headaches getting mobile devices to work, and the error messages will be cryptic and unhelpful.
  2. While validating your Exchange publishing rules with the Exchange Remote Connectivity Analyzer (ExRCA), you get a validation error on your certificate as shown in Figure 1.

ExRCA can't find the intermediate certificate on your ISA server
Figure 1: Missing intermediate CA certificate validation error in ExRCA

The problem is that some devices don’t have the proper certificate chain in place. Commercial certificates typically have two or three certificates in their signing chain: the root CA certificate, an intermediate CA certificate, and (optionally) an additional intermediate CA certificate. The secondary intermediate CA certificate is typically the source of the problem; it’s configured as a cross-signing certificate, which is intended to help CAs transition old certificates from one CA to another without invalidating the issued certificates. If your certificate was issued by a CA that has these in place, you have to have both intermediate CA certificates in place on your ISA server in the correct certificate stores.

By default, CAs will issue the entire certificate chain to you in a single bundle when they issue your cert. You have to import this bundle on the machine you issued the request from or else you don’t get the private key associated with the certificate. Once you’ve done that, you need to re-export the certificate, with the private key and its entire certificate chain, so that you can import it in ISA. This is important because ISA needs the private key so it can decrypt the SSL session (required for bridging), and ISA needs the entire certificate signing chain so that it can hand out missing intermediate certificates to devices that don’t have them (such as Windows Mobile devices that have only the root CA certificates). If the device doesn’t have the right intermediates, can’t download them itself (like Internet Explorer can), and can’t get them from ISA, you’ll get the certificate validation errors.
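If it helps to see why one missing intermediate breaks everything, here is a toy Python model of chain building. It is only a sketch of the name-chaining step: real validation also checks signatures, validity dates, and revocation, and all the certificate names below are made up for the example.

```python
def build_chain(leaf, presented, trusted_roots):
    """Walk issuer links from the server certificate up to a trusted root.

    leaf          -- (subject, issuer) of the server certificate
    presented     -- dict of intermediate subject -> issuer handed out by the server
    trusted_roots -- set of root CA subjects the client already trusts
    Returns the chain as a list of subjects, or None if a link is missing.
    """
    subject, issuer = leaf
    chain = [subject]
    while issuer not in trusted_roots:
        if issuer not in presented or issuer in chain:
            return None  # missing intermediate (or a loop): validation fails
        chain.append(issuer)
        issuer = presented[issuer]
    chain.append(issuer)
    return chain


leaf = ("mail.yourdomain.tld", "Intermediate CA 2")
roots = {"Root CA"}

# The server hands out both intermediates: the client can reach the root.
full_chain = build_chain(
    leaf,
    {"Intermediate CA 2": "Intermediate CA 1", "Intermediate CA 1": "Root CA"},
    roots,
)

# The server is missing the second intermediate: the walk stops immediately.
broken_chain = build_chain(leaf, {"Intermediate CA 1": "Root CA"}, roots)

print(full_chain)
print(broken_chain)
```

A device that only carries root certificates is in exactly the second situation unless the server supplies every intermediate.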

Here’s what you need to do to fix it:

  • Ensure that your server certificate has been exported with the private key and *all* necessary intermediate and root CA certificates.
  • Import this certificate bundle into your ISA servers. Before you do this, check the computer account’s personal certificate store and make sure any root or intermediate certificates that got accidentally imported there are deleted.
  • Using the Certificate MMC snap-in, validate that the certificate now shows as valid when browsing the certificate on your ISA server, as shown in Figure 2.

Even though the Certificates MMC snap-in shows this certificate as valid, ISA won't serve it out until the ISA Firewall Service is restarted!
Figure 2: A validated server certificate signing chain on ISA Server

  • IMPORTANT STEP: restart the ISA Firewall Service on your ISA server (if you’re using an array, you have to do this on each member; you’ll want to drain the connections before restarting, so it can take a while to complete). Even though the Certificate MMC snap-in validates the certificate, the ISA Firewall only picks up the changes to the certificate chain on startup. This is annoying and stupid and has caused me pain in the past – most recently, with 3Sharp’s own Exchange 2010 deployment (thanks to co-worker and all around swell guy Tim Robichaux for telling me how to get ISA to behave).

Also note that many of the commercial CAs specifically provide downloadable packages of their root CA and intermediate CA certificates. Some of them get really confusing – they have different CAs for different tiers or product lines, so you have to match the server certificate you have with the right CA certificates. GoDaddy’s CA certificate page can be found here.

Some Thoughts on FBA (part 2)

As promised, here’s part 2 of my FBA discussion, in which we’ll talk about the interaction of ISA’s forms-based authentication (FBA) feature with Exchange 2010. (See part 1 here.)

Offloading FBA to ISA

As I discussed in part 1, ISA Server includes the option of performing FBA pre-authentication as part of the web listener. You aren’t stuck with FBA – you can use other pre-auth methods too. The thinking behind this is that ISA is the security server sitting in the DMZ, while the Exchange CAS is in the protected network. Why proxy an incoming connection from the Internet into the protected network (even with ISA’s impressive HTTP reverse proxy and screening functionality) if it doesn’t present valid credentials? In this configuration, ISA is configured for FBA while the Exchange 2010/2007 CAS or Exchange 2003 front-end server is configured for Windows Integrated or Basic as shown in Figure 1 (a figure so nice I’ll re-use it):

Figure 1: Publishing Exchange using FBA on ISA

Moving FBA off of ISA

Having ISA (and Threat Management Gateway, the 64-bit successor to ISA 2006) perform pre-auth in this fashion is nice and works cleanly. However, in our Exchange 2010 deployment, we found a couple of problems with it:

  • The early beta releases of Entourage for EWS wouldn’t work with this configuration; Entourage could never connect. If our users connected to the 3Sharp VPN, bypassing the ISA publishing rules, Entourage would immediately see the Exchange 2010 servers and do its thing. I don’t know if the problem was solved for the final release.
  • We couldn’t get federated calendar sharing, a new Exchange 2010 feature, to work. Other Exchange 2010 organizations would get errors when trying to connect to our organization. This new calendar sharing feature uses a Windows Live-based central brokering service to avoid the need to provision and manage credentials.

Through some detailed troubleshooting with Microsoft and other Exchange 2010 organizations, we finally figured out that our ISA FBA configuration was causing the problem. The solution was to disable ISA pre-authentication and re-enable FBA on the appropriate virtual directories (OWA and ECP) on our CAS server. Once we did that, not only did federated calendar sharing start working flawlessly, but our Entourage users found their problems had gone away too. For more details of what we did, read on.

How Calendar Sharing Works in Exchange 2010

If you haven’t seen other descriptions of the federated calendar sharing, here’s a quick primer on how it works. This will help you understand why, if you’re using ISA pre-auth for your Exchange servers, you’ll want to rethink it.

In Exchange 2007, you could share calendar data with other Exchange 2007 organizations. Doing so meant that your CAS servers had to talk to their calendar servers, and the controls around it were not that granular. In order to do it, you either needed to establish a forest trust and grant permissions to the other forest’s CAS servers (to get detailed per-user free/busy information) or set up a separate user in your forest for the foreign forests to use (to get default per-org free/busy data). You also have to fiddle around with the Autodiscover service connection points and ensure that you’ve got pointers for the foreign Autodiscover SCPs in your own AD (and the foreign systems have yours). You also have to publish Autodiscover and EWS externally (which you have to do for Outlook Anywhere) and coordinate all your certificate CAs. While this doesn’t sound that bad, you have to do these steps for every single foreign organization you’re sharing with. That adds up, and it’s a poorly documented process – you’ll start at this TechNet topic about the Availability service and have to do a lot of chasing around to figure out how certificates fit in, how to troubleshoot it, and the SCP export and import process.

In Exchange 2010, this gets a lot easier; individual users can send sharing invitations to users in other Exchange 2010 organizations, and you can set up organization relationships with other Exchange 2010 organizations. Microsoft has broken up the process into three pieces:

  1. Establish your organization’s trust relationship with Windows Live. This is a one-time process that must take place before any sharing can take place – and you don’t have to create or manage any service or role accounts. You just have to make sure that you’re using a CA to publish Autodiscover/EWS that Windows Live will trust. (Sorry, there’s no list out there yet, but keep watching the docs on TechNet.) From your Exchange 2010 organization (typically through EMC, although you can do it from EMS) you’ll swap public keys (which are built into your certificates) with Windows Live and identify one or more accepted domains that you will allow to be federated. Needless to say, Autodiscover and EWS must be properly published to the Internet. You also have to add a single DNS record to your public DNS zone, showing that you do have authority over the domain namespace. If you have multiple domains and only specify some of them, beware: users that don’t have provisioned addresses in those specified domains won’t be able to share or receive federated calendar info!
  2. Establish one or more sharing policies. These policies control how much information your users will be able to share with external users through sharing invitations. The setting you pick here defines the maximum level of information that your users can share from their calendars: none, free/busy only, some details, or all details. You can create a single policy for all your users or use multiple policies to provision your users on a more granular basis. You can assign these policies on a per-user basis.
  3. Establish one or more sharing relationships with other organizations. When you want to view availability data of users in other Exchange 2010 organizations, you create an organization relationship with them. Again, you can do this via EMC or EMS. This tells your CAS servers to look up information from the defined namespaces on behalf of your users – provided, of course, that the foreign organization has established the appropriate permissions in their organization relationships. If the foreign namespace isn’t federated with Windows Live, then you won’t be allowed to establish the relationship.

You can read more about these steps in the TechNet documentation and at this TechNet topic (although since the TechNet documentation is still in beta, it’s not all in place yet). You should also know that these policies and settings combine with the ACLs on users’ calendar folders, and as is the typical case in Exchange when there are multiple levels of permission, the most restrictive level wins.
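The “most restrictive level wins” rule is easy to picture as taking the minimum over an ordered list. The sketch below uses the four sharing levels named above; treating folder ACLs on the same four-step scale is my simplifying assumption, and none of these names are real Exchange settings.

```python
# Sharing levels from least to most permissive, as described in the policy step.
LEVELS = ["none", "free/busy only", "some details", "all details"]


def effective_level(*levels: str) -> str:
    """Return the most restrictive of the supplied sharing levels."""
    if not levels:
        raise ValueError("supply at least one level")
    # list.index raises ValueError for an unknown level name.
    return min(levels, key=LEVELS.index)


# Policy allows everything, but the folder ACL only grants free/busy.
acl_limited = effective_level("all details", "free/busy only")

# Three layers, one of which shares nothing at all.
blocked = effective_level("some details", "all details", "none")

print(acl_limited, blocked)
```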

What’s magic about all of this is that, at no point along the way other than the initial first step, do you have to worry consciously about the certificates you’re using. You never have to provide or provision credentials. As you create your policies and sharing relationships with other organizations – and other organizations create them with yours – Windows Live is hovering silently in the background, acting as a trusted broker for the initial connections. When your Exchange 2010 organization interacts with another, your CAS servers receive a SAML token from Windows Live. This token is then passed to the foreign Exchange 2010 organization, which can validate it because of its own trust relationship with Windows Live. All this token does is validate that your servers are really coming from the claimed namespace – Windows Live plays no part in authorization, retrieving the data, or managing the sharing policies.

However, here’s the problem: when my CAS talks to your CAS, they’re using SAML tokens – not user accounts – to authenticate against IIS for EWS calls. ISA Server (and, IIRC, TMG) don’t know how to validate these tokens, so the incoming requests can’t authenticate and pass on to the CAS. The end result is that you can’t get a proper sharing relationship set up and you can’t federate calendar data.

What We Did To Fix It

Once we knew what the problem was, fixing it was easy:

  1. Modify the OWA and ECP virtual directories on all of our Exchange 2010 CAS servers to perform FBA. These are the only virtual directories that permit FBA, so they’re the only two you need to change:
       Set-OWAVirtualDirectory -Identity "CAS-SERVER\owa (Default Web Site)" -BasicAuthentication $TRUE -WindowsAuthentication $FALSE -FormsAuthentication $TRUE
       Set-ECPVirtualDirectory -Identity "CAS-SERVER\ecp (Default Web Site)" -BasicAuthentication $TRUE -WindowsAuthentication $FALSE -FormsAuthentication $TRUE
  2. Modify the Web listener on our ISA server to disable pre-authentication. In our case, we were using a single Web listener for Exchange (and only for Exchange), so it was a simple matter of changing the authentication setting to a value of No Authentication.
  3. Modify each of the ISA publishing rules (ActiveSync, Outlook Anywhere, and OWA):

    • On the Authentication tab, select the value No delegation, but client may authenticate directly.
    • On the Users tab, remove the value All Authenticated Users and replace it with the value All Users. This is important! If you don’t do this, ISA won’t pass any connections on!

You may also need to take a look at the rest of your Exchange virtual directories and ensure that the authentication settings are valid; many places will allow Basic authentication between ISA and their CAS servers and require NTLM or Windows Integrated from external clients to ISA.

Calendar sharing and ISA FBA pre-authentication are both wonderful features, and I’m a bit sad that they don’t play well together. I hope that future updates to TMG will resolve this issue and allow TMG to successfully pre-authenticate incoming federated calendar requests.

Stolen Thunder: Outlook for the Mac

I was going to write up a quick post about the release of Entourage for EWS (allowing it to work in native Exchange 2007, and, more importantly, Exchange 2010 environments) and the announcement that Office 2010 for the Mac would have Outlook, not Entourage, but Paul beat me to it, including my whole take on the thing. So go read his.

For those keeping track at home, yes, I still owe you a second post on the Exchange 2010 calendar sharing. I’m working on it! Soon!

EAS: King of Sync?

Seven months or so ago, IBM surprised a bunch of people by announcing that they were licensing Microsoft’s Exchange ActiveSync protocol (EAS) for use with a future version of Lotus Notes. I’m sure there were a few folks who saw it coming, but I cheerfully admit that I was not one of them. After about 30 seconds of thought, though, I realized that it made all kinds of sense. EAS is a well-designed protocol, I am told by my developer friends, and I can certainly attest to the relatively lightweight load it puts on Exchange servers as compared to some of the popular alternatives – enough so that BlackBerry add-ons that speak EAS have become a not-unheard-of alternative for many organizations.

So, imagine my surprise when my Linux geek friend Nick told me smugly that he now had a new Palm Pre and was synching it to his Linux-based email system using the Pre’s EAS support. “Oh?” said I, trying to stay casual as I was mentally envisioning the screwed-up mail forwarding schemes he’d put in place to route his email to an Exchange server somewhere. “Did you finally break down and migrate your email to an Exchange system? If not, how’d you do that?”

Nick then proceeded to point me in the direction of Z-Push, which is an elegant little open source PHP-based implementation of EAS. A few minutes of poking around and I became convinced that this was a wicked cool project. I really like how Z-Push is designed:

  • The core PHP module answers incoming requests for the http://server/Microsoft-Server-ActiveSync virtual directory and handles all the protocol-level interactions. I haven’t dug into this deeply, but although it appears it was developed against Apache, folks have managed to get it working on a variety of web servers, including IIS! I’m not clear on whether authentication is handled by the package itself or by the web server. Now that I think about it, I suspect it just proxies your provided credentials on to the appropriate back-end system so that you don’t have to worry about integrating Z-Push with your authentication sources.
  • One or more back-end modules (also written in PHP), which read and write data from various data sources such as your IMAP server, a Maildir file system, or some other source of mail, calendar, or contact information. These back-end modules are run through a differential engine to help cut down on the amount of synching the back-end modules must perform. It looks like the API for these modules is very well thought-out; they obviously want developers to be able to easily write backends to tie in to a wide variety of data sources. You can mix and match multiple backends; for example, get your contact data from one system, your calendar from another, and your email from yet a third system.
  • If you’re running the Zarafa mail server, there’s a separate component that handles all types of data directly from Zarafa, easing your configuration. (Hey – Zarafa and Z-Push…I wonder if Zarafa provides developer resources; if so, way to go, guys!)
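To make the "differential engine" idea a bit more concrete, here is a minimal Python sketch of what such a step might do: compare the folder state from the last sync with the current state and report only what changed. This is not Z-Push's code (Z-Push is written in PHP), and the function name and the snapshot shape (item ID mapped to a modification stamp) are my own assumptions for illustration.

```python
def diff_snapshots(previous, current):
    """Compare two {item_id: modification_stamp} snapshots of one folder.

    Hypothetical helper: returns only the items a device would need to hear about.
    """
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"added": added, "changed": changed, "deleted": deleted}


# State remembered from the last sync versus what the back-end reports now.
last_sync = {"msg-1": 100, "msg-2": 100, "msg-3": 105}
now = {"msg-1": 100, "msg-3": 110, "msg-4": 120}

changes = diff_snapshots(last_sync, now)
print(changes)
# {'added': ['msg-4'], 'changed': ['msg-3'], 'deleted': ['msg-2']}
```

The unchanged item (msg-1) never shows up in the result, which is the whole point: the less the back-end has to re-send, the less work for everyone.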

You do need to be careful about the back-end modules; because they’re PHP code running on your web server, poor design or bugs can slam your web server. For example, there’s currently a bug in how the IMAP back-end re-scans messages, and the resulting load can create a noticeable impact on an otherwise healthy Apache server with just a handful of users. It’s a good thing that there seems to be a lively and knowledgeable community on the Z-Push forums; they haven’t wasted any time in diagnosing the bug and providing suggested fixes.

Very deeply cool – folks are using Z-Push to provide, for example, an EAS connection point on their Windows Home Server, synching to their Gmail account. I wonder how long it will take for Linux-based “Exchange killers” (other than Zarafa) to wrap this product into their overall packages.

It’s products like this that help reinforce the awareness that EAS – and indirectly, Exchange – are a dominant enough force in the email market to make this kind of project not only potentially useful, but viable as an open source project.

Comparing PowerShell Switch Parameters with Boolean Parameters

If you’ve ever taken a look at the help output (or TechNet documentation) for PowerShell cmdlets, you’ve seen that they list several pieces of information about each of the various parameters the cmdlet can use:

  • The parameter name
  • Whether it is a required or optional parameter
  • The .NET variable type the parameter expects
  • A description of the behavior the parameter controls

Let’s focus on two particular types of parameters, the Switch (System.Management.Automation.SwitchParameter) and the Boolean (System.Boolean). While I never really thought about it much before reading a discussion on an email list earlier, these two parameter types seem to be two ways of doing the same thing. Let me give you a practical example from the Exchange 2007 Management Shell: the New-ExchangeCertificate cmdlet. Table 1 lists an excerpt of its parameter list from the current TechNet article:

Table 1: Selected parameters of the New-ExchangeCertificate cmdlet

Parameter: GenerateRequest (SwitchParameter)

Description: Use this parameter to specify the type of certificate object to create.

By default, this parameter will create a self-signed certificate in the local computer certificate store.

To create a certificate request for a PKI certificate (PKCS #10) in the local request store, set this parameter to $True.

Parameter: PrivateKeyExportable (Boolean)

Description: Use this parameter to specify whether the resulting certificate will have an exportable private key.

By default, all certificate requests and certificates created by this cmdlet will not allow the private key to be exported.

You must understand that if you cannot export the private key, the certificate itself cannot be exported and imported.

Set this parameter to $true to allow private key exporting from the resulting certificate.

On quick examination, both parameters control either/or behavior. So why the two different types? The mailing list discussion I referenced earlier pointed out the difference:

Boolean parameters control properties on the objects manipulated by the cmdlets. Switch parameters control behavior of the cmdlets themselves.

So in our example, a digital certificate has a property as part of the certificate that marks whether the associated private key can be exported in the future. That property goes along with the certificate, independent of the management interface or tool used. For that property, then, PowerShell uses the Boolean type for the -PrivateKeyExportable parameter.

On the other hand, the -GenerateRequest parameter controls the behavior of the cmdlet. With this parameter specified, the cmdlet creates a certificate request with all of the specified properties. If this parameter isn’t present, the cmdlet creates a self-signed certificate with all of the specified properties. The resulting object (CSR or certificate) has no corresponding sign of what option was chosen – you could just as easily submit that CSR to another tool on the same machine to create a self-signed certificate.
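If you think better in code, here is a rough analogy in Python using argparse. It is only an analogy, not how PowerShell or the Exchange cmdlet is implemented: the option names, the two classes, and the str_to_bool helper are all made up for illustration. The presence-only flag picks which code path runs, while the Boolean option carries a value that ends up stored on the object.

```python
import argparse
from dataclasses import dataclass


@dataclass
class SelfSignedCertificate:
    private_key_exportable: bool  # travels with the object


@dataclass
class CertificateRequest:
    private_key_exportable: bool  # travels with the object


def str_to_bool(text):
    """Hypothetical helper: accept true/false, with or without a leading $."""
    value = text.strip().lower().lstrip("$")
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


def new_certificate(argv):
    parser = argparse.ArgumentParser(prog="new-certificate")
    # "Switch": no value, its presence alone changes what the command does.
    parser.add_argument("--generate-request", action="store_true")
    # "Boolean": an explicit value that becomes a property of the result.
    parser.add_argument("--private-key-exportable", type=str_to_bool, default=False)
    args = parser.parse_args(argv)

    if args.generate_request:
        return CertificateRequest(args.private_key_exportable)
    return SelfSignedCertificate(args.private_key_exportable)


print(new_certificate([]))
print(new_certificate(["--generate-request", "--private-key-exportable", "$true"]))
```

Notice that neither result object has a field recording whether the switch was used; the switch only decided which kind of object came back.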

I hope this helps draw the distinction. Granted, it’s one I hadn’t thought much about before today, but now that I have, it’s nice to know that there’s yet another sign of intelligence and forethought in the PowerShell architecture.

Some Thoughts on FBA (part 1)

It’s funny how topics tend to come in clumps. Take the current example: forms-based authentication (FBA) in Exchange.

An FBA Overview

FBA was introduced in Exchange Server 2003 as a new authentication method for Outlook Web Access. It requires OWA to be published using SSL – which was not yet common practice at that point in time – and in turn allowed credentials to be sent a single time using plain-text form fields. It’s taken a while for people to get used to, but FBA has definitely become an accepted practice for Exchange deployments, and it’s a popular way to publish OWA for Exchange 2003, Exchange 2007, and the forthcoming Exchange 2010.

In fact, FBA is so successful that the ISA Server group got into the mix by including FBA pre-authentication for ISA Server. With this model, instead of configuring Exchange for FBA you instead configure your ISA server to present the FBA screen. Once the user logs in, ISA takes the credentials and submits them to the Exchange 2003 front-end server or Exchange 2007 (or 2010) Client Access Server using the appropriately configured authentication method (Windows Integrated or Basic). In Exchange 2007 and 2010, this allows each separate virtual directory (OWA, Exchange ActiveSync, RPC proxy, Exchange Web Services, Autodiscover, Unified Messaging, and the new Exchange 2010 Exchange Control Panel) to have its own authentication settings, while ISA Server transparently mediates them for remote users. Plus, ISA pre-authenticates those connections – only connections with valid credentials ever get passed on to your squishy Exchange servers – as shown in Figure 1:

Figure 1: Publishing Exchange using FBA on ISA

Now that you know more about how FBA, Exchange, and ISA can interact, let me show you one mondo cool thing today. In a later post, we’ll have an architectural discussion for your future Exchange 2010 deployments.

The Cool Thing: Kay Sellenrode’s FBA Editor

On Exchange servers, it is possible to modify both the OWA themes and the FBA page (although you should check about the supportability of doing so). Likewise, it is also possible to modify the FBA page on ISA Server 2006. This is a nice feature as it helps companies integrate the OWA experience into the overall look and feel of the rest of their Web presence. Making these changes on Exchange servers is a somewhat well-documented process. Doing them on ISA is a bit more arcane.

Fellow Exchange 2007 MCM Kay Sellenrode has produced a free tool to simplify the process of modifying the ISA 2006 FBA – named, aptly enough, the FBA Editor. You can find the tool, as well as a YouTube video demo of how to use it, from his blog. While I’ve not had the opportunity to modify the ISA FBA form myself, I’ve heard plenty of horror stories about doing so – and Kay’s tool is a very cool, useful community contribution.

In the next day or two (edit: or more), we’ll move on to part 2 of our FBA discussion – deciding when and where you might want to use ISA’s FBA instead of Exchange’s.

You, too, can Master Exchange

One of the biggest criticisms I’ve seen of the MCM program, even when it first was announced, was the cost – at a list price of $18,500 for the actual MCM program, discounting the travel, lodging, food, and opportunity cost of lost revenue, a lot of people are firmly convinced that the program is way too expensive for anybody but the bigger shops.

This discussion has of course gone back and forth within the Exchange community. I think part of the pushback comes from the fact that MCM is the next evolution of the Exchange Ranger program, which felt very elitist and exclusive (and by many accounts was originally designed to be, back when it was a Microsoft-only program designed to provide a higher degree of training for Microsoft consultants and engineers to better resolve their own customer issues). Starting off with that kind of background leaves a lot of lingering impressions, and the Exchange community has a long memory. Paul has a great discussion of his point of view as a new MCM instructor and shares his take on the “is it worth it?” question.

Another reason for pushback is the economy. The typical argument is, “I can’t afford to take this time right now.” Let’s take a ballpark figure here, aimed at the coming May 4 rotation, just to have some idea of the kinds of numbers folks are thinking about:

  • Imagine a consultant working a 40-hour week. Her bosses would like her to meet 90% (36 hours) billable. Given two weeks of vacation a year, that’s 50 weeks at 36 hours a week.
  • We’ll also imagine that she’s able to bill out at $100/hour. This brings her minimum annual revenue to $180,000. They set her opportunity cost (lost revenue) at $3,600/week.
  • We’ll assume she has the pre-requisites nailed (MCITP Enterprise Messaging, the additional AD exam for either Windows 2003 or Windows 2008, and the field experience). No extra cost there (otherwise it’s $150/test, or $600 total).
  • Let’s say her plane tickets are $700 for round-trip to Redmond and back.
  • And we’ll say that she needs to stay at a hotel, checking in Sunday May 3rd, checking out Sunday May 24th, at a daily rate of $200.
  • Let’s also assume she’ll need $75 a day for meals.

That works out to $18,500 (class fee) + $700 (plane) + 21 x $275 (hotel + meals) + 3 x $3,600 (opportunity cost of work she won’t be doing): $18,500 + $700 + $5,775 + $10,800 = a whopping total of $35,775. That, many people argue, is far too much for what they get out of the course – it represents just under 10 weeks of her regular revenue, or approximately 1/5th of her year’s revenue.
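If you’d rather check the arithmetic than take my word for it, here is the same worst-case calculation as a few lines of Python. The variable names are mine; the figures are the ballpark assumptions from the list above.

```python
# Ballpark inputs from the list above (all in US dollars).
CLASS_FEE = 18_500
AIRFARE = 700
HOTEL_PER_NIGHT = 200
MEALS_PER_DAY = 75
NIGHTS = 21                   # check in May 3rd, check out May 24th
WEEKS_AWAY = 3
BILLABLE_HOURS_PER_WEEK = 36  # 90% of a 40-hour week
RATE_PER_HOUR = 100
WORK_WEEKS_PER_YEAR = 50

weekly_revenue = BILLABLE_HOURS_PER_WEEK * RATE_PER_HOUR    # 3,600
annual_revenue = weekly_revenue * WORK_WEEKS_PER_YEAR       # 180,000
living_costs = NIGHTS * (HOTEL_PER_NIGHT + MEALS_PER_DAY)   # 5,775
opportunity_cost = WEEKS_AWAY * weekly_revenue              # 10,800

total_cost = CLASS_FEE + AIRFARE + living_costs + opportunity_cost
print(f"Worst-case cost of the rotation: ${total_cost:,}")
print(f"Share of annual revenue: {total_cost / annual_revenue:.1%}")
```

Swap in your own rate, hours, and travel costs to see where your worst case lands.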

If those numbers were the final answer, they’d be right.

However, Paul has some great talking points in his post; although he focuses on the non-economic piece, I’d like to tie some of those back in to hard numbers.

  • The level of training. I don’t care how well you know Exchange. You will walk out of this class knowing a lot more and you will be immediately able to take advantage of that knowledge to the betterment of your customers. Plus, you will have ongoing access to some of the best Exchange people in the world. I don’t know a single consultant out there who can work on a problem that is stumping them for hours or days and be able to consistently bill every single hour they spend showing no results. Most of us end up eating time, which shows up in the bottom line. For the sake of argument, let’s say that our consultant ends up spending 30% instead of 10% of her time working on issues that she can’t directly bill for because of things like this. That drops her opportunity cost from $3,600/week to $2,520, or $7,560 for the three weeks (and it means she’s only got an annual revenue of $126,000). If she can reduce that non-billable time, she can increase her efficiency and get more real billable work done in the same calendar period. We’ll say she can gain back 10% of that lost time and get up to only 20% lost time, or 32 hours a week.
  • The demonstration of competence. This is a huge competitive advantage for two reasons. First, it helps you land work you may not have been able to land before. This is great for keeping your pipeline full – always a major challenge in a rough economy. Second, it allows you to raise your billing rates. Okay, true, maybe you can’t raise your billing rates for all the work that you do for all of your customers, but even some work at a higher rate directly translates to your pocket book. Let’s say she can bill 25% of those 32 hours at $150/hour. That turns her week’s take into (8 x $150) + (24 x $100) = $1,200 + $2,400 = $3,600. That modest gain in billing rates right there compensates for the extra 10% loss of billing hours and pays for itself every 3-4 weeks.

Let’s take another look at those overall numbers again. This time, let’s change our ballpark with numbers more closely matching the reality of the students at the classes:

  • There’s a 30% discount on the class, so she pays only $12,950 (not $18,500).
  • We’ll keep the $700 for plane tickets.
  • From above, we know that her real lost opportunity cost is more like $7,560 (3 x $2,520 and not the $10,800 worst case).
  • She can get shared apartment housing with other students right close to campus for more like $67 a night (three bedrooms).
  • Food expenses are more typically averaged out to $40 per day. You can, of course, break the bank on this during the weekends, but during the days you don’t really have time.

This puts the cost of her rotation at $12,950 + $700 + (21 x $107) + $7,560, or $23,457. That’s only 66% – two-thirds – of the worst-case cost we came up with above. With her adjusted annual revenue of $126,000, this is only 19%, or just less than 1/5th of her annual revenue.
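Since the whole point is for you to play with the numbers, here is a small Python sketch that wraps the calculation in a function and compares the two scenarios. The function name and parameters are my own invention; the inputs are the figures used in this post, including my $2,520 weekly revenue figure for the more realistic case.

```python
def rotation_cost(class_fee, airfare, nightly_cost, nights, weekly_revenue, weeks_away):
    """Total cost of attending: fees, travel, living costs, and lost revenue."""
    return class_fee + airfare + nightly_cost * nights + weekly_revenue * weeks_away


LIST_PRICE = 18_500
discounted_fee = LIST_PRICE * (100 - 30) // 100  # 30% discount -> 12,950

worst_case = rotation_cost(
    class_fee=LIST_PRICE, airfare=700,
    nightly_cost=200 + 75,   # hotel + meals
    nights=21, weekly_revenue=3_600, weeks_away=3,
)
adjusted = rotation_cost(
    class_fee=discounted_fee, airfare=700,
    nightly_cost=67 + 40,    # shared apartment + food
    nights=21, weekly_revenue=2_520, weeks_away=3,
)

print(f"Worst case: ${worst_case:,}")                        # Worst case: $35,775
print(f"Adjusted:   ${adjusted:,}")                          # Adjusted:   $23,457
print(f"Adjusted vs. worst case: {adjusted / worst_case:.0%}")     # 66%
print(f"Adjusted vs. $126,000 revenue: {adjusted / 126_000:.0%}")  # 19%
```

Change any one input (your billing rate, your housing, your airfare) and you can see immediately how much it moves the total.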

And it doesn’t stop there. Armed with the data points I gave above, let’s see how this works out for the future and when the benefits from the rotation pay back.

Over the year, our hypothetical consultant, working only a 40-hour work week (I know, you can stop laughing at me now) brings in 50 x $2,520 = $126,000. The MCM rotation represents 19% of her revenue for the year before costs.

However, let’s figure out earning potential in that same year, using the discounted $12,950 class fee from above: (47 x $3,600) – ($12,950 + $700 + $2,247) = $153,303. That’s an increase of more than 20%.

Will these numbers make sense for everyone? No, and I’m not trying to argue that they do. What I am trying to point out, though, is that the business justification for going to the rotation may actually make sense once you sit down and work out the numbers. Think about your current projects and how changes to hours and billing rates may improve your bottom line. Think about work you haven’t gotten or been unwilling to pursue because you or the customer felt it was out of your league. Take some time to play with the numbers and see if this makes sense for you.

If it does, or if you have any further questions, let me know.

ExMon released (no joke!)

If you’re tempted to think this is an April Fool’s Day joke, no worries – this is the real deal. Yesterday, Microsoft published the Exchange 2007-aware version of Exchange Server User Monitor (ExMon) for download.

“ExMon?” you ask. “What’s that?” I’m happy to explain!

ExMon is a tool that gives you a real-time look inside your Exchange servers to help find out what kind of impact your MAPI clients are having on the system. That’s right – it’s a way to monitor MAPI connections. (Sorry; it doesn’t monitor WebDAV, POP3, IMAP, SMTP, OWA, EAS, or EWS.) With this release, you can now monitor the following versions of Exchange:

  • Exchange Server 2007 SP1+
  • Exchange Server 2003 SP1+
  • Exchange 2000 Server SP2+

You can find out more about it from TechNet.

Even though the release date isn’t a celebration of April 1st, there is currently a bit of an unintentional joke, as shown by the current screenshot:


Note that while the Date Published is March 31, the Version is only 06.05.7543 – which is the Exchange 2003 version published in 2005, as shown below:


So, for now, hold off trying to download and use it. I’ll update this post when the error is fixed.

Two CCR White Papers from Missy

This actually happened last week, but I’ve been remiss in getting it posted (sorry, Missy!) Missy recently completed two Exchange 2007 whitepapers, both centered around the CCR story.

The first one, High Availability Choices for Exchange Server 2007: Continuous Cluster Replication or Single Copy Clustering, provides a thorough overview of the questions and issues to be considered by companies who are looking for Exchange 2007 availability:

  • Large mailbox support. In my experience, this is a major driver for Exchange 2007 migrations and for looking at CCR. Exchange 2007’s I/O performance increases have shifted the balance from the Exchange store always being I/O bound to now sometimes being capacity bound, depending on the configuration, and providing that capacity can be extremely expensive in SCC configurations (that typically rely on SANs). CCR offers some other benefits that Missy outlines.
  • Points of failure. With SCC, you still only have a single copy of the data – making that data (and that SAN frame) a SPOF. There are mitigation steps you can take, but those are all expensive. When it comes to losing your Exchange databases, storage issues are the #1 cause.
  • Database replication. Missy takes a good look at what replication means, how it affects your environment, and why CCR offers a best-of-breed solution for Exchange database replication. She also tackles the religious issue of why SAN-based availability solutions aren’t necessarily the best solution – and why people need to re-examine the question of whether Exchange-based availability features are the right way to go.
  • RTO and RPO. These scary TLAs are popping up all over the place lately, but you really need to understand them in order to have a good handle on what your organization’s exact needs are – and which solution is going to be the best fit for you.
  • Hardware and storage considerations. Years of cluster-based availability solutions have given many Exchange administrators and consultants a blind spot when it comes to how Exchange should be provisioned and designed. These solutions have limited some of the flexibility that you may need to consider in the current economic environment.
  • Cost. Talk about money and you always get people’s attention. Missy details several areas of hidden cost in Exchange availability and shows how CCR helps address many of these issues.
  • Management. It’s not enough to design and deploy your highly available Exchange solution – if you don’t manage and monitor it, and have good operational policies and procedures, your investment will be wasted. Missy talks about several realms of management.

I really recommend this paper for anyone who is interested in Exchange availability. It’s a cogent walkthrough of the major discussion points centering around the availability debate.

Missy’s second paper, Continuous Cluster Replication and Direct Attached Storage: High Availability without Breaking the Bank, directly addresses one of the key assumptions underneath CCR – that DAS can be a sufficient solution. Years of Exchange experience have slowly moved organizations away from DAS to SAN, especially when high availability is a requirement – and many people now write off DAS solutions out of habit, without realizing that Exchange 2007 has in fact enabled a major switch in the art of Exchange storage design.

In order to address this topic, Missy takes a great look at the history of Exchange storage and the technological factors that led to the initial storage design decisions and the slow move to SAN solutions. These legacy decisions continue to box today’s Exchange organizations into a corner with unfortunate consequences – unless something breaks demand for SAN storage.

Missy then moves into how Exchange 2007 and CCR make it possible to use DAS, outlining the multiple benefits of doing so (not just cost – but there’s a good discussion of the money factor, too).

Both papers are outstanding; I highly recommend them.

Haz Firewall, Want Cheezburger

Although Windows Server 2008 offers an impressive built-in firewall, in some cases we Exchange administrators don’t want to have to deal with it. Maybe you are building a demo to show a customer, or a lab environment to reproduce an issue. Maybe you just want to get Exchange installed now and will loop back to deal with fine-tuning firewall issues later. Maybe you have some other firewall product you’d rather use. Maybe, even, you don’t believe in defense in depth – or don’t think a server-level firewall is useful.

Whatever the reason, you’ve decided to disable the Windows 2008 firewall for an Exchange 2007 server. It turns out that there is a right way to do it and a wrong way to do it.

The wrong way


Stopping and disabling the Windows Firewall service seems pretty intuitive to long-term Exchange administrators who are used to Windows Server 2003. The problem is, the Windows firewall service in Windows 2008 has been re-engineered and works a bit differently. It now includes the concept of profiles, a feature that is built into the networking stack at a low level, enabling Windows to identify the network you’re on and apply the appropriate sets of configuration (such as enabling or disabling firewall rules and services).

Because this functionality is now tied into the network stack, disabling the Windows Firewall service and shutting it off can actually lead to all sorts of interesting and hard-to-fix errors.

The right way

Doing it the right way involves taking advantage of those network profiles.

Method 1 (GUI):

  1. Open the Windows Firewall with Advanced Security console (Start, Administrative Tools, Windows Firewall with Advanced Security).
  2. In the Overview pane, click Windows Firewall Properties.
  3. For each network profile (Domain network, Public network, Private network) that the server or image will be operating in, set Firewall state to Off. Typically, setting the Domain network profile is sufficient for an Exchange server, unless it’s an Edge Transport box.
  4. Once you’ve set all the desired profiles, click OK.
  5. Close the Windows Firewall with Advanced Security console.


Method 2 (CLI):

  1. Open your favorite command-line interface: CMD.EXE or PowerShell.
  2. Type the following command:

     netsh advfirewall set profiles state off

    Fill in profiles with one of the following values:

    • DomainProfile – the Domain network profile. Typically the profile needed for all Exchange servers except Edge Transport.
    • PrivateProfile – the Private network profile. Typically the profile you’ll need for Edge Transport servers if the perimeter network has been identified as a private network.
    • PublicProfile – the Public network profile. Typically the profile you’ll need for Edge Transport servers if the perimeter network has been identified as a public network (which is what I’d recommend).
    • CurrentProfile – the currently selected network profile
    • AllProfiles – all network profiles
  3. Close the command prompt.
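If you build lab images from a script, a small wrapper can stop a typo in the profile name from reaching netsh. Here is a minimal Python sketch; the function name is made up, and it only builds the command (it does not run it). You would still need to execute the result from an elevated prompt on the Windows 2008 box.

```python
# Profile keywords accepted in place of "profiles" in the command above.
VALID_PROFILES = {
    "domainprofile",
    "privateprofile",
    "publicprofile",
    "currentprofile",
    "allprofiles",
}


def firewall_state_command(profile, state="off"):
    """Build (but do not run) the netsh command for one firewall profile."""
    key = profile.lower()
    if key not in VALID_PROFILES:
        raise ValueError(f"unknown firewall profile: {profile!r}")
    if state not in ("on", "off"):
        raise ValueError(f"state must be 'on' or 'off', got {state!r}")
    return ["netsh", "advfirewall", "set", key, "state", state]


for name in ("DomainProfile", "PublicProfile"):
    print(" ".join(firewall_state_command(name)))
# netsh advfirewall set domainprofile state off
# netsh advfirewall set publicprofile state off
```

Returning the command as a list keeps it ready to hand to something like subprocess.run without any quoting worries.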


And there you have it – the right way to disable the Windows 2008 firewall for Exchange Server 2007, complete with FAIL/LOLcats.

Outlook Performance Goodness

Microsoft has recently released a pair of Outlook 2007 updates (okay, technically, they’re updates for Outlook 2007 with SP1 applied) that you might want to look at installing sooner rather than later. These two updates are together being billed as the “February cumulative update” at KB 968009, which has some interesting verbiage about how many of the fixes were originally slated to be in Outlook 2007 SP2:

The fix list for the February CU may not be identical to the fix list for SP2, but for the purposes of this article, the February CU fixes are referred to synonymously with the fixes for SP2. Also, when Office suite SP2 releases, there will not be a specific package that targets only Outlook.

Let’s start with the small one, KB 697688. This one fixes some issues with keyboard shortcuts, custom forms, and embedded Web browser controls.

Okay, with that out of the way, let’s move on to juicy KB 961752, an unlooked-for roll-up containing a delectable selection of fixes. Highlights include:

  1. Stability fixes
  2. SharePoint/Outlook integration
  3. Multiple mailbox handling behavior
  4. Responsiveness

From reports that I’ve seen, users who have applied these two patches are reporting significantly better response times in Outlook 2007 cached mode even when attaching to large mailboxes or mailboxes with folders that contain many items — traditionally, two scenarios that caused a lot of problems for Outlook because of the way the .ost stored local data. They’ve also reported that the “corrupted data file” problem that many people have complained about (close Outlook, it takes forever to shut down so writes to the .ost don’t fully happen) seems to have gone away.

Note that you may have an awkward moment after starting Outlook for the first time after applying these updates: you’re going to get a dialog something like this:


“Wait a minute,” you might say. “First use? Where’s my data?” Chillax [1]. It’s there – but in order to do the magic, Outlook is changing the structure of the existing .ost file. This is a one-time operation and it can take a little bit of time, depending on how much data you’ve got stuffed away in there (I’ve currently got on the order of 2GB or so, so you can draw your own rough estimates; I suspect it also depends on the number/depth of folders, items per folder, number of attachments, etc.)

Once the re-order is done, though, you get all the benefits. Faster startup, quicker shut-down, and generally more responsive performance overall. This is seriously crisp stuff, folks — I opened my Deleted Items folder (I hardly ever look in there, I just occasionally nuke it from orbit) and SNAP! everything was there as close to instantly as I can measure. No waiting for 3-5 (or 10, or 20) seconds for the view to build.


[1] A mash-up of “chill” and “relax”. This is my new favorite word.

What happens in Vegas gets blogged

Update (11/15/08 1240PST): Fixed the URLs in the links to point to the actual decks. Sorry!

Time this year has flown! Hard to believe that I’ve just finished up my last conference for the year — Exchange Connections Fall at the fabulous Mandalay Bay resort and conference center in Las Vegas. This was my second trip to Vegas this year (the first was in May for the Exchange/DPM session at MMS), and I really prefer the city in November: far fewer people, much more pleasant temperatures.

I gave the following three sessions yesterday:

  • (EXC16) The Collaboration Blender — This session is adapted from the Outlook and SharePoint: Playing Well Together article I wrote for Windows IT Pro magazine (subscription required). Exchange and SharePoint are both touted as collaboration solutions and have some overlapping functionality, so this session explores some of the overlaps and compares and contrasts what each is good for. (In other words, we spend a lot of time talking about Exchange public folders.) And where does Outlook fit into this mess? There’s even a handy summary table!
  • (EXC17) Exchange Virtualization — As I confessed to my attendees, this session was a gamble that paid off. Back when I proposed the topic, there was no official statement of Microsoft support for Exchange virtualization (no, “Don’t!” doesn’t really count). I guessed that by the time November rolled around, Hyper-V would have finally shipped and they’d have shifted that stance — and I was right. Because I focus more on the Hyper-V side of things, I invited VMWare to send a representative to the session to present their take on the subject. The resulting session was very good, and I learned a bunch of things too.
  • (EXC18) Exchange Protection using Data Protection Manager — Although a lot of the content here was the same material that I’ve already presented this year (what, 4-5 times now?), I did have to make some changes thanks to the brilliant curve ball that Jason Buffington and his crew in the DPM team threw me. You see, Connections now has all Microsoft speakers speak on one day (imaginatively named “Microsoft Day” for some reason), and that day was Tuesday. While Jason couldn’t be here, Karandeep Anand (who is the DPM bomb!) was — and I’ve been trading decks and VMs and material back and forth with Jason and Karandeep for over a year now. Rather than give a less brilliant copy of the session Karandeep had already done, I added in some new material focusing on the internals of the Exchange store and how that affects Exchange protection, removed the demo, and really attacked the topic from the Exchange side of things. I think it worked. Either that or it was people staying to get free copies of the DPM book that my publisher thoughtfully provided.

A lot of my fellow speakers dread speaking on the last day, but I’ve found that I’ve come to enjoy it. Sure, you have smaller attendance numbers — but the people who are there (especially if you get lucky enough to do the last session on the last day) are the people who really want to be there. I also encourage questions from the audience during the presentation, with the caveat that if they’re too detailed or going to be answered later I’ll defer them; I like the interactivity. I usually learn something from my attendees, which makes it a good time for everyone.

Back to the grind. I know I’ve been way too quiet on the blogfront lately, and I promise, I’ve got some fresh new content in the works. First, though, I have to catch up on the paying work. For some reason, my corporate overlords seem to expect me to do billable work too, not just speak and blog. Ah, well. At least I didn’t get RickRolled on my birthday!

Masters update: short form

I have gotten a lot of email from people who wished me well and wanted to find out the status of my recent Masters rotation. I’m working on a bigger write-up, but here’s the short form:

  1. It was intense. I had a ton of fun, I learned more than I thought I could, and I met a lot of great people who are scary smart. I was also exhausted after it was all said and done.
  2. It was worth the money. Paul breaks it down for you here, and I agree with every data point. I think it’s fair to ignore the cost of travel, because no matter where you go for training, you’d have to pay it.
  3. I’m not yet a Master. There are four tests you have to pass, and I only nailed three of them. I’m now patiently waiting for word on retests, as are several of my classmates, and then we’ll knock ’em dead.

Thank you, everyone, for your well-wishes and questions. As I said, I’m working on a longer post or series of posts, but those will be a bit delayed in coming because I want to run them by the folks at the MCM/MCA program to make sure that I’m not talking about stuff I shouldn’t be.

…does this mean I’ll get an apprentice?

For the next three weeks, I’ll be squirreled away in a hidden location, having my brains surgically removed and replaced with a quantum-computing device filled with Exchange knowledge. Good times!

Seriously, though, I’ll be off to the October rotation of the three-week Microsoft Certified Master: Microsoft Exchange Server 2007 program. The Master certification is a new certification that Microsoft is rolling out, placed between the MCITP and MCA certifications. It’s so new, in fact, that it doesn’t yet appear on the Find a Microsoft Certification by Technology page.

So, newness established, what does this Master certification entail? First, it’s not your typical Microsoft certification.

To ensure that people going through this experience are ready for it, they’re actually screening candidates. For the Exchange Master program, the published criteria are:

  • 5+ years Exchange 2003
  • 1+ years Exchange 2007
  • Thorough understanding of Exchange design/architecture, AD, DNS, and core network services
  • Certification as a MCITP: Enterprise Messaging (Exchange 2007 exams 70-236, 70-237, and 70-238)
  • Certification as a MCSE Windows 2003 or MCTS: Windows Server 2008 Active Directory Configuration (exam 70-640)

Scrape all that together, and what do you get?

  • Three weeks of “highly intensive classroom training” — and by all reports, they’re not kidding when they say that. I’ve been through plenty of Microsoft classes, and for this one, my corporate lords have completely cleared the decks for me.
  • Three computerized written tests (I assume one per week). I have no idea what these are going to be like, but after having done three exams in the past month, I really hope they’re a notch above the standard Microsoft certification exam.
  • One lab-based exam (administered at the end). Now, I really like the thought of hands-on tests; one of the best job interviews I ever went through included a hands-on test. However, they’re a lot more stressful precisely because you can’t fake things or puzzle out the right answer through careful elimination. You have to know your stuff.

Assuming I survive and my head doesn’t asplode, in a month I’ll get to call myself an Exchange Master. This, of course, leads to the obvious question: do I get an apprentice? If so, I have a suggestion:

The determined apprentice

I really want an apprentice. I think I deserve one. You listening, 3Sharp?

OCS follows Exchange into 64-bit-only land

You may have missed this interesting blog post this morning amidst all the political kerfuffle, so let me sum up: the next version of OCS will only support x64 platforms.

This isn’t the big deal it would have been for OCS 2007. A lot of the initial FUD around the 64-bit-only move in Exchange 2007 turned out to be mere steam. While there were some initial challenges involved in managing the new 64-bit Exchange deployment from 32-bit machines, Microsoft got a lot of the licensing figured out and released the appropriate sets of tools to allow management of Exchange 2007 from both 32-bit and 64-bit environments. I fully expect that the OCS group has been paying close attention to all of this and taken good notes.

There’s no denying that Exchange 2007 benefits from the “64-bit only in production” stance — and with the release of Windows Server 2008 and Hyper-V, not to mention Microsoft’s updated support statement for virtualization environments, the need for 32-bit environments is going away. My biggest reason for wanting 32-bit Exchange environments was so I could run demos under Virtual Server; now that I have Hyper-V, I’m probably not in any rush to go back to Virtual Server and the 32-bit limitation. 64-bit hardware is the norm today, and the x64 Windows variants are solid and mainstream enough for my dedicated application servers. (Maybe not so for the desktop quite yet, but still getting there rapidly.)

The one thing I’m skeptical about, though, is whether the move to 64-bit is really going to reduce the total number of servers in the deployment. In Exchange 2007, I only saw the server reductions in very large environments; the mailbox-per-server gains we got from 64-bit were offset by the explicit breakout of roles and the business needs that drove redundant configurations like CCR (which meant no co-locating roles with the Mailbox role) and multiple HT/CAS servers. I’m wondering how this is going to play out with the next version of OCS, which already has so many distinct roles in play.

What I *hope* to see is that the maximum capacity of each server role (such as the number of users per pool or the number of streams per mediation server) can be driven upwards; this makes the large datacenter configuration options much more attractive, because it does translate to a reduced number of servers. However, for organizations that still have relatively low bandwidth separating their various locations, 64-bit won’t do much to help; OCS deployment planning is very dependent on bandwidth, which is often the top limit on scalability long before the limits of the 32-bit Windows environment come into play.

First Look at Microsoft Online Services: the Sign-In tool

Continuing from my previous post on MOS

I didn’t really mention this in the previous post, but MOS is designed to provide a hosted alternative to the server-side applications. One of the goals is to continue working with existing native clients and client access methods, so (for example) you can access your Exchange Online mailbox through OWA (running from MOS), through Outlook, or even through EAS/Windows Mobile. In order to do this, though, your client applications need to know how to talk to MOS and provide the proper credentials.

You can do this the hard way or the easy way. The hard way is running around and reconfiguring each application by hand and teaching your users how to use a separate set of credentials. The easy way is to use the MOS Sign-In tool, a little .NET 3.0 application that runs on the client desktop. It interacts with Outlook 2007 RTM/SP1, LiveMeeting 8, and IE7+.

When this application is run, it will invite the user to log on to MOS. The first time they do so, they’re required to change their password. It then detects the appropriate applications, offers to configure them to work with MOS, and then just sits quietly on the desktop, providing a seamless SSO experience.

To be continued…

First look at Microsoft Online Services: adding domains

I’m at an airlift here in Redmond for the new Microsoft Online Services (MOS), Microsoft’s hosted services platform. Right now, MOS offers a combination of hosted Exchange (OWA, Outlook, and even EAS!), hosted SharePoint, and Live Meeting. We’ve just gone through an overview of the service, and it looks cool — enough so that I’m now seriously considering switching my personal domains over to it (especially since they offer the ability to synchronize with your Active Directory deployment).

MOS is currently in beta and you can go sign up for a time-limited trial. There are only a certain number of trial accounts active at any given time, so your trial request may not be provisioned immediately; however, you can still sign up for one. You’ll need a Windows Live account.

As you might imagine, MOS allows you to associate one or more DNS domains with your online account. When you register for your account, you’re asked for a domain. This domain is not verified and, in fact, seems to be used simply as an internal administrative tag — once your account and service is set up, you have to specifically add DNS domains. Adding them is a fairly simple process:

  1. Register your domain name with a registrar.
  2. Provision your domain with a DNS provider (often combined with step 1).
  3. Add the domain name to your MOS Admin Center.
  4. Run the verification wizard and add the auto-generated CNAME to your domain’s DNS zone.
  5. Validate the domain in the MOS Admin Center.
  6. Start provisioning users with this domain, enable inbound e-mail on this domain, etc.

The verification step is an important piece, because this helps MOS make sure that you’re using a domain you’re actually in control of. Otherwise, malicious people could sign in and hijack your domain, which would suck. The way Microsoft does this is actually simple and elegant: they generate a unique CNAME record (that looks very much like a GUID), and ask you to add this CNAME record, pointing back to a server under their control, to your zone. This has lots of advantages:

  • It’s pragmatic. If you can add a CNAME record to a zone file, you effectively control the domain.
  • It avoids the nastiness that can result from WHOIS-based verification and allows people who register domains to continue using proxy companies, hiding their personal info from WHOIS spammers.
  • It’s relatively easy. You simply have to add a simple record to your DNS; if you can’t do this (or your DNS hoster can’t do it for you), then you have much bigger problems managing your DNS and verifying your DNS domain under MOS is the least of your problems.
  • It’s low-impact. The generated CNAME is highly unlikely to be queried during normal operations by your users; only MOS is likely to be looking for it. It doesn’t require you to repoint your MX records or otherwise make major modifications to your infrastructure if all you want to do is start using online SharePoint and Live Meeting.
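To make the CNAME trick concrete, here is a minimal Python sketch of the idea. Everything named in it is an assumption for illustration: the target host `verify.example.net.`, the function names, and the dictionary standing in for a real DNS lookup are all hypothetical, and this is not how MOS actually implements its check.

```python
import uuid

# Hypothetical host that the verification record points back to;
# the real target name is not given in this post.
VERIFICATION_TARGET = "verify.example.net."


def make_verification_record(domain: str) -> tuple[str, str]:
    """Return (owner name, zone-file line) for a one-off verification CNAME."""
    label = uuid.uuid4().hex  # random, GUID-like label
    owner = f"{label}.{domain.rstrip('.').lower()}."
    return owner, f"{owner} 3600 IN CNAME {VERIFICATION_TARGET}"


def is_verified(owner: str, published_cnames: dict[str, str]) -> bool:
    """Check whether the expected CNAME shows up in the published records.

    published_cnames stands in for a real DNS query: it maps owner names
    to CNAME targets.
    """
    target = published_cnames.get(owner.lower())
    return target is not None and target.lower() == VERIFICATION_TARGET


owner, line = make_verification_record("contoso.example")
print(line)                        # the line you would add to your zone

zone = {}                          # record not published yet
print(is_verified(owner, zone))    # False
zone[owner] = VERIFICATION_TARGET  # you add the CNAME to your zone
print(is_verified(owner, zone))    # True
```

The point of the random label is that only someone who can edit the zone can make that exact name resolve, and nobody else has any reason to query it.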

Note that just because you add a domain to MOS doesn’t mean you have to use it for email! That’s a separate operation, which is a two-step process of enabling inbound email for that domain and then updating your MX records appropriately.

More on other MOS functionality coming later…big thanks to the event staff for their kind permission for me to blog!

Hyper-V in the hizzouse!

Everyone’s being so coy in the Windows blogosphere today. “As you may have heard…” Heck with that; this is wicked cool. Hyper-V has been Released To Manufacturing … and is already available for download. As the link explains, it’ll start coming down the Windows Update pipe July 8th. If you don’t want your Windows Server 2008 machine to be updated yet, don’t be blindly accepting updates.

Why wouldn’t you want to get it first thing?

  • You’re running a previous version of Hyper-V. If so, be aware that upgrading your VMs is not automatic. It’s not a horrible process, but it will take some time. You have to manually export each VM, remove the VMs from the server, upgrade the server, re-import the VMs, then update the Integration Services. The more VMs you have, the more time this will take.
  • You’re running some software that is not yet compatible with Hyper-V RTM but works with an earlier build. In this case, you want to wait until that software has a patch available.

I fit into both categories. I think I’m going to wait until I’m back from vacation to do it.

Oh, yes, just because Hyper-V is now RTM doesn’t mean that you can run off and install Exchange 2007 on it in production. See Scott Schnoll’s post for more info.

These are not the solutions you’re looking for

As IT professionals, we are more often than not prone to falling into the perils of magical thinking. (I’m sure this is a side-effect of being human, which is a pesky and bothersome condition I will have to do something about one of these days.) Magical thinking in this context is when we have not internalized the intricacies of a problem and instead rely on formulas rather than true understanding to come up with solutions.

At one ISP I used to work at, we had a glorious reclaimed piece of technology, an Auspex NS-5500 file server. Every now and then on reboot, this old beast of a machine would fail to boot up; the cure was to open the cover over the drive cage and give it a good swift whack. We all assumed that this was because one of the drive connectors was a bit loose, but when our “magic” fix failed to work one night I discovered that it was in fact because one of the screws holding things in place was missing, allowing the drive bay to sag just a tiny bit. It was this tiny bit of sag that put just enough stress on the connector for drive 0. Had we actually opened the case up earlier, we’d have been able to solve the problem — and prevent a year of whacking the server.

All too often, I see magical thinking in the field of security. Case in point: I recently heard about a gentleman who has a client that is requesting ETRN support be added back to Exchange 2007, either natively or through an add-on. They want to deploy the Edge role in their DMZ, have it queue up mail for the internal organization, and then have their Hub Transports (in the internal protected network) initiate a connection out to de-queue the messages using the ETRN SMTP extension. The reason they want this is that they’ve done due diligence and read some very thorough documents about computer network zones and have come to the conclusion that all network connections must be initiated from the most secure network. This, they say, removes the threat of malware taking over the Edge server in the DMZ and allowing an attacker to use it as a launching point to the protected network.

Now, the recommendation for connections to be initiated from a more secure network to a less secure network is a good general baseline to follow when it makes sense. However, it is not realistic in all cases (if we followed this to the letter, nobody would be able to receive e-mail from external senders except through random polling of Internet SMTP hosts, which is not at all scalable). This is doubly true if you don’t understand how the underlying protocols work. Case in point: ETRN, defined by RFC 1985, “SMTP Service Extension for Remote Message Queue Starting”. Quoting from section 3, “The Remote Queue Processing Declaration service extension” (emphasis added):

To save money, many small companies want to only maintain transient connections to their service providers.  In addition, there are some situations where the client sites depend on their mail arriving quickly, so forcing the queues on the server belonging to their service provider may be more desirable than waiting for the retry timeout to occur.

Both of these situations could currently be fixed using the TURN command defined in [1], if it were not for a large security loophole in the TURN command.  As it stands, the TURN command will reverse the direction of the SMTP connection and assume that the remote host is being honest about what its name is.  The security loophole is that there is no documented stipulation for checking the authenticity of the remote host name, as given in the HELO or EHLO command.  As such, most SMTP and ESMTP implementations do not implement the TURN command to avoid this security loophole.

This has been addressed in the design of the ETRN command.  This extended turn command was written with the points in the first paragraph in mind, yet paying attention to the problems that currently exist with the TURN command.  The security loophole is avoided by asking the server to start a new connection aimed at the specified client.

See the problem? ETRN was not designed to solve a security problem; it was designed to solve a financial problem back in days when always-on bandwidth was a lot more expensive and most ISPs metered traffic. It masquerades as solving a security problem only because it’s designed to avoid a loophole in an insecure and exploitable feature. As a result, ETRN won’t solve the problem these people want it to solve; all it does is tell the system in the DMZ to initiate a new connection to the Hub Transport servers. It doesn’t reuse the existing connection initiated by the Hub Transport servers. They can’t use a firewall rule to block outgoing access from the Edge to the Hub Transport and be safe, because they’ll cut off all incoming traffic.
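Here is a small Python simulation of that connection-direction problem. It is only a sketch: the `Firewall` class, the host labels "HT" and "ET", and the function name are made up for illustration, and no real SMTP traffic is involved. It models the one thing that matters here, which is who initiates each connection.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Allows a connection only if (initiator, target) is on the allow list."""
    allowed: set[tuple[str, str]]
    log: list[str] = field(default_factory=list)

    def connect(self, initiator: str, target: str) -> bool:
        ok = (initiator, target) in self.allowed
        self.log.append(f"{initiator} -> {target}: {'allowed' if ok else 'BLOCKED'}")
        return ok


def dequeue_with_etrn(fw: Firewall, queued: list[str]) -> list[str]:
    """Simulate a Hub Transport (HT) asking the Edge (ET) to flush its queue."""
    # Connection 1: HT is the SMTP client and sends "ETRN <domain>".
    if not fw.connect("HT", "ET"):
        return []
    # Per RFC 1985, the server then starts a *new* connection aimed at the
    # client; queued mail does not flow back over connection 1.
    # Connection 2: ET is now the SMTP client and HT is the SMTP server.
    if not fw.connect("ET", "HT"):
        return []
    return list(queued)


queue = ["msg-1", "msg-2"]

strict = Firewall(allowed={("HT", "ET")})   # "inside-out only" rule
print(dequeue_with_etrn(strict, queue))     # [] because nothing is delivered
print(strict.log)

both = Firewall(allowed={("HT", "ET"), ("ET", "HT")})
print(dequeue_with_etrn(both, queue))       # ['msg-1', 'msg-2']
```

With the "inside-out only" rule, the ETRN request gets through but the delivery connection is blocked, so the mail stays queued on the Edge. To get mail flowing you have to allow the Edge to initiate connections inward, which is exactly what the client was trying to avoid.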

However, let us for a moment assume that it did work the way they wanted it to: my Hub Transport initiates an outbound SMTP session to the Edge. In this session, HT is the SMTP client, ET is the SMTP server. As soon as HT issues the ETRN command, they still have to swap roles — HT is now using the SMTP server code paths, while the ET is using the SMTP client code paths. Any theoretical vulnerabilities that are in the HT SMTP implementation are still going to be there, still exposed to the message traffic about to be sent down the connection, still open to exploitation.

This is the magical thinking: firewalls and a DMZ will protect my traffic. This is not true; firewalls and network zones are only two components of a complete security plan. Neither firewalls nor network zones can protect legitimate traffic, nor are they designed to; they are designed to allow you to designate which traffic is legitimate. If you want to secure that traffic, you need to turn to other measures.

Tech-Talk: Making Backups Cool with DPM

While I was at the Tech-Ed NA IT Pro conference last week, Jason Buffington and I took the chance to invade the Tech-Ed Online fishbowl studio and record a quick Tech-Talk on using DPM. You can now view it online on the Tech-Ed IT Pro page and the Library page, or stream it directly. Now that Tech-Ed’s over, maybe we’ll both find the time to be on Xbox Live at the same time so we can continue our discussion in Call of Duty 4…

Updated Exchange Developer Roadmap

To reinforce yesterday’s post about Exchange Web Services (EWS), I wanted to draw your attention to the Exchange Developer Roadmap posted on May 22 2008 on the Exchange API-spotting blog.

There shouldn’t really be any surprises here, but there were a couple of items I wanted to highlight. First:

Given this commitment to Web services and our goal of making Exchange Web Services the richest developer interface for Exchange (emphasis added)


Here’s a preview of some of the functionality that we plan to add to the next release of Exchange Web Services:

  • Access to Folder Associated Items (FAI) and read/write access to user settings (Devin: this page in the MAPI reference indicates that FAIs are things like views and forms. I believe that this also fixes a known quirk of EWS that keeps you from creating Outlook-visible search folders that use certain property paths. I believe this also gives access to server-side rules, if they’re not already accessible through a separate part of the API.)
  • Management of Personal Distribution Lists (Devin: very cool.)
  • Throttling capabilities that give Exchange administrators control over system resource consumption (Devin: this will be very nice for helping keep poorly written applications from taking down the Exchange servers.)
  • A powerful and easy-to-use server-to-server authentication model to enable building portals and enterprise mash-ups (Devin: let’s hope this can ease some of the pain of building Exchange-aware SharePoint sites, at least those that don’t require direct access to private mailbox content.)
  • An easy-to-use Microsoft .NET API that fully wraps the Web service calls, which makes Web service development even easier (Devin: I’ll be interested in seeing how this stacks up against third-party offerings like the Independentsoft EWS client offering.)

Then they go on to list the APIs that will get removed (Exchange WebDAV, Store Events, CDO 3.0/CDOEx, and ExOLEDB) and moved to “extended support” (Exchange Server MAPI Client, CDO 1.2.1). Don’t get too excited by the MAPI client — it’s not what you think:

Provides server applications a MAPI runtime for accessing Exchange. 

Note: This is not the Outlook MAPI Client library that is included with Outlook.


Outlook’s Exchange MAPI Store provider, available in the Outlook MAPI Client library can also be used to access an Exchange mailbox or public folder.

If you’re going to start writing Exchange-aware applications, you should probably start looking at EWS first for future compatibility. If you’re trying to support Exchange 2003 at the same time…good luck.

A .NET add-on for working with Exchange Web Services

I just got word that Independentsoft has come out with a beta version of an EWS client API for the .NET Framework and .NET Compact Framework. I’ve not looked at it yet, but I’m particularly hopeful about having a good way to work with EWS from Windows Mobile devices.

Exchange Web Services (EWS), introduced in Exchange 2007 and enhanced in Exchange 2007 SP1, is Microsoft’s preferred interface for all future programmatic reach into the Exchange store. While EWS is a Web service, it can be pretty complicated to work with. Luckily, we’ve done some work with EWS here at 3Sharp; Paul’s been presenting some developer training sessions on EWS in partnership with Microsoft. We’ve found that Inside Microsoft Exchange Server 2007 Web Services has been a valuable reference on EWS.

One of the challenges for EWS development is that the schema and object model are pretty complex when compared with the typical Web service, enough so that you need to use special Visual Studio proxy classes when you use .NET to work with EWS. This, by the way, is very likely the cause of the compatibility issue I found between EWS and SharePoint Designer: Designer’s proxy classes aren’t the EWS-aware ones.
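To give a feel for that complexity without any proxy classes at all, here is a rough Python sketch that builds the raw SOAP envelope for about the simplest EWS request there is, a GetFolder call on the Inbox. The function name `build_get_folder` is my own, the sketch only builds the XML (it doesn’t send it anywhere or handle authentication), and it is not a substitute for a real client library.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace plus the two EWS namespaces
# (one for request/response messages, one for the shared types).
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
MSG = "http://schemas.microsoft.com/exchange/services/2006/messages"
TYP = "http://schemas.microsoft.com/exchange/services/2006/types"

ET.register_namespace("soap", SOAP)
ET.register_namespace("m", MSG)
ET.register_namespace("t", TYP)


def build_get_folder(folder: str = "inbox") -> str:
    """Build a GetFolder request for a well-known (distinguished) folder."""
    envelope = ET.Element(f"{{{SOAP}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
    request = ET.SubElement(body, f"{{{MSG}}}GetFolder")
    # Which properties to return: the default set.
    shape = ET.SubElement(request, f"{{{MSG}}}FolderShape")
    ET.SubElement(shape, f"{{{TYP}}}BaseShape").text = "Default"
    # Which folder to fetch.
    ids = ET.SubElement(request, f"{{{MSG}}}FolderIds")
    ET.SubElement(ids, f"{{{TYP}}}DistinguishedFolderId", Id=folder)
    return ET.tostring(envelope, encoding="unicode")


print(build_get_folder())
```

Even this trivial request juggles three XML namespaces and a nested shape/ID structure, which is why a generated or hand-tuned object model on top of EWS is so welcome.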

Revised guidance on protecting Exchange with DPM 2007

Just a quick note to let you all know that the Protecting Exchange Server with DPM 2007 white paper is available for download from Microsoft. This is the same white paper I worked on for them last year, but freshly revised to include more guidance around mailbox-level recovery.

I’ll be giving a talk around this topic next week at Tech-Ed (IT Pro) in Orlando, session number MGT369. Hope to see you there! (Yes, this is the same talk I did at Exchange Connections in Orlando and in MMS in Vegas a month ago; it seems to be a popular session!)