Beating Verisign certificate woes in Exchange

I’ve seen this problem in several customers over the last two years, and now I’m seeing signs of it in other places. I want to document what I found so that you can avoid the pain we had to go through.

The Problem: Verisign certificates cause Exchange publishing problems

So here’s the scenario: you’re deploying Exchange 2010 (or some other version, this is not a version-dependent issue with Exchange) and you’re using a Verisign certificate to publish your client access servers. You may be using a load balancer with SSL offload or pass-through, a reverse proxy like TMG 2010, some combination of the above, or you may even be publishing your CAS roles directly. However you publish Exchange, though, you’re running into a multitude of problems:

  • You can’t completely pass ExRCA’s validation checks. You get an error something like:  The certificate is not trusted on any version of Windows Phone device. Root = CN=VeriSign Class 3 Public Primary Certification Authority – G5, OU=”(c) 2006 VeriSign, Inc. – For authorized use only”, OU=VeriSign Trust Network, O=”VeriSign, Inc.”, C=US
  • You have random certificate validation errors across a multitude of clients, typically mobile clients such as smartphones and tablets. However, some desktop clients and browsers may show issues as well.
  • When you view the validation chain for your site certificate on multiple devices, they are not consistent.

These can be very hard problems to diagnose and fix; the first time I ran across it, I had to get additional high-level Trace3 engineers on the call along with the customer and a Microsoft support representative to help figure out what the problem was and how to fix it.

The Diagnosis: Cross-chained certificates with an invalid root

So what’s causing this difficult problem? It’s your basic case of a cross-chained certificate with an invalid root certificate. “Oh, sure,” I hear you saying now. “That clears it right up then.” The cause sounds esoteric, but it’s actually not hard to understand when you remember how certificates work: through a chain of validation. Your Exchange server certificate is just one link in an entire chain. Each link is represented by an X.509v3 digital certificate that is the footprint of the underlying server it represents.

At the base of this chain (aka the root) is the root certificate authority (CA) server. This digital certificate is unique from others because it’s self-signed – no other CA server has signed this server’s certificate. Now, you can use a root CA server to issue certificates to customers, but that’s actually a bad idea to do for a lot of reasons. So instead, you have one or more intermediate CA servers added into the chain, and if you have multiple layers, then the outermost layer are the CA servers that process customer requests. So a typical commercially generated certificate has a validation chain of 3-4 layers: the root CA, one or two intermediate CAs, and your server certificate.

Remember how I said there were reasons to not use root CAs to generate customer certificates? You can probably read up on the security rationales behind this design, but some of the practical reasons include:

  • The ability to offer different classes of service, signed by separate root servers. Instead of having to maintain separate farms of intermediate servers, you can have one pool of intermediate servers that issue certificates for different tiers of service.
  • The ability to retire root and intermediate CA servers without invalidating all of the certificates issued through that root chain, if the intermediate CA servers cross-chain from multiple roots. That is, the first layer intermediate CA servers’ certificates are signed by multiple root CA servers, and the second layer intermediate CA servers’ certificates are signed by multiple intermediate CA servers from the first layer.

So, cross-chaining is a valid practice that helps provide redundancy for certificate authorities and helps protect your investment in certificates. Imagine what a pain it would be if one of your intermediate CAs got revoked and nuked all of your certificates. I’m not terribly fond of having to redeploy certificates for my whole infrastructure without warning.

However, sometimes cross-chained certificates can cause problems, especially when they interact with another feature of the X.509v3 specification: the Authority Information Access (AIA) certificate extension. Imagine a situation where a client (such as a web browser trying to connect to OWA), presented with an X.509v3 certificate for an Exchange server, cannot validate the certificate chain because it doesn’t have the upstream intermediate CA certificate.

If the Exchange server certificate has the AIA extension, the client has the information it needs to try to retrieve the missing intermediate CA certificate – either retrieving it from the HTTPS server, or by contacting a URI from the CA to download it directly. This only works for intermediate CA certificates; you can’t retrieve the root CA certificate this way. So, if you are missing the entire certificate chain, AIA won’t allow you to validate it, but as long as you have the signing root CA certificate, you can fill in any missing intermediate CA certificates this way.

Here’s the catch: some client devices can only request missing certificates from the HTTPS server. This doesn’t sound so bad…but what happens if the server’s certificate is cross-chained, and the certificate chain on the server goes to a root certificate that the device doesn’t have…even if it does have another valid root to another possible chain? What happens is certificate validation failure, on a certificate that tested as validated when you installed it on the Exchange server.

I want to note here that I’ve only personally seen this problem with Verisign certificates, but it’s a potential problem for any certificate authority.

The Fix: Disable the invalid root

We know the problem and we know why it happens. Now it’s time to fix it by disabling the invalid root.

Step #1 is find the root. Fire up the Certificates MMC snap-in, find your Exchange server certificate, and view the certificate chain properties. This is what the incorrect chain has looked like on the servers I’ve seen it on:

image

The invalid root CA server circled in red

That’s a not very helpful friendly name on that certificate, so let’s take a look at the detailed properties:

image

Meet “VeriSign Class 3 Public Primary Certification Authority – G5”

Step #2 is also performed in the Certificates MMC snap-in. Navigate to the Third-Party Root Certification Authorities node and find your certificate. Match the attributes above to the certificate below:

image

Root CA certificate hide and seek

Right-click the certificate and select Properties (don’t just open the certificate) to get the following dialog, where you will want to select the option to disable the certificate for all purposes:

image

C’mon…you know you want to

Go back to the server certificate and view the validation chain again. This time, you should see the sweet, sweet sign of victory (if not, close down the MMC and open it up again):

image

Working on the chain gang

It’s a relatively easy process…so where do you need to do it? Great question!

The process I outlined obviously is for Windows servers, so you would think that you can fix this just on the the Exchange CAS roles in your Internet-facing sites. However, you may have additional work to do depending on how you’re publishing Exchange:

  • If you’re using a hardware load balancer with the SSL certificate loaded, you may not have the ability to disable the invalid root CA certificate on the load balancer. You may simply need to remove the invalid chain, re-export the correct chain from your Exchange server, and reinstall the valid root and intermediate CA certificates.
  • If you’re publishing through ISA/TMG, perform the same process on the ISA/TMG servers. You may also want to re-export the correct chain from your Exchange server onto your reverse proxy servers to ensure they have all the intermediate CA certificates loaded locally.

The general rule is that the outermost server device needs to have the valid, complete certificate chain loaded locally to ensure AIA does its job for the various client devices.

Let me know if this helps you out.

Add to the legend