Another solution for Autodiscover 401 woes in #MSExchange

Earlier tonight, I was helping a customer troubleshoot why users in their mixed Exchange 2013/2007 organization were getting 401 errors when trying to use Autodiscover to set up profiles. Well, more accurately, the Remote Connectivity Analyzer was getting a 401, and users were getting repeated authentication prompts. However, when we tested internally against the Autodiscover endpoints, everything worked fine, and manual testing externally against the Autodiscover endpoint also worked.

So why did our manual tests work when the automated tests and Outlook didn’t?

Well, some will tell you it’s because of bad NTFS permissions on the virtual directory, while others will say it’s because of the loopback check being disabled. And in your case, that might in fact be the cause…but it wasn’t in mine.

In my case, the clue was in the Outlook authentication prompt (users and domains have been changed to protect the innocent):

[image: Outlook authentication prompt]

I’m attempting to authenticate with the user’s UPN, and it’s failing…hey.

Re-run the Exchange Remote Connectivity Analyzer, this time with the Domain\Username syntax, and suddenly I pass the Autodiscover test. Time to go view the user account – and sure enough, the account’s UPN is not set to the primary SMTP address.
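If you want to check for this across the whole organization, a quick Exchange Management Shell one-liner along these lines will flag it (a sketch; adjust the scope to taste):

# List mailboxes whose UPN doesn't match their primary SMTP address (sketch)
Get-Mailbox -ResultSize Unlimited |
    Where-Object { $_.UserPrincipalName -ne $_.PrimarySmtpAddress.ToString() } |
    Select-Object Name, UserPrincipalName, PrimarySmtpAddress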

Moral of the story: check your UPNs.

Upgrade Windows 2003 crypto in #MSExchange migrations

Just had this bite me at one of my customers. Situation: Exchange Server 2007 on Windows Server 2003 R2, upgrading to Exchange Server 2013 on Windows Server 2012. We ordered a new SAN certificate from GoDaddy (requesting it from Exchange 2013) and installed it on the Exchange 2013 servers with no problems. When we installed it on the Exchange 2007 servers, however, the certificate would import, but the new certificate (and its chain) showed the dreaded red X.

Looking at the certificate, we saw the following error message:

[image: certificate error message]

If you look more closely at the certificates in GoDaddy’s G2 root chain, you’ll see signatures using both SHA-1 and SHA-256. The latter is the problem for Windows Server 2003 – it has an older cryptography library that doesn’t handle the newer hash algorithms.
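If you want to confirm what you’re dealing with before patching, checking the signature algorithm on the exported intermediate certificate is enough (a minimal sketch; the file path is just an example, and you can run it from any machine with PowerShell):

# Inspect the signature algorithm of an exported certificate (path is an example)
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2("C:\temp\godaddy-intermediate.cer")
$cert.SignatureAlgorithm.FriendlyName   # "sha256RSA" is what an unpatched Windows Server 2003 box chokes on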

The solution: Install KB968730 on your Windows Server 2003 machines, reboot, and re-check your certificate. Now you should see the “This certificate is OK” message we all love.

Load Balancing ADFS on Windows 2012 R2

Greetings, everyone! I ran across this issue recently with a customer’s Exchange Server 2007 to Office 365 migration and wanted to pass along the lessons learned.

The Plan

It all started so innocently: the customer was going to deploy two Exchange Server 2013 hybrid servers into their existing Exchange Server 2007 organization to build a hybrid configuration with Office 365, using directory synchronization and SSO with ADFS. They’ve been investing a lot of work into upgrading their infrastructure and have been moving systems to newer versions of Windows, including some spiffy new Windows Server 2012 Hyper-V servers. We decided that we’d deploy all of the new servers on Windows Server 2012 R2, the better to future-proof them. We were also going to use Windows NLB for the ADFS and ADFS proxy servers instead of their existing F5 BIG-IP load balancer, as the network team is in the middle of their own projects.

The Problem

There were actually two problems. The first, of course, was the combination of Hyper-V and Windows NLB. Unicast was obviously no good, multicast has its own issues, and because we needed to get the servers up and running as fast as possible, we didn’t have time to explore IGMP multicast. Time to turn to the F5. The BIG-IP platform is pretty complex and full of features, but F5 is usually good about documentation. Sure enough, the F5 ADFS 2.0 deployment guide (Deploying F5 with Microsoft Active Directory Federation Services) got us most of the way there. If we had been deploying ADFS 2.0 on Server 2012 with the ADFS proxy role, we’d have been home free.

In Windows 2012 R2 ADFS, you don’t have the ADFS proxy role any more – you use the Web Application Proxy (WAP) role service component of the Remote Access role. However, that’s not the only change. If you follow this guide with Windows Server 2012 R2, your ADFS and WAP pools will fail their health checks (F5 calls them monitors) and the virtual server will not be brought online because the F5 will mistakenly believe that your pool servers are down. OOPS!

The Resolution

So what’s different and how do we fix it?

ADFS on Windows Server 2012 R2 is still mostly ADFS 2.0, but some things have changed – out with the ADFS proxy role, in with the WAP role service. That’s the most obvious change, but the real kicker is under the hood, in the guts of the Windows Server 2012 R2 HTTP server. In Windows Server 2012 R2, IIS and the Web server engine have a new architecture that supports the SNI extension to TLS. SNI is insanely cool: the connecting client tells the server which host name it’s trying to reach as part of the HTTPS session setup, so one IP address can host multiple HTTPS sites with different certificates – just like HTTP 1.1 added the Host: header to HTTP.

But the fact that Windows 2012 R2 uses SNI gets in the way of the HTTPS health checks that the F5 ADFS 2.0 deployment guide has you configure. We were able to work around it by replacing the HTTPS health checks with TCP Half Open checks, which send a SYN to the pool servers on the target TCP port and wait for the SYN/ACK. If it comes back, the server is marked up.

For long-term use, the HTTPS health checks are better; they allow you to configure the health check to probe a specific URL and require a specific response back before declaring a server in the pool healthy. This is better than ICMP or TCP checks, which only confirm a ping response or an open TCP port. It’s entirely possible for a machine to be up on the network, with IIS answering connections, while a misconfiguration in WAP or ADFS means it’s not actually a viable service. Good health checks save debugging time.
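You can also do by hand roughly what a good probe does, from any SNI-capable client, to rule the servers themselves in or out (a sketch; the hostname is an example, and you need PowerShell 3.0 or later so the client actually sends SNI):

# Manually probe the ADFS federation metadata endpoint (hostname is an example)
$uri = "https://sts.contoso.com/FederationMetadata/2007-06/FederationMetadata.xml"
$response = Invoke-WebRequest -Uri $uri -UseBasicParsing
"{0}: {1} bytes of federation metadata" -f $response.StatusCode, $response.RawContentLength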

The Real Fix

As far as I know there’s no easy, supported way to turn SNI off, nor would I really want to; it’s a great standard that needs to be widely deployed and supported, because it lets servers conserve IP addresses by hosting multiple HTTPS sites, each with its own certificate, on fewer IP/port combinations instead of relying on big, heavy SAN certificates. Ultimately, load balancer vendors and clients need to get SNI-aware fixes out for their gear.

If you’re an F5 user, the right way is to read and follow this F5 DevCentral blog post Big-IP and ADFS Part 5 – “Working with ADFS 3.0 and SNI” to configure your BIG-IP device with a new SNI-aware monitor; you’re going to want it for all of the Windows Server 2012 R2 Web servers you deploy over the next several years. This process is a little convoluted – you have to upload a script to the F5 and pass in custom parameters, which just seems really wrong (but is a true measure of just how powerful and beastly these machines really are) – but at the end of the day, you have a properly configured monitor that not only supports SNI connections to the correct hostname, but uses the specific URI to ensure that the ADFS federation XML is returned by your servers.

[image: An SNI-aware F5 monitor (from DevCentral)]

What do you do if you don’t have an F5, and your load balancer vendor doesn’t offer an SNI-aware monitor? Remember when I said that there’s no way to turn SNI off? That’s not totally true. You can go mess with the SNI configuration and change the SSL bindings in a way that seems to mimic the old behavior, but you run the risk of really messing things up. What you can do instead is follow the process in this TechNet blog post: How to support non-SNI capable Clients with Web Application Proxy and AD FS 2012 R2.

 

Postscript

As a side note, almost everyone seems to be calling the ADFS flavor on Windows Server 2012 R2 “ADFS 3.0.” Everyone, that is, except for Microsoft. It’s not a 3.0; as I understand it the biggest differences have to do with the underlying server architecture, not the ADFS functionality on top of it per se. So don’t call it that, but recognize most other people will. It’s just AD FS 2012 R2.

Why Virtualization Still Isn’t Mature

As a long-time former advocate for Exchange virtualization (and virtualization in general), I’m glad to see other pros reaching the same conclusions I reached a while ago about the merits of Exchange virtualization. In general, it’s not a matter of whether you can solve the technological problems; I’ve spent years proving, for customer after customer, that you can. Tony does a great job of talking about the specific mismatch between Exchange and virtualization. I agree with everything he said, but I’m going to go one further and say that part of the problem is that virtualization is still an immature technology.

Now when I say that, you have to understand: I believe that virtualization is more than just the technology you use to run virtual machines. It includes the entire stack. And obviously, lots of people agree with me, because the core of private cloud technology is creating an entire stack of technology to wrap around your virtualization solution, such as Microsoft System Center or OpenStack. These solutions include software defined networking, operating system configuration, dynamic resource management, policy-driven allocation, and more. There are APIs, automation technologies, de facto standards, and interoperability technologies. The goal is to reduce or remove the amount of human effort required to deploy virtual solutions by bringing every piece of the virtualization pie under central control. Configure policies and templates and let automation use those to guide the creation and configuration of your specific instances, so that everything is consistent.

But there’s a missing piece – a huge one – one that I’ve been pointing out for years. And that’s the application layer. When you come right down to it, the Exchange community gets into brawls with the virtualization community (and the networking community, and the storage community, but let’s stay focused on one brawl at a time, please) because there are two different and incompatible principles at play:

  • Exchange is trying to be as aware of your data as possible and take every measure to keep it safe, secure, and available by making specific assumptions about how the system is deployed and configured.
  • Your virtualization product is trying to treat all applications (including Exchange) as if they are completely unaware of the virtualization stack and provide features and functionality whether they were designed for it or not.

The various stack solutions are using the right approach, but I believe they are doing it in the wrong direction; they work great in the second scenario, but they create exceptions and oddities for Exchange and other applications like it that fit the first scenario. So what’s missing? How do I think virtualization stacks need to fix this problem?

Create a standard by which Exchange and other applications can describe the capabilities they offer and define the dependencies and requirements for those capabilities that must in turn be provided by the stack. Only by doing this can policy-driven private cloud solutions close that gap and make policies extend across the entire stack, continuing to reduce the chance of human error.

With a standard like this, virtualizing Exchange would become a lot easier. As an example, consider VM-to-host affinity. Instead of admins having to remember to manually configure Exchange virtual DAG members not to land on the same host, Exchange itself would report this requirement to the virtualization solution. DAG Mailbox servers would never end up on the same host, and the FSW wouldn’t be on the same host as any of the Mailbox servers. And when host outages resulted in the loss of redundant hosts, the virtualization solution could throw an event, caught by the monitoring system, that explained the problem before you got into a situation where this constraint was broken. But don’t stop there. This same standard could be applied to network configuration, allowing Exchange and other applications to have load balancing automatically provisioned by the private cloud solution. Or imagine deploying Exchange Mailbox servers into a VMware environment that’s currently using NFS. The minute the Mailbox role is deployed, the automation carves off the appropriate disk blocks and presents them as iSCSI to the new VM (either directly or through the hypervisor as an RDM, based on the policy) so that the storage meets Exchange’s requirements.
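To make the idea concrete, here’s a purely imaginary sketch of the kind of declaration I mean – nothing like this exists today, and every name here is invented strictly for illustration:

# Hypothetical capability/requirement manifest an application could hand to the stack
$exchangeDagMemberProfile = @{
    Role                  = "Mailbox (DAG member)"
    HostAntiAffinity      = @("OtherDagMembers", "FileShareWitness")   # never co-locate these
    Storage               = @{ Type = "Block"; Presentation = @("iSCSI", "PassThrough") }
    LoadBalancing         = @{ Required = $false }                     # the DAG handles its own failover
    OnConstraintViolation = "RaiseAlert"                               # e.g. after host failures break affinity
}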

Imagine the arguments that could solve. Instead of creating problems, applications and virtualization/private cloud stacks would be working together – the very model of maturity.

The iPhone wars, concluded

This happened not too long after I posted my last iPhone update, but I forgot to blog it until now.

I made the decision to get rid of the iPhone. There were a few things I liked about it, but overall, I found the user experience for core behavior and integration was just nowhere near the level of excellence provided by Windows Phone. Yes, I could probably have solved the problems I found by purchasing additional apps – I noticed that for the most part, the better apps are not the free ones – but it wouldn’t have solved the larger problem of each piece being just a piece, not part of a larger whole.

So, I ditched it and replaced the necessary functionality with a 4G access point. I still have tethering when I need it, but now it’s not draining my phone battery, I only have one device to handle – one that I like – and I still don’t need to give out my personal cell number: I simply give my customers my main Lync number and forward calls to my cell.

So it was interesting, but ultimately…iPhones aren’t for me.

Let go of Windows XP, Office 2003, and Exchange 2003

The day has come. It’s the end of an era, one that many people do not want to let go of. I can understand that.

I drove my last car, a 2000 Ford Focus, until it died in the summer of 2010. I loved that car, and we seriously considered replacing the engine (which would have been a considerable chunk of money we didn’t have) so we could keep it. In the end, though, we had to take a long, hard look at our finances and our family requirements, and we moved on to a new vehicle. It was the requirements portion that was the key. It was certainly cheaper to fix the immediate problem – the blown engine – and we had friends who could do it for us professionally but inexpensively.

However, our kids were getting older. The four-door mini-sedan model wasn’t roomy enough for us and all of our stuff if we wanted to take a longer road trip like we’d been talking about. If we wanted to get a new sofa, we had to ask a friend with a truck. It would be nice, we thought, to have some additional carrying capacity for friends, family, groceries, and the occasional find from Craigslist. We’d been limiting our activities to those that were compatible with our car. With the new vehicle, we found we had far greater options.

[image: On the road again]

Two years ago we took the entire family on a 2-week road trip across the United States, camping along the way. Last summer, we took our family down to Crater Lake, the California Redwoods, and the Oregon Coast. We’ve been to the Olympic Rain Forest. I’ve hauled Scouts and their gear home from Jamboree shakedowns. We’ve moved. We’ve hauled furniture. In short, we’ve found that our forced upgrade, although more expensive, also gave us far more opportunity in the long run.

I know many of you like Windows XP. For some crazy reason, I know there are still quite a few of you out there who love Office 2003 and refuse to let it go. I even still run across Exchange 2003 on a regular basis. I know that there is a certain mindset that says, “We paid for it, it’s not going to wear out, so we’re just going to keep using it.” Consider, if you will, the following points:

  • Software doesn’t wear out, per se, but it does age out. You have probably already seen this in action. It’s not limited to software – new cars get features the old cars don’t. However, when a part for an old car breaks down, it’s a relatively simple matter for a company to make replacement parts (either by reverse-engineering the original or licensing it from the original car-maker). In the software world, there is a significant amount of work involved in back-porting code from the new version and making it run several versions back. There’s programming time, there’s testing time, and there’s support time. Ten years of support is more than just about any other software company offers (try getting a paid Linux support company to give you 10-year support for one up-front price). Microsoft is not trying to scam more money out of you. They want you to move on and stay relatively current with the rest of the world.
  • You are a safety hazard for others. There has been plenty written about the dangers of running XP past the end of life. There are some really good guides on how to mitigate the dangers. But make no mistake – you’re only mitigating them. And in a networked office or home, you’re exposing other people to danger as well. Don’t be surprised if, in a couple of months, after one or two well-publicized large-scale malware outbreaks targeting these ancient editions, your business partners, ISP, and other vendors take strong steps to protect their networks by shutting down your access. When people won’t vaccinate and get sick, quarantine is a reasonable and natural response. If you don’t want to be the attack vector or the weakest link, get off your moral high ground and upgrade your systems.
  • This is why you can’t have nice things. Dude, you’re still running Windows XP. The best you have to look forward to is Internet Explorer 8, unless you download Firefox, Chrome, or some other browser. And even those guys are only going to put up with jumping through the hoops required to make XP work for so long. News flash: few software companies like supporting their applications on an operating system (or application platform) that itself is unsupported. You’re not going to find better anti-virus software for that ancient Exchange 2003 server. You’re going to be lucky to continue getting updates. And Office 2003 plug-ins and files? Over the next couple of years, you’re going to enjoy more and more cases of files that don’t work as planned with your old suite. Don’t even think about trying to install new software and applications on that old boat. You’ve picked your iceberg.

Look, I realize there are reasons why you’ve chosen to stay put. They make sense. They make financial sense. But Microsoft is not going to relent, and this choice is not going to go away, and it’s not going to get cheaper. Right now you still have a small window of time when you will have tools to help you get your data to a newer system. That opportunity is going away faster than you think. It will probably, if past experience serves, cost you more to upgrade at this time next year than it does now.

So do the right thing. Get moving. If you need help, you know where to find us. Don’t think about all the things the new stuff does that you don’t need; think about all the ways you could be making your life easier.

The enemy’s gate is down: lessons in #Lync

Sometimes what you need is a change in perspective.

I started my IT career as a technician: desktops and peripherals, printers, and the parts of networks not involving the actual building and deployment of servers. I quickly moved into the systems and network administration role. After 9/11 and a 16-month gap in my employment status, I met these guys and moved my career into a radically different trajectory – one that would take me to places I’d never dreamed of. From there, I moved into traditional consulting.

There is a different mindset between systems administration (operations) and consulting (architecture): the latter designs and builds the system, while the former keeps it running. Think of it like building a house. The contracting team are the experts on current building code, how to get a crew going and keep them busy, how to navigate the permit process, and all the other things you need when designing and building a house. The people who buy the house and live there, though, don’t need that same body of knowledge. They may be able to do basic repairs and maintenance, but for remodels they may need to get some expert help. However, they’re also the people who have to live day in and day out with the compromises the architect and builders made. Those particular design decisions may be played out over tens of houses, with neither the designer nor the builder aware that it’s ultimately a poor choice and that a different set of decisions would have been better.

I personally find it helpful to have feet in both worlds. One of the drawbacks of working at Trace3 was that I was moving steadily away from my roots in systems administration. With Cohesive Logic, I’m getting to step somewhat back into the systems role. What I’m remembering is that there is a certain mindset good systems administrators have: when faced with a problem, they will work to synthesize a solution, even if it means going off the beaten path. The shift from “working within the limitations” to “creatively working around the limitations” is a mental reorientation much like the one described in Ender’s Game: in a zero-G battle arena, the title character realizes that carrying his outside orientation into battle is a liability. By re-visualizing the enemy’s gate as being “down”, Ender changed the entire axis of the conflict in ways both subtle and profound – and turned his misfit team into an unstoppable army.

[image: The enemy’s gate is down]

Case in point: I wanted to get my OCS/Lync Tanjay devices working with our Lync Server 2013 deployment. This involved getting the firmware upgraded, which ended up being quite a challenge. In the end, I managed to do something pretty cool – get a Tanjay device running 1.0.x firmware to upgrade to 4.0.x in one jump against a native Lync Server 2013 deployment – something many Lync people said wasn’t possible.

Here’s how I did it.

All it took was a mental adjustment. Falling is effortless – so aim yourself to fall toward success.

Windows 2012 R2 and #MSExchange: not so fast

Updated 9/18/2014: As of this writing, Windows Server 2012 R2 domain controllers are supported against all supported Microsoft Exchange environments:

  • Exchange Server 2013 with CU3 or later (remember, CU5 and CU6 are the two versions currently in support; SP1 is effectively CU4)
  • Exchange Server 2010 with SP3 and RU5 or later
  • Exchange Server 2007 with SP3 and RU13 or later

Take particular note that Exchange Server 2010 with SP2 (any rollup) and earlier are NOT supported with Windows Server 2012 R2 domain controllers.

Also note that if you want to enable the Windows Server 2012 R2 domain and forest functional levels, you must have Exchange Server 2013 SP1 or later OR Exchange Server 2010 SP3 + RU5 or later. Exchange Server 2013 CU3 and Exchange Server 2007 (any level) are not supported at these levels.
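If you’re not sure where a given environment stands, a quick inventory before introducing any Windows Server 2012 R2 domain controllers looks something like this (a sketch; the last two lines need the Active Directory module):

# Check Exchange build levels and the current functional levels
Get-ExchangeServer | Select-Object Name, Edition, AdminDisplayVersion
Import-Module ActiveDirectory
(Get-ADForest).ForestMode
(Get-ADDomain).DomainMode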

 

In the couple of months since Windows Server 2012 R2 dropped, a few of my customers have asked about rolling out new domain controllers on this version – in part because they’re using it for other services and want to standardize their new build-outs as much as they can.

My answer right now? Not yet.

Whenever I get a compatibility question like this, the first place I go is the Exchange Server Supportability Matrix on TechNet. Now, don’t let the relatively old “last update” time dismay you; the support matrix is generally only updated when major updates to Exchange (a service pack or new version) come out. (In case you haven’t noticed, Update Rollups don’t change the base compatibility requirements.)

[image: Not that kind of matrix…]

If we look on the matrix under the Supported Active Directory Environments heading, we’ll see that as of right now Windows Server 2012 R2 isn’t even on the list! So what does this tell us? The same thing I tell my kids instead of the crappy old “No means No” chestnut: only Yes means Yes. Unless the particular combination you’re looking for is listed, then the answer is that it’s not supported at this time.

I’ve confirmed this by talking to a few folks at Microsoft – at this time, the Exchange requirements and pre-requisites have not changed. Are they expected to? No official word, but I suspect if there is a change we’ll see it when Exchange 2013 SP1 is released; that seems a likely time given they’ve already told us that’s when we can install Exchange 2013 on Windows 2012 R2.

In the meantime, if you have Exchange, hold off from putting Windows 2012 R2 domain controllers in place. Will they work? Probably, but you’re talking about untested schema updates and an untested set of domain controllers against a very heavy consumer of Active Directory. I can’t think of any compelling reasons to rush this one.

The iPhone Wars, Day 121

120 days later and I figured it was time for an update on the war.

First: I still hate this thing.

Somewhere along the way with one of the iOS updates, the battery life started going to crap, even when I’m barely using the device. When I use it as a personal hotspot, I can practically watch the battery meter race to zero.

I’ve nailed down what it is about the email client that I don’t like, and it’s the same thing I don’t like about many of the apps: the user interfaces are inconsistent and cramped. Navigating my way through a breadcrumb trail that is near (but not quite) at the top just feels clunky. This is where contrast with Windows Phone really, really hurts the iPhone in my experience; the Metro (I know, we’re not supposed to call it that anymore, but they can bite me) user interface principles are clean and clear. Trying to figure out simple tasks like how to get the iPhone to actually resync is more complex than necessary. Having the “new message” icon down at the bottom when the navigation is up top is stupid.

The one thing that impresses me consistently is that even though the screen is small, the on-screen keyboard is really good at figuring out which letter I am trying to hit. On my Windows Phone I mistype things all the time. This rarely happens on the iPhone. Even though the on-screen keys are much smaller, the iPhone typing precision is much higher. Microsoft, take note – I’m tired of what feels like pressing on one key only to have another key grab the focus.

Even the few custom apps I do use on this iPhone fail to impress. Thanks to a lack of consistent design language, learning one doesn’t help me with the rest, and I have discovered that iPhone developers are just as bad as Windows Phone developers when it comes to inexplicable gaps in functionality.

I guess no one knows how to write good mobile software yet.

The iPhone Wars, Day 1

Part of the fun of settling into a new job is the new tools. In this trade, that’s the laptop and the cell phone. Now, I already have a perfectly good laptop and cell phone, so I probably could have just gone on using those, but where so much of what I do is from home, I find it important to have a clear break between personal business and work. Having separate devices helps me define that line.

My current cell phone is a Nokia Lumia 1020 (Windows Phone 8), which I definitely enjoy. I haven’t had a good chance to take the camera for a full spin, but I’m looking forward to it. I’ve had a lot of PDAs and smart phones in my time: Palm Pilot, Handspring Visor, Windows Mobile, BlackBerry, Windows Phone 7, even an Android. The one I’ve never had, though, is an iPhone.

And it’s not that I hate Apple. My favorite past laptop was my MacBook Pro (Apple has ruined me for any other touchpad). Granted, I’m that weird bastard who loaded Vista SP1 into Boot Camp and hardly ever booted back into Mac OS X again, but ever since then I’ve usually had a spare Apple computer around the house, if only for Exchange interop testing. OS X is a good operating system, but it’s not my favorite, so my main device is always a Windows machine. My current favorite is my Surface Pro.

In all of that, though, I’ve never had an iOS device. Never an iPhone, never an iPad. Yesterday, that all changed.

I needed a business smart phone that runs a specific application, one that hasn’t yet been ported to Windows Phone. I’ve long been an advocate that “apps matter first; pick your OS and platform after you know what apps you need.” Here was my opportunity not to be a shining hypocrite! After discussion with Jeremy, I finally settled on an iPhone 5, as Android was going to be less suitable for reasons too boring to go into.

So now I have an iPhone, and I have just one question for you iPhone-lovers of the world: You really like this thing? Honest to goodness, no one is putting a gun to your head?

I can’t stand this bloody thing! First, it’s too damn small! I mean, yes, I like my smart phones somewhat large, but I have big hands and I have pockets. The iPhone 5 is a slim, flat little black carbon slab with no heft. I’ve taken to calling it the CSD – the Carbon Suppository of Death. Now, if it were just the form factor, I could get used to it, but there’s so much more that I can’t stand:

  • I didn’t realize how much I love the Windows Phone customizable menu until I wasn’t using it. I forget who once called the iPhone (and Android) menu “Program Manager Reborn” but it’s totally apt. Plus, all the chrome (even in iOS 7) just feels cluttered and junky now.
  • Speaking of cluttered, Apple sometimes takes the minimalist thing too far. One button is not enough. This, I think, Windows Phone nails perfectly. Android’s four buttons feel extraneous, but Apple’s “let there be one” approach feels like dogma that won’t bow to practicality.
  • The last time I used an iPod, it was still black & white. I can’t stand iTunes as a music manager, and I don’t like the device-side interface – so I won’t be putting any music on the CSD. No advantage there.
  • Likewise, you think I’m going to dink around with the camera on the CSD when I have the glorious Lumia camera to use? Get real, human.
  • The on-screen keyboard sucks. Part of this is because the device is so much smaller, but part of it is that Apple doesn’t seem to understand little touches that improve usability. On Windows and Android, when you touch the shift key, the case of the letters on the keys changes correspondingly; Apple is all, “LOL…NOPE!”
  • Even the Mail client irritates me, though I haven’t managed to put my finger on exactly why yet.

So is there anything I like about the device? Sure! I’m not a total curmudgeon:

  • Build quality looks impressive. If the CSD weren’t as flimsy as a communion wafer, I would be blown away by the feel of the device. It’s got good clean lines and understated elegance, like a suit from the expensive Savile Row tailors.
  • Power usage. The CSD goes through battery very slowly. Now part of that is because I’m not using it, but Apple has had time to optimize their game, and they do it very well indeed.
  • The simple little physical switch to put the CSD into silent mode. This is exactly the kind of physical control EVERY smart phone should have, just like every laptop should have a physical switch to disable the radios (not just a hotkey combination).

This is where I’m at, with a fistful of suck. Even an Android phone would be better than this. I’ve got no-one to blame but myself, and it’s not going to get any better. So look forward to more of these posts from time to time as I find yet another aspect of the CSD that drives me crazy.

“But Devin,” I hear some of you Apple-pandering do-gooders say, “You’re just not used to it yet. Give it time. You’ll grow to love it.”

CHALLENGE ACCEPTED.

A Keenly Stupid Way To Lock Yourself Out of Windows 8

Ready for this amazing, life-changing technique? Let’s go!

  1. Take a domain-joined Windows 8 computer.
  2. Logon as domain user 1.
  3. Notice that the computer name is a generic name and decide to rename it.
  4. Don’t reboot yet, because you have other tasks you want to do first.
  5. Switch users to domain user 2.
  6. Perform more tasks.
  7. Go to switch back to user 1. You can’t!
  8. Try to log back in as user 2. You can’t!

Good for hours of fun!
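If you do need to rename a domain-joined box, the way to stay out of this trap is to make the rename and the reboot one motion instead of leaving the machine in limbo (a minimal sketch from an elevated PowerShell prompt; add -DomainCredential if your account can’t update the computer object):

# Rename the computer and reboot immediately so the old/new name mismatch never lingers
Rename-Computer -NewName "NEWNAME01" -Restart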

Defending a Bad Decision

It’s already started.

A bit over 12 hours after MSL’s cowardly decision to announce the end of the MCM program (see my previous blog post), we’re already starting to see a reaction from Microsoft on the Labor Day holiday weekend.

SQL Server MVP Jen Stirrup created an impassioned “Save MCM” plea on the Microsoft Connect site this morning at 6:19. Now, 7.5 hours later, it already has almost 200 votes of support. More importantly, she’s already gotten a detailed response from Microsoft’s Tim Sneath:

Thank you for the passion and feedback. We’re reading your comments and take them seriously, and as the person ultimately responsible for the decision to retire the Masters program in its current form, I wanted to provide a little additional context.

Firstly, you should know that while I’ve been accused of many things in my career, I’m not a “bean counter”. I come from the community myself; I co-authored a book on SQL Server development, I have been certified myself for nearly twenty years, I’ve architected and implemented several large Microsoft technology deployments, my major was in computer science. I’m a developer first, a manager second.

Deciding to retire exams for the Masters program was a painful decision – one we did not make lightly or without many months of deliberation. You are the vanguard of the community. You have the most advanced skills and have demonstrated it through a grueling and intensive program. The certification is a clear marker of experience, knowledge and practical skills. In short, having the Masters credential is a huge accomplishment and nobody can take that away from the community. And of course, we’re not removing the credential itself, even though it’s true that we’re closing the program to new entrants at this time.

The truth is, for as successful as the program is for those who are in it, it reaches only a tiny proportion of the overall community. Only a few hundred people have attained the certification in the last few years, far fewer than we would have hoped. We wanted to create a certification that many would aspire to and that would be the ultimate peak of the Microsoft Certified program, but with only ~0.08% of all MCSE-certified individuals being in the program across all programs, it just hasn’t gained the traction we hoped for.

Sure, it loses us money (and not a small amount), but that’s not the point. We simply think we could do much more for the broader community at this level – that we could create something for many more to aspire to. We want it to be an elite community, certainly. But some of the non-technical barriers to entry run the risk of making it elitist for non-technical reasons. Having a program that costs candidates nearly $20,000 creates a non-technical barrier to entry. Having a program that is English-only and only offered in the USA creates a non-technical barrier to entry. Across all products, the Masters program certifies just a couple of hundred people each year, and yet the costs of running this program make it impossible to scale out any further. And many of the certifications currently offered are outdated – for example, SQL Server 2008 – yet we just can’t afford to fully update them.

That’s why we’re taking a pause from offering this program, and looking to see if there’s a better way to create a pinnacle, WITHOUT losing the technical rigor. We have some plans already, but it’s a little too early to share them at this stage. Over the next couple of months, we’d like to talk to many of you to help us evaluate our certifications and build something that will endure and be sustainable for many years to come.

We hate having to do this – causing upset amongst our most treasured community is far from ideal. But sometimes in order to build, you have to create space for new foundations. I personally have the highest respect for this community. I joined the learning team because I wanted to grow the impact and credibility of our certification programs. I know this decision hurts. Perhaps you think it is wrong-headed, but I wanted to at least explain some of the rationale. It comes from the desire to further invest in the IT Pro community, rather than the converse. It comes from the desire to align our programs with market demand, and to scale them in such a way that the market demand itself grows. It comes from the desire to be able to offer more benefits, not fewer. And over time I hope we’ll be able to demonstrate the positive sides of the changes we are going through as we plan a bright future for our certifications.

Thank you for listening… we appreciate you more than you know.

First, I want to thank Tim for taking the time to respond on a holiday Saturday. I have no reason to think ill of him or disbelieve him in any way. That said, it won’t keep me from respectfully calling bullshit. Not to the details of Tim’s response (such as they are), nor to the tone of his message, but rather to the worldview it comes from.

First, this is the way the decision should have been announced to begin with, not that ham-fisted, mealy-mouthed, thinly-disguised “sod off” piece of tripe that poor Shelby Grieve sent late last night. This announcement should have been released by the person who made the decision, taking full accountability for it, in the light of day, not pawned off to an underling who was allowed to sneak it out at midnight Friday on a three-day holiday weekend.

Second, despite Tim’s claims of being a developer first, manager second, I believe he’s failing to account for the seductive echo-chamber mentality that permeates management levels at Microsoft. The fatal weakness of making decisions by metrics is choosing the wrong metrics. When the Exchange program started the Ranger program (which later morphed into the first MCM certification), the goal wasn’t reach into the community. It was reducing CritSits on deployments. It was increasing the quality of deployments to reduce the amount of downtime suffered by customers. This is one of the reasons I have been vocal in the past that having MSL take on 100% responsibility for the MCM program was a mistake, because we slowly but surely began losing the close coupling with the product group. Is the MCM program a failure by those metrics? Does the number of MCMs per year matter more than the actual impact that MCMs are making on Microsoft’s customers? This is hard stuff. Maybe, just maybe, having no more than a tenth of a percent of all MCPs achieve this certification is the right thing if you’re focusing on getting the right people to earn it.

Third, MSL has shown us in the recent past that it knows how to transition from one set of certifications to another. When the MCITP and MCTS certifications were retired, there was a beautiful, coordinated wave of information that came out showing exactly what the roadmap was, why things were changing, and what the new path would look like for people. We knew what to expect from the change. Shelby’s announcement gave us no hint of anything coming in the future. It was an axe, not a roadmap. It left no way for people who had just signed up (and paid money for the course fees, airplane tickets, etc.) to reach out and get answers to their questions. As far as we know, there may not be any refunds in the offing. I think it’s a bit early to be talking about lawyers, but several of my fellow MCMs don’t. All of this unpleasantness could have been avoided by making this announcement with even a mustard seed of compassion and forethought. Right now, we’re left with promises that something will come to replace MCM. Those promises sit right alongside the promises we were given in recent months about new exams, new testing centers, and all the other promises the MCM program has made. This one decision and badly wrought communication has destroyed credibility and trust.

Fourth, many of the concerns Tim mentioned have been brought up internally in the MCM program before. The MCMs I went through my rotation with had lots of wonderful suggestions on how to approach solutions to these problems. The MCMs in my community have continued to offer advice and feedback. Most of this feedback has gone nowhere. It seems that somebody between the trainers and face people we MCMs interact with and the folks at Tim’s level has been gumming up the communication. Ask any good intelligence analyst – sometimes you need to see the raw data instead of just the carefully processed work from the people below you in the food chain. Somewhere in that mass of ideas are good suggestions that probably could have been made to work to break down some of those non-technical barriers long before now, if only they’d gotten to the right level of management where someone had the power to do something about them. Again, in a metrics-driven environment, data that doesn’t light up the chosen metrics usually gets ignored or thrown out. There’s little profit in taking the risk of challenging assumptions. Combine that with a distinct “not invented here” syndrome, and it feels like MSL has had a consistent pattern of refusing to even try to solve problems. Other tech companies have Master-level exams that don’t suffer too badly from brain dumps and other cheating measures. Why can’t Microsoft follow what they are doing and improve incrementally from there? I believe it’s because it requires investing even more money and time into these solutions, something that won’t give back the appropriate blips on the metrics within a single financial year.

So while I appreciate the fact that Tim took the time to respond (and I will be emailing him to let him know this post exists), I don’t believe that the only option MSL had was to do things in this fashion. And right now, I believe that’s exactly the impression this response is going to generate among an already angry MCM community.

Ain’t Nobody [at Microsoft Learning] Got Time For That

If you track other people in the Microsoft Certified Master blogosphere you’ve probably already heard about the shot to the face the MCM/MCSM/MCA/MCSA program (which I will henceforth refer to just as MCM for simplicity) took last night: a late Friday night email announcing the cancellation of these programs.

"Wait for it...wait for it..."

“Wait for it…wait for it…”

I was helping a friend move at the time, so I checked the email on my phone, pondered it just long enough to get pissed off, and then put it away until I had time and energy to deal with it today.

This morning, a lot of my fellow members of the Microsoft IT Pro community – Microsoft employees, MCM trainers, MCMs, and MCM candidates alike – are reacting publicly.

Others have already made all of the comments I could think to make: the seemingly deliberately bad timing, the total disconnect of this announcement with recent actions and announcements regarding MCM availability, the shock and anger, all of it.

The only unique insight I seem to have to share is that this does *not* seem to be something that the product groups are on board with — it seems to be coming directly from Microsoft Learning and the higher-ups in that chain. Unfortunately, those of us who resisted and distrusted the move of MCM from being run by the product groups in partnership with MSL to the new regime of MSL owning all the MCM marbles (which inevitably led to less and less interaction with the actual product groups, with the predictable results) now seem to be vindicated.

I wish I’d been wrong. But even this move was called out by older and wiser heads than mine, and I discounted them at the time. Boy, was I wrong about that.

I’m really starting to think that as Microsoft retools itself to try to become a services and devices company, we’re going to see even more of these kinds of measures (TechNet subs, MCM certs) that alienate the highly trained end of the IT Pro pool. After all, we’re the people who know how to design and implement on-premises solutions that you folks can run cheaper than Microsoft’s cloud offerings. Many of the competitors to Microsoft Consulting or to Microsoft hosted services had one or more MCMs on staff, and MCM training was a great window into how Office 365 was running their deployments. In essence, what had once been a valuable tool for helping sell Microsoft software licenses and reduce Microsoft support costs has now become, in the Cloud era, a way for competitors and customers to knowledgeably and authoritatively derail the Cloud business plans.

From that angle, these changes make a certain twisted sort of short-term sense — and with the focus on stock price and annual revenues, short-term sense is all corporate culture knows these days.

For what it’s worth, SQL Server MVP Jen Stirrup has started this Connect petition to try to save the MCM program. I wish her luck.

The Case for TechNet

By now, those of you who are my IT readers almost certainly know about Microsoft’s July 1st decision to retire the TechNet subscription offerings for IT professionals. In turn, Cody Skidmore put together a popular site to petition Microsoft to save TechNet subscriptions. Cody and many others have blogged about the reasons why they think TechNet subscriptions need to be revived, rather than sticking with Microsoft’s current plan to push Azure services, trial software, and expensive MSDN subscriptions as reasonable alternatives. I have put my name to this petition, as I feel that the loss of TechNet subscriptions is going to have a noticeable impact on the Microsoft ecosystem in the next few years.

I also hear a few voices loudly proclaiming that everything is fine. They generally make a few good points, but they all make a solitary, monumental mistake: they assume that everyone using TechNet subscriptions uses them for the same things they do, in the same ways they do. Frankly, this myopia is insulting and stupid, because none of these reasons even begins to address why I personally find the impending loss of TechNet subscriptions to be not only irritating, but actively threatening to my ability to perform at my peak as an IT professional.

As a practicing consultant, I have to be an instant expert on every aspect of my customers’ Exchange environments, including the things that go wrong. Even when I’m on-site (which is rare), I usually don’t have unlimited access to the system; security rules, permissions, change control processes, and the need for uptime are all ethical boundaries that prevent me from running amok and troubleshooting wildly to my heart’s content. I can’t go cowboy and make whatever changes I need to (however carefully researched they may be) until I have worked out that those changes will in fact fix the problem and what the rollback process is going to be if things don’t work as expected.

Like many IT pros, I don’t have a ton of money to throw around at home. Because I have been working from home for most of the last few years, I have not even had access to my employer’s labs for hardware or software. I’ve been able to get around this with TechNet and virtualization. The value that TechNet provides, at a reasonable price point, is full access to current and past versions of Microsoft software, updates, and patches, so I can replicate the customer’s environment in its essence, reduce the problem to the minimum steps for reproduction, and explore fixes or call in Microsoft Support for those times when it’s an actual bug with a workaround I can’t find. Demo versions of current software don’t help when I’m debugging interactions with legacy software, much of which rapidly becomes unavailable or at least extremely hard to find.

Microsoft needs to sit up and take notice; people like me save them money. Per-incident support pricing is not heinous, and it only takes a handful of hours going back and forth with the support personnel before it’s paid for itself from the customer’s point of view (I have no visibility into the economics on Microsoft’s side, but I suspect it is subsidized via software and license pricing overall). The thing is, though, Microsoft is a metric-driven company. If consultants and systems administrators no longer have a cost-effective source for replicating and simplifying problems, the obvious consequence I see is that Microsoft will see a rise in support cases, as well as a rise in the average time to resolve support cases, with the corresponding decrease in customer satisfaction.

Seriously, Microsoft – help us help you. Bring back TechNet subscriptions. They have made your software ecosystem one of the richest and healthiest of any commercial software company. Losing them won’t stem piracy of your products and won’t keep companies from using your software, but it will threaten to continue the narrative that Microsoft doesn’t care about its customers. And today more than ever, there are enough viable alternatives that you cannot take customer loyalty for granted.

Taking TechNet subscriptions away is a clear statement that Microsoft doesn’t trust its customers; customers will respond in kind. As the inevitable backlash to cloud services spreads in the wake of the NSA revelations, Microsoft will need all of the trust it can get. This is penny-wise, pound-foolish maneuvering at precisely the wrong time.

Finding Differences in Exchange objects (#DoExTip)

Many times, when I’m troubleshooting Exchange issues, I need to compare objects (such as user accounts in Active Directory, or mailboxes) to figure out why there is a difference in behavior. Often the difference is tiny and hard to spot. It may not even be visible through the GUI.

To do this, I first dump the objects to separate text files. How I do this depends on the type of object I need to compare. If I can output the object using Exchange Management Shell, I pipe it through Format-List and dump it to text there:

Get-Mailbox -Identity Devin | fl > Mailbox1.txt

If it’s a raw Active Directory object I need, I use the built-in Windows LDP tool and copy and paste the text dump to separate files in a text editor.

Once the objects are in text file format, I use a text comparison tool, such as the built-in comparison tool in my preferred text editor (UltraEdit) or the standalone tool WinDiff. The key here is to quickly highlight the differences. Many of those differences aren’t important (metadata such as last time updated, etc.), but I can spend my time quickly looking over the properties that are different, rather than brute-force comparing everything about the different objects.
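The whole workflow, end to end, looks something like this (the identities, paths, and WinDiff location are just examples):

# Dump both objects the same way, then compare the text files
Get-Mailbox -Identity WorkingUser | Format-List > C:\temp\WorkingUser.txt
Get-Mailbox -Identity BrokenUser | Format-List > C:\temp\BrokenUser.txt
windiff.exe C:\temp\WorkingUser.txt C:\temp\BrokenUser.txt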

I can hear many of you suggesting other ways of doing this:

  • Why are you using text outputs even in PowerShell? Why not export to XML or CSV?
    If I dump to text, PowerShell displays the values of multi-value properties and other property types that it doesn’t show if I export the object to XML or CSV. This is very annoying, as the missing values are typically the source of the key difference. Also, text files are easy for my customers to generate, bundle, and email to me without any worries that virus scanners or other security policies might intercept them.
  • Why do you run PowerShell cmdlets through Format-List?
    To make sure I have a single property per line of text file. This helps ensure that the text file runs through WinDiff properly.
  • Why do you run Active Directory dumps through LDP?
    Because LDP will dump practically any LDAP property and value as raw text as I access a given object in Active Directory. I can easily walk a customer through using LDP and pasting the results into Notepad while browsing to the objects graphically, as per ADSIedit. There are command line tools that will export in other formats such as LDIF, but those are typically overkill and harder to use while browsing for what you need (you typically have to specify object DNs).
  • PowerShell has a Compare-Object cmdlet. Why don’t you use that for comparisons instead of WinDiff or text editors?
    First, it only works for PowerShell objects, and I want a consistent technique I can use for anything I can dump to text in a regular format. Second, Compare-Object changes its output depending on the object format you’re comparing, potentially making the comparison useless. Third, while Compare-Object is wildly powerful because it can hook into the full PowerShell toolset (sorting, filters, etc.) this complexity can eat up a lot of time fine-tuning your command when the whole point is to save time. Fourth, WinDiff output is easy to show customers. For all of these reasons, WinDiff is good enough.

Using Out-GridView (#DoExTip)

My second tip in this series is going to violate the ground rules I laid out for it, because they’re my rules and I want to. This tip isn’t a tool or script. It’s a pointer to an insanely awesome feature of Windows PowerShell that just happens to nicely solve many problems an Exchange administrator runs across on a day-to-day basis.

I only found out about Out-GridView two days ago, the day that Tony Redmond’s Windows IT Pro post about the loss of the Message Tracking tool hit the Internet. A Twitter conversation started up, and UK Exchange MCM Brian Reid quickly chimed in with a link to a post from his blog introducing us to using the Out-GridView control with the message tracking cmdlets in Exchange Management Shell.

This is a feature introduced in PowerShell 2.0, so Exchange 2007 admins won’t have it available. What it does is simple: take a collection of objects (such as message tracking results, mailboxes, public folders — the output of any Get-* cmdlet, really) and display it in a GUI gridview control. You can sort, filter, and otherwise manipulate the data in-place without having to export it to CSV and get it to a machine with Excel. Brian’s post walks you through the basics.
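Something along these lines gives you the idea (a sketch; the time window and result size are arbitrary):

# Pull recent message tracking results into a sortable, filterable grid
Get-MessageTrackingLog -ResultSize 5000 -Start (Get-Date).AddHours(-4) | Out-GridView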

In just two days, I’ve already started changing how I interact with EMS. There are a few things I’ve learned from Get-Help Out-GridView:

  • On PowerShell 2.0 systems, Out-GridView is the endpoint of the pipeline. However, if you’re running it on a system with PowerShell 3.0 installed (Windows Server 2012), Out-GridView can be used to interactively filter down a set of data and then pass it on in the pipeline to other commands. Think about being able to grab a set of mailboxes, fine-tune the selection, and pass them on to make modifications without having to get all the filtering syntax correct in PowerShell.
  • Out-GridView is part of the PowerShell ISE component, so it isn’t present if you don’t have ISE installed or are running on Server Core. Exchange can’t run on Server Core, but if you want to use this make sure the ISE feature is installed.
  • Out-GridView allows you to select and copy data from the gridview control. You can then paste it directly into Excel, a text editor, or some other program.

This is a seriously cool and useful tip. Thanks, Brian!

Exchange Environment Report script (#DoExTip)

My inaugural DoExTip is a script I have been rocking out to and enthusiastically recommending to customers for over a year: the fantastic Exchange Environment Report script by UK Exchange MVP Steve Goodman. Apparently Microsoft agrees, because they highlight it in the TechNet Gallery.

It’s a simple script: run it and you get a single-page HTML report that gives you a thumbnail overview of your servers and databases, whether standalone or DAG. It’s no substitute for monitoring, but as a regular status update posted to a web page or emailed to a group (easily done from within the script) it’s a great touch point for your organization. Run it as a scheduled task and you’ll always have the 50,000 foot view of your Exchange health.
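One way to handle the scheduling piece, for what it’s worth (a sketch; the task name, schedule, and script path are all assumptions, and the task needs to run on a machine with the Exchange management tools installed):

# Register a daily run of the report script via Task Scheduler
schtasks /create /tn "Exchange Environment Report" /sc DAILY /st 06:00 /ru SYSTEM /tr "powershell.exe -NoProfile -File C:\Scripts\ExchangeEnvironmentReport.ps1"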

I’ve used it for migrations in a variety of organizations, from Exchange 2003 (it must be run on Exchange 2007 or higher) on up. I now consider this script an essential part of my Exchange toolkit.

Introducing DoExTips

At my house, we try to live our life by a well-known saying attributed to French philosopher Voltaire: “The perfect is the enemy of the good.” This is a translation from the second line of his French poem La Bégueule, which itself is quoting a more ancient Italian proverb. It’s a common idea that perfection is a trap. You may be more used to modern restatements such as the 80/20 rule (the last 20% of the work takes 80% of the effort).

I’ve had an idea for several years to fill what I see is a gap in the Exchange community. I’ve been toying with this idea for a while, trying to figure out the perfect way to do it. Today, I had a Voltaire moment: forget perfect.

So, without further ado, welcome to Devin on Exchange Tips (or #DoExTips for short). These are intended to be small posts that occur frequently, highlighting free scripts and tools that members of the global Exchange community have written and made available. There’s a lot of good stuff out there; it doesn’t all come from Microsoft, and you don’t have to pay for it.

The tools and scripts I’ll highlight in DoExTips are not going to be finished or polished products. In many cases, they’ll take work to adapt to your environment. The goal is to quickly show you something I’ve found and used as a starting point or springboard, not to solve all your problems.

So, if you’ve got something you think should be highlighted as a DoExTip, let me know. (Don’t like the name? Blame Tom Clancy. I’ve been re-reading his Jack Ryan techno-thrillers and so military naming is on the brain.)

#MSExchange 2010 and .NET 4.0

Oh, Microsoft. By now, one might think you’d have learned not to push updates to systems without testing them thoroughly. One would be wrong. At least this one qualifies as a minor annoyance rather than outright breakage…

Windows Update offers .NET 4.0 to Windows 2008 R2 systems as an Important update (and has been doing so for a while). This is all fine and good – various versions of the .NET Framework can live side by side. The problem, however, comes when you accept this update on an Exchange 2010 server with the CAS role.

If you do this, you may notice that the /exchange, /exchweb, and /public virtual directories (legacy directories tied to the /owa virtual directory) suddenly aren’t redirecting to /owa like they’re supposed to. Now, people aren’t normally using these directories in their OWA URLs anymore, but if someone does hit one of these virtual directories, it spams your event logs with a gnarly error message.

This is occurring because when .NET 4.0 is installed and the ASP.NET 4.0 components are tied into IIS, the Default Application Pool is reconfigured to use ASP.NET 4.0 instead of ASP.NET 2.0 (the version used by the .NET 3.5 runtime on Windows 2008 R2). What exactly it is about this that breaks these legacy virtual directories, I have no idea, but break them it does.

The fix for this is relatively simple: uninstall .NET 4.0 and hide the update from the machine so it doesn’t come back. If you don’t want to do that, follow this process outlined in TechNet to reset the Default Application Pool back to .NET 2.0. Be sure to run IISRESET afterwards.
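If you’d rather make that change from a shell than click through IIS Manager, something like this does the same thing on Windows 2008 R2 (a sketch; the TechNet article remains the authoritative walkthrough):

# Point the Default Application Pool back at the .NET 2.0 runtime, then restart IIS
Import-Module WebAdministration
Set-ItemProperty "IIS:\AppPools\DefaultAppPool" -Name managedRuntimeVersion -Value "v2.0"
iisreset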

Attached To You: Exchange 2010 Storage Essays, part 3

[2100 PST 11/5/2012: Edited to fix some typos and missing words/sentences.]

So, um…I knew it was going to take me a while to write this third part of the Exchange 2010 storage saga…but over two years? Damn, guys. I don’t even know what to say, other than to get to it.

So, we have this lovely little three-axis storage model I’ve been talking about in parts 1 (JBOD vs. RAID) and 2 (SATA vs. SAS/FC). Part 3 addresses the third axis: SAN vs. DAS.

Exchange Storage DAS vs. SAN

What’s in a name?

It used to be that everyone agreed on the distinction between DAS, NAS, and SAN:

  • DAS typically meant dumb or entry-level storage arrays that connected to a single server (or at most two or three) via SCSI, SATA/SAS, or some other storage-specific cabling/protocol. DAS arrays had very little on-board intelligence beyond the ability to run RAID configurations and present each RAID volume to the connected server as if it were a single disk.
  • NAS was file-level storage presented over a network connection to servers. The two common protocols used were NFS (for Unix machines) and SMB/CIFS (for Windows machines). NAS solutions often include more functionality, including features such as direct interfaces with backup solutions, snapshots of the data volumes, replication of data to other units, and dynamic addition of storage.
  • SAN was high-end, expensive block-level storage presented over a separate network infrastructure such as FC or iSCSI over Ethernet. SAN systems offer even more features aimed at enterprise markets, including sophisticated disk partitioning and access mechanisms designed to achieve incredibly high levels of concurrency and performance.

As time passed and most vendors figured out that providing support for both file-level and block-level protocols made their systems more attractive by allowing them to be reconfigured and repurposed by their customers, the distinction between NAS and SAN began to blur. DAS, however, was definitely dumb storage. Heck, if you wanted to share it with multiple systems, you had to have multiple physical connections! (Anyone other than me remember those lovely days of using SCSI DAS arrays for poor man’s clustering by connecting two SCSI hosts – one with a non-default host ID – to the same SCSI chain?)

At any rate, it was all good. For Exchange 2003 and early Exchange 2007 deployments, storage vendors were happy because if you had more than a few hundred users, you almost certainly needed a NAS/SAN solution to consolidate the number of spindles required to meet your IOPS targets.

The heck you say!

In the middle of the Exchange 2007 era, Microsoft upset the applecart. It turns out that with the ongoing trend toward larger mailboxes, plus Exchange 2007 SP1, CCR, and SCR, many customers were able to do something pretty cool: decrease the mailbox/database density to the point where (with Exchange 2007’s reduced I/O profile) their databases no longer required a sophisticated storage solution to meet their IOPS targets. In general, disks for SAN/NAS units have to be of higher quality and speed than those for DAS arrays, so they typically had better performance and lower capacity than consumer-grade drives.

This trend only got more noticeable and deliberate in Exchange 2010, when Microsoft unified CCR and SCR into the DAG and moved replication to the application layer (as we discussed in Part 1). Microsoft specifically designed Exchange 2010 to be deployable on a direct-attached RAID-less 2TB SATA 7200 RPM drive to hold a database and log files, so they could scale hosted Exchange deployments up in an affordable fashion. Suddenly, Exchange no longer needed SAN/NAS units for most deployments – as long as you had sufficiently large mailboxes throughout your databases to reduce the IOPS/database ratio below the required amount.

Needless to say, storage vendors have taken this about as light-heartedly as a coronary.

How many of you have heard in the past couple of years the message that “SAN and DAS are the same thing, just different protocols”?

Taken literally, DAS and SAN are only differences in connectivity.

The previous quote is from EMC, but I’ve heard the same thing from NetApp and other SAN vendors. Ever notice how it’s only the SAN vendors who are saying this?

I call shenanigans.

If they were the same thing, storage vendors wouldn’t be spending so much money on whitepapers and marketing to try to convince Exchange admins (more accurately, their managers) that there was really no difference and that the TCO of a SAN just happens to be a better bet.

What SAN vendors now push are features like replication, thin provisioning, virtualization and DR integration, backup and recovery – not to mention the traditional benefits of storage consolidation and centralized management. Here’s the catch, though. From my own experience, their models only work IF and ONLY IF you continue to deploy Exchange 2010 the same way you deployed Exchange 2003 and Exchange 2007:

  • deploying small mailboxes that concentrate IOPS in the same mailbox database
  • grouping mailboxes based on criteria meant to maximize single instance storage (SIS)
  • planning Exchange deployments around existing SAN features and backup strategies
  • relying on third-party functionality for HA and DR
  • deploying Exchange 2010 DAGs as if they were a shared copy cluster

When it comes right down to it, both SAN and DAS deployments are technically (and financially) feasible solutions for Exchange deployments, as long as you know exactly what your requirements are and let your requirements drive your choice of technology. I’ve had too many customers who started with the technology and insisted that they had to use that specific solution. Inevitably, by designing around technological elements, you either have to compromise requirements or spend unnecessary energy, time, and money solving unexpected complications.

So if both technologies are viable solutions, what factors should you consider to help decide between DAS and SAN?

Storage Complexity

You’ve probably heard a lot of other Exchange architects and pros talk about complexity – especially if they’re also Certified Masters. There’s a reason for this – more complex systems, all else being equal, are more prone to system outages and support calls. So why do so many Exchange “pros” insist on putting complexity into the storage design for their Exchange systems when they don’t even know what that complexity is getting them? Yes, that’s right, Exchange has millennia of man-hours poured into optimizing and testing the storage system so that your critical data is safe under almost all conditions, and then you go and design storage systems that increase the odds the fsck-up fairy[1] will come dance with your data in the pale moonlight.

SANs add complexity. They add more system components and drivers, extra bits of configuration, and additional systems with their own operating system, firmware, and maintenance requirements. I’ll pick on NetApp for a moment because I’m most familiar with their systems, but the rest of the vendors have their own stories that hit most of the same high points:

  • I have to pick either iSCSI or FC and configure the appropriate HBA/NICs plus infrastructure, plus drivers and firmware. If I’m using FC I get expensive FC HBAs and switches to manage. If I go with iSCSI I get additional GB or 10GB Ethernet interfaces in my Exchange servers and the joy of managing yet another isolated set of network adapters and making sure Exchange doesn’t perform DAG replication over them.
  • I have to install the NetApp Storage Tools.
  • I have to install the appropriate MPIO driver.
  • I have to install the SnapDrive service, because if I don’t, the NetApp snapshot capability won’t interface with Windows VSS, and if I’m doing software VSS why the hell am I even using a SAN?
  • I *should* install SnapManager for Exchange (although I don’t have to) so that my hardware VSS backups happen and I can use it as an interface to the rest of the NetApp protection products and offerings.
  • I need to make sure my NetApp guy has the storage controllers installed and configured. Did I want redundancy on the NetApp controller? Upgrades get to be fun, and I have to coordinate all of that to make sure they don’t cause a system outage. I get to have lovely arguments with the NetApp storage guys about why they can’t just treat my LUNs the same way they treat the rest of them – yes, I need my own aggregates and volumes, and no, please don’t give me the really expensive 15K RPM SAS drives that hold a thimbleful, because the storage guys are going to pass out when they find out how many I need for all those LUNs and volumes (x2 because of the redundant DAG copies).[2]

Here’s the simple truth: SANs can be very reliable and stable. SANs can also be a single point of failure, because they are wicked expensive and SAN administrators and managers get put out with Exchange administrators who insist on daft restrictions like “give Exchange dedicated spindles” and “don’t put multiple copies of the same database on the same controller” and other party-pooping ways to make their imagined cost savings dwindle away to nothing. The SAN people have their own deployment best practices, just like Exchange people; those practices are designed to consolidate data for applications that don’t manage redundancy or availability on their own.

Every SAN I’ve ever worked with wants to treat all data the same way, so to make it reliable for Exchange you’re going to need to rock boats. This means more complexity (and money), and the SAN people don’t want complexity in their domain any more than you want it in yours. Unless you know exactly what benefits your solution will give you (and I’m not talking general marketing spew, I’m talking specific, realistic, quantified benefits), why in the world would you want to add complexity to your environment, especially if it’s going to start a rumble between the Exchange team and the SAN team that not even Jackie Chan and a hovercraft can fix?

Centralization and Silos

Over the past several years, IT pros and executives have heard a lot of talk about centralization. The argument for centralization is that instead of having “silos” or autonomous groups spread out, all doing the same types of things and repeating effort, you reorganize your operation so that all the storage stuff is handled by a single group, all the network stuff is handled by another group, and so on and so forth. This is another one of those principles and ideas that sounds great in theory, but can fall down in so many ways once you try to put it into practice.

The big flaw I’ve seen in most centralization efforts is that they end up creating artificial dependencies and decrease overall service availability. Exchange already has a number of dependencies that you can’t do anything about, such as Active Directory, networking, and other external systems. It is not wise to create even more dependencies when the Exchange staff doesn’t have the authority to deal with the problems those dependencies create but is still on the hook for them, because the new SLAs look just like the old SLAs from the pro-silo regime.

Look, I understand that you need to realign your strategic initiatives to fully realize your operational synergies, but you can’t go do it half-assed, especially when you’re messing with business critical utility systems like corporate email. Deciding that you’re going to arbitrarily rearrange operations patterns without making sure those patterns match your actual business and operational requirements is not a recipe for long-term success.

Again, centralization is not automatically incompatible with Exchange. Doing it correctly, though, requires communication, coordination, and cross-training. It requires careful attention to business requirements, technical limitations, and operational procedures – and making sure all of these elements align. You can’t have a realistic 1-hour SLA for Exchange services when one of the potential causes for failure itself has a 4-hour SLA (and yes, I’ve seen this; holding Exchange metrics hostage to a virtualization group that has incompatible and competing priorities and SLAs makes nobody happy). If Exchange is critical to your organization, pulling the Exchange dependencies out of the central pool and back to where your Exchange team can directly operate on and fix them may be a better answer for your organization’s needs.

The centralization/silo debate is really just capitalism vs. socialism; strict capitalism makes nobody happy except hardcore libertarians, and strict socialism pulls the entire system down to the least common denominator[3]. The real answer is a blend and compromise of both principles, each where they make sense. In your organization, DAS and an Exchange silo just may better fit your business needs.

Management and Monitoring

In most Exchange deployments I’ve seen, this is the one area that is consistently neglected, so it doesn’t surprise me that it doesn’t come up more often in this debate. Exchange 2010 does a lot to make sure the system stays up and operational, but it can’t manage everything. You need to have a good monitoring system in place, and you need automation or well-written, thorough processes to deal with common warnings and low-level errors.

One of the advantages of a SAN is that (at least at the storage level) much of this is taken care of for you. Every SAN system I’ve worked with not only has built-in monitoring of the state of the disks and the storage hardware, but also has extensive integration with external monitoring systems. It’s really nice when, at the same time you’re notified of a disk failure in the SAN, the SAN vendor is notified as well, so you know a spare will show up via FedEx the next day (possibly even brought by a technician who will replace it for you). This kind of service is not normally associated with DAS arrays.

However, even the SAN’s luscious – nay, sybaritic – level of notification luxury only protects you against SAN-level failures. SAN monitoring doesn’t know anything about Exchange 2010 database copy status or DAG cluster issues or Windows networking or RPC latency or CAS arrays or load balancer errors. Whether you deploy Exchange 2010 on a SAN or DAS offering, you need to have a monitoring solution that provides this kind of end-to-end view of your system. Low-end applications that rely on system-agnostic IP pings and protocol endpoint probes are better than nothing, but they aren’t a substitute for an application-aware system such as Microsoft System Center Operations Manager (or some equivalent) that understands all of the components in an Exchange DAG and queries them all for you.

You also need to think about your management software and processes. Many environments don’t like having changes made to centralized, critical dependency systems like a SAN without going through a well-defined (and relatively lengthy) change management process. In these environments, I have found it difficult to get emergency disk allocations pushed through in a timely fashion.

Why would we need emergency disk allocations in an Exchange 2010 system? Let me give you a few real examples:

  • Exchange-integrated applications[4] cause database-level corruption that drives server I/O and RPC latency up to levels that affect other users.
  • Disk-level firmware errors cause disk failure or drops in data transfer rates. Start doing wide-scale disk replacements on a SAN and you’re going to drive system utilization through the roof because of all the RAID group rebuilds going on. Be careful which disks you pull at one time, too – don’t want to pull two or three disks out of the same RAID group and have the entire thing drop offline.
  • Somebody else’s application starts having disk problems. You have to move the upper management’s mailboxes to new databases on unaffected disks until the problems are identified and resolved.
  • A routine maintenance operation on one SAN controller goes awry, taking out half of the database copies. There’s a SAN controller with some spare capacity, but databases need to be temporarily consolidated so there is enough room for two copies of all the databases during the repair on the original controller.

Needless to say, with DAS arrays, you don’t have to tailor your purchasing, management, and operations of Exchange storage around other applications. Yes, DAS arrays have failures too, but managing them can be simpler when the Exchange team is responsible for operations end-to-end.

Backup, Replication, and Resilience

The big question for you is this: what protection and resilience strategy do you want to follow? A lot of organizations are just going on auto-pilot and using backups for Exchange 2010 because that’s how they’ve always done it. But do you really, actually need them?

No, seriously, you need to think about this.

Why do you keep backups for Exchange? If you don’t have a compelling technical reason, find the people who are responsible for the business reason and ask them what they really care about – is it having tapes or a specific technology, or is it the ability to recover information within a specific time window? If it’s the latter, then you need to take a hard look at the Exchange 2010 native data protection regime:

  • At least three database copies
  • Increased deleted item/deleted mailbox recovery limits
  • Recoverable items and hold policies
  • Personal archives and message retention
  • Lagged database copies
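To give you a flavor of what turning a few of these knobs looks like in EMS, here’s a quick sketch (database names, server names, and retention values are examples only, not recommendations):

# Stretch the deleted item and deleted mailbox recovery windows on a database
Set-MailboxDatabase "DB01" -DeletedItemRetention 30.00:00:00 -MailboxRetention 60.00:00:00

# Turn on single item recovery for a mailbox you particularly care about
Set-Mailbox "VIP User" -SingleItemRecoveryEnabled $true

# Add a lagged copy that replays logs three days behind the active copy
Add-MailboxDatabaseCopy -Identity "DB01" -MailboxServer MBX04 -ReplayLagTime 3.00:00:00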

If this combination of functionality meets your needs, you need to take a serious look at a DAS solution. A SAN solution is going to be a lot more expensive for the storage options to begin with, and it’s going to be even more expensive for more than two copies. None of my customers deployed more than two copies on a SAN, because not only did they have to budget for the increased per-disk cost, but they would have to deploy additional controllers and shelves to add the appropriate capacity and redundancy. Otherwise, they’d have had multiple copies on the same hardware, which really defeats the purpose. At that point, DAS becomes rather attractive when you start to tally up the true costs of the native data protection solution.

So what do you do if the native data protection isn’t right for you and you need traditional backups? In my experience, one of the most compelling reasons for deploying Exchange on a SAN is the fantastic backup and recovery experience you get. In particular, NetApp’s snapshot-based architecture and SME backup application top my list. SME includes a specially licensed version of the Ontrack PowerControls utility to permit single mailbox recovery, all tied back into NetApp’s kick-ass snapshots. Plus, the backups happen more quickly because the VSS provider is the NetApp hardware, not a software driver in the NTFS file system stack, and you can run the ESE verification off of a separate SME server to offload CPU from the mailbox servers. Other SAN vendors offer integrated backup options of roughly equivalent capability.

The only way you’re going to get close to that via DAS is if you deploy Data Protection Manager. And honestly, if you’re still relying on tape (or cloud) backups, I really recommend using something like DPM to stage everything to disk first, so that backups from your production servers land on a fast disk system. Get those VSS locks dealt with as quickly as possible and offload the ESE checks to the DPM system. Then, do your tape backups off of the DPM server and your backup windows are no longer coupled to your user-facing Exchange servers. That doesn’t even mention DPM’s 15-minute log synchronization and use of deltas to minimize storage space in its own storage pool. DPM has a lot going for it.

A lot of SANs do offer synchronous and asynchronous replication options, often at the block level. These sound like good options, especially to enhance site resiliency, and for other applications they often can be. Don’t get suckered into using them for Exchange, though, unless they are certified to work against Exchange (and if it’s asynchronous replication, it won’t be). A DAS solution doesn’t offer this functionality, but that’s no loss in this category; whether you’re on SAN or DAS, you should be replicating via Exchange. Replicating with SAN block-level replication means the replication happens without Exchange being aware of it, which means that depending on when a failure happens, you could in the worst case end up with a corrupted database replica volume. Best case, your SAN-replicated database will not be in a consistent state, so you will have to run ESEUTIL to perform a consistency check and play log files forward before mounting that copy. If you’re going to do that, why are you running Exchange 2010?

Now if you need a synchronous replication option, Exchange 2010 includes an API to allow a third-party provider to replace the native continuous replication capability. As far as I know, only one SAN vendor (EMC) has taken advantage of this option, so your options are pretty clear in this scenario.

Conclusion

We’ve covered a lot of ground in this post, so if you’re looking for a quick take-away, the answer is this:

Determine what your real requirements are, and pick your technology accordingly. Whenever possible, don’t pick a technology or decide based on cost first without having a clear and detailed list of expected benefits in hand. You will typically find some requirement that makes your direction clear.

If anyone tells you that there’s a single right way to do it, they’re probably wrong. Having said that, though, what I’ve seen over the past couple of years is that the more people deviate from the Microsoft sweet spot, the more design compromises they end up making when perhaps they didn’t have to. Inertia and legacy have their place, but they need to be balanced with innovation and reinvention.

[1] Not a typo, I’m just showing off my Unix roots. The fsck utility (file system check) helps fix inconsistencies in the Unix file systems. Think chkdsk.

[2] Can you tell I’ve been in this rodeo once or twice? But I’m not bitter. And I do love NetApp because of SME, I just realize it’s not the right answer for everyone.

[3] Yes, I did in fact just go there. Blame it on the nearly two years of political crap we’ve suffered in the U.S. for this election season. November 6th can’t come soon enough.

[4] The application in this instance was an older version of Microsoft Dynamics CRM, very behind on its patches. There was a nasty calendar corruption bug that made my customer’s life hell for a while. The solution was to upgrade CRM to the right patch level, then move all of the affected mailboxes (about 40% of the users) to new databases. We didn’t need a lot of new databases, as we could move mailboxes in a swing fashion, but we still had to provision enough LUNs for the extra databases and copies to get the process done in a timely fashion. Each swing cycle took about two weeks because of change management, when we could have gotten it done much sooner.

Can You Fix This PF Problem?

Today I got to chat with a colleague who was trying to troubleshoot a weird Exchange public folder replication problem. The environment, which is in the middle of an Exchange 2007 to Exchange 2010 migration, uses public folders heavily – many hundreds of top-level public folders with a lot of sub-folders. Many of these public folders are mail-enabled.

After creating public folder replicas on the Exchange 2010 public folder databases and confirming that the public folders were starting to replicate, my colleague received notice that specific mail-enabled public folders weren’t getting incoming mail content. Lo and behold, the HT queues were full of thousands of public folder replication messages, all queued up.

After looking at the event logs and turning up the logging levels, my colleague noticed that they were seeing a lot of the 4.3.2 STOREDRV.Deliver; recipient thread limit exceeded error message mentioned in the Microsoft Exchange team blog post Store Driver Fault Isolation Improvements in Exchange 2010 SP1. Adding the RecipientThreadLimit key and setting it to a higher level helped temporarily, but soon the queues would begin backing up again.

At that point, my colleague called me for some suggestions. We talked over a bunch of things to check and troubleshooting trees to follow depending on what he found. Earlier tonight, I got an email confirming the root cause was identified. I was not surprised to find out that the cause turned out to be something relatively basic. Instead of just telling you what it was though, I want you to tell me which of the following options YOU think it is. I’ll follow up with the answer on Monday, 10/15.

Which of the following options is the root cause of the public folder replication issues?

Forced Obsolescence

ZDNet’s David Meyer noted earlier today that Google is about to shut down support for exporting the legacy Microsoft Office file formats (.doc, .xls, and .ppt) from Google Apps as of October 1, 2012. The Google blog notes that Google Apps users will still be able to import data from those formats. However, if they want Office compatibility, they need to export to the Office 2007 formats (.docx, .xlsx, and .pptx).

When Office 2007 was still in beta back in 2006, Microsoft released optional patches to Office 2003 to allow it to open and save the new file formats. Over time, these patches got included in Windows Update, so if you still have Office 2003 but have been updating, you probably have this capability today. Office 2003 can’t open these newer documents with 100% fidelity, but it’s good enough to get the job done. And if you’re on an earlier version of Office, Microsoft hasn’t forgotten you; Office 2000 and Office XP (2002) users can also download the Compatibility Pack.

What boggles me are some of the comments on the ZDNet article. I can’t understand why anyone would think this was a bad idea:

  • The legacy formats are bloated and ill-defined. As a result, files saved in those formats are more prone to corruption over the document lifecycle, not to mention when moving through various import/export filters. Heck, just opening them in different versions of Word can be enough to break the files.
  • The legacy formats are larger — much larger — than the new formats. Between the use of standard ZIP compression (the new format documents are actually an archive file containing a whole folder/file structure inside) and the smart use of XML rather than proprietary binary data, the new formats can pack a lot more data into the same space. Included picture files, for example, can be stored in compressible formats rather than as space-hogging uncompressed bitmaps.
  • The new formats are safer. Macro information is stored safely away from the actual data in the file, and Office (at least) can block the loading and saving of the macro-enabled variants of these files.

For many companies it would simply be cost-prohibitive to convert legacy files into the new formats…but it might not be a bad idea for critical files. Nowadays, I personally try to make sure I’m only writing new format Office files unless the people I am working with specifically ask for one of the legacy formats. I’m glad to see that Google is doing the right thing in helping make these legacy formats nothing more than a historical footnote — and I’d love to see Microsoft remove write support for them in Office 2013.

And @marypcbuk Nails IT

Amid all the bustle of MEC, I’ve not taken a bunch of time to read my normal email, blogs, etc. However, this article from ZDNet caught my eye:

Windows 8: Why IT admins don’t know best by Mary Branscombe

The gist of it is that IT departments spend a lot of time and effort trying to stop users from doing things with technology when they would often be better served enabling users. Users these days are not shy about embracing new technology, and Mary argues that users find creative ways around IT admins who are impediments:

The reality is that users are pushing technology in the workplace — and out of it. The Olympics has done more to advance flexible and remote working than a decade of IT pilot projects.

What got her going is the tale of an IT admin who found a way to disable, via Group Policy, the short tutorial that users are given on navigating Windows 8 the first time they log on.

I see this behavior all the time from admins and users – admins say “No” and users say “Bet me.” Users usually win this fight, too, because they are finding ways to get their work done. A good admin doesn’t say “No” – they say, “Let me help you find the best way to get that done.”

Mary finishes with this timely reminder:

See something new in Windows 8? If your first impulse is to look for a way to turn it off, be aware that you’re training your users to work around you.

What a refreshing dose of common sense.

TMG? Yeah, you knew me!

Microsoft today officially announced a piece of news that came as very little surprise to anyone who has been paying attention for the last year. On May 25th of 2011, Gartner broke the then-unsubstantiated claim that Microsoft had told them there would be no future release of Forefront Threat Management Gateway (TMG).

Microsoft finally confirmed that information. Although the TMG product will receive mainstream support until April 14, 2015 (a little bit more than 2.5 years from time of writing), it will no longer be available for sale come December 1, 2012.

Why do Exchange people care? Because TMG was the simple, no-brainer solution for environments that needed a reverse proxy in a DMZ network. Many organizations can’t allow incoming connections from the Internet to cross into an interior network. TMG provided protocol-level inspection and NAT out of the box, and could be easily configured for service-aware CAS load balancing and pre-authentication. As I said, no-brainer.

TMG had its limitations, though. No IPv6 support, poor NAT support, and an impressively stupid inability to proxy any non-HTTP protocols in a one-armed configuration. The “clustered” enterprise configuration was sometimes a pain in the ass to troubleshoot and work with when the central configuration database broke (and it seemed more fragile than it should be).

The big surprise for me is that TMG shares the chopping block with the on-server Forefront protection products for Exchange, SharePoint, and Lync/OCS. I personally have had more trouble than I care for with the Exchange product — it (as you might expect) eats up CPU like nobody’s business, which made care and feeding of Exchange servers harder than it needed to be. Still, to only offer online service — that’s a telling move.

Duke of URL

Just a quick note to let you know about a change or two I’ve made around the site.

  • Changed the primary URL of the site from www.thecabal.org to www.devinonearth.com. This is actually something I’ve been wanting to do for a long time, to reflect the site’s really awesome branding. Devin on Earth has long been its own entity that has no real connection to my original web site.
  • Added a secondary URL of www.devinganger.com to the site. This is a nod toward the future as I get fiction projects finished and published – author domains are a good thing to have, and I’m lucky mine is unique. Both www.devinganger.com and www.thecabal.org will keep working, so no links will ever go stale.

As a final aside, this is the 600th post on the site. W00t!

My Five Favorite Features of Exchange Server 2013 Preview

Exchange Server 2013 Preview was released a few weeks ago to give us a first look at what the future holds in store for Exchange. I’ve had a couple of weeks to dig into it in depth, so here are my quick impressions of the five changes I like most about Exchange 2013.

  1. Client rendering is moved from the Client Access role to the Mailbox role. (TechNet) Yes, this means some interesting architectural changes to SMTP, HTTP, and RPC, but I think it will help spread load out to where it should be – the servers that host active users’ mailboxes.
  2. The Client Access role is now a stateless proxy. (TechNet) This means we no longer need an expensive L7 load balancer with all sorts of fancy complicated session cookies in our HTTP/HTTPS sessions. It means a simple L4 load balancer is enough to scale the load for thousands of users based solely on source IP and port. No SSL offload required!
  3. The routing logic now recognizes DAG boundaries. (TechNet) This is pretty boss – members of a DAG that are spread across multiple sites will still act as if they were local when routing messages to each other. It’s almost like the concept of routing groups has come back in a very limited way.
  4. No more MAPI-RPC over TCP. (TechNet) Seriously. Outlook Anywhere (aka RPC over HTTPS) is where it’s at. As a result, Autodiscover for clients is mandatory, not just a really damn good idea. Firewall discussions just got MUCH easier. Believe it or not, this simplifies namespace and certificate planning…
  5. Public folders are now mailbox content. (TechNet) Instead of having a completely separate back-end mechanism for public folders, they’re now put in special mailboxes. Yes, this means they are no longer multi-master…but honestly, that causes more angst than it solves in most environments. And now SharePoint and other third-party apps can get to public folder content more easily…

There are a few things I’m not as wild about, but this is a preview and there’s no point kvetching about a moving target. We’ll see how things shake down.

I’m looking forward to getting a deeper dive at MEC in a couple of weeks, where I’ll be presenting a session on lessons learned in virtualizing Exchange 2010. Are you planning on attending?

Have you had a chance to play with Exchange 2013 yet, or at least read the preview documentation? What features are your favorite? What changes have you wondering about the implications? Send me an email or comment and I’ll see if I can’t answer you in a future blog post!

Can’t make a bootable USB stick for Windows 8? Join the club!

I was trying to make a bootable USB stick for Windows 8 this morning, using the Windows 7 USB/DVD Download Tool from Microsoft and the process outlined in this Redmond Pie article (the same basic steps can be found in a number of places). Even though the tool originated for Windows 7 and the steps I linked to are for the Windows 8 Consumer Preview, it all still works fine with Windows 8 RTM.

The steps are pretty simple:

  1. Download and install the tool.
  2. Download the ISO image of the version of Windows you want to install (Windows 7 and 8 for sure, I believe it works with Windows Server 2008 R2 and Windows Server 2012 RC as well).
  3. Plug in a USB stick (8GB or larger recommended) that is either blank or has no data on it you want to keep (it will be reformatted as part of the process).
  4. Run the tool and pick the ISO image.
  5. Select the USB drive (note that this tool can also burn the ISO to DVD).
  6. Wait for the tool to reformat the USB stick, copy the ISO contents to the stick, and make it bootable.

Everything was going fine for me until I got to step 6. The tool would format the USB stick, and then it would immediately fail before beginning the file copy:

DownloadToolError

Redmond, we have a problem…

At first I was wondering if it was related to UAC (it wasn’t) or a bad ISO image (it wasn’t). So I plugged the appropriate search terms into Bing and away we went; I finally found this thread on the TechNet forums, which led me to this comment in the thread (it wasn’t even marked as the solution, although it sure should have been):

We ran across this same "Error during backup., Usb; Unable to set active partition. Return code 87" with DataStick Pro 16 GB USB sticks. The Windows 7 DVD/USB Download Tool would format and then fail as soon as the copy started.

We ended up finding that the USB stick has a partition that starts at position 0 according to DiskPart. We used DiskPart to select the disk that was the USB, then ran Clean, then created the partition again. This time it was at position 1024. The USB stick was removed then reinserted and Windows prompted to format the USB stick, answer Yes.

The Windows 7 DVD/USB Download Tool was now able to copy files.

So, here’s the process I followed:

DiskPartUSBFix

Follow my simple step-by-step instructions. I make hacking FUN!

To do it yourself, launch a command window (either legacy CMD or PowerShell, doesn’t matter) with Administrator privileges and type diskpart to fire up the tool (the full command sequence is also summarized as a script after the list):

  1. LIST DISK gives a listing of all the drives attached to the system. At this point, no disk is selected.
  2. I have a lot of disks here, in part because my system includes an always-active 5-in-1 card reader (disks 1 through 5 that say no media). I also have an external USB hard drive (230GB? How cute!) at disk 6. Disk 7, however — that’s the USB stick. Note that the "free" column is *not* showing free space on the drive in terms of file system — it’s showing free space that isn’t allocated to a partition/volume.
  3. Diskpart, like a lot of Microsoft command-line tools, often requires you to select a specific item for focus, at which point other commands that you run will then run against the currently focused object. Use SELECT DISK to set the focus on your USB stick.
  4. Now that the USB stick has focus, the LIST PART command will run against the selected disk and show us the partitions on that disk.
  5. Uh-oh. This is a problem. With a zero-byte offset on that partition (USB sticks typically only have a single partition) that means there’s not enough room for that partition to be marked bootable and for the boot loader to be put on the disk. The volume starts at the first available byte. Windows needs a little bit of room — typically only one megabyte — for the initial boot loader code (which then jumps into the boot code in the bootable disk partition).
  6. So, let’s use CLEAN to nuke the partitions and restore this USB stick to a fully blank state.
  7. Using LIST PART again (still focused on the disk object) confirms that we’ve removed the offending partition. You can create a new partition in diskpart, but I happened to have the Disk Manager MMC console open already as part of my troubleshooting, so that’s what I used to create the new partition.
  8. Another LIST PART to confirm that everything is the way it should be…
  9. Yup! Notice we have that 1 MB offset in place now. There’s now enough room at the start of the USB stick for the boot loader code to be placed.
  10. Use EXIT to close up diskpart.
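For reference, here’s the whole fix as a script you could feed to diskpart /s (disk 7 is what it was on my system; verify yours with LIST DISK before you go any further, because CLEAN is unforgiving):

rem fix-usb.txt -- run with: diskpart /s fix-usb.txt
rem Disk 7 was MY USB stick; confirm the right disk number with LIST DISK first!
select disk 7
clean
create partition primary
rem (or create the partition in Disk Management afterwards, as I did above)

Pull the stick, plug it back in, let Windows format it when prompted, and you’re ready to try the Download Tool again.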

This time, when I followed the steps with the Download Tool, the bootable USB stick was created without further ado. Off to install Windows 8!

Beating Verisign certificate woes in Exchange

I’ve seen this problem in several customers over the last two years, and now I’m seeing signs of it in other places. I want to document what I found so that you can avoid the pain we had to go through.

The Problem: Verisign certificates cause Exchange publishing problems

So here’s the scenario: you’re deploying Exchange 2010 (or some other version, this is not a version-dependent issue with Exchange) and you’re using a Verisign certificate to publish your client access servers. You may be using a load balancer with SSL offload or pass-through, a reverse proxy like TMG 2010, some combination of the above, or you may even be publishing your CAS roles directly. However you publish Exchange, though, you’re running into a multitude of problems:

  • You can’t completely pass ExRCA’s validation checks. You get an error something like:  The certificate is not trusted on any version of Windows Phone device. Root = CN=VeriSign Class 3 Public Primary Certification Authority – G5, OU=”(c) 2006 VeriSign, Inc. – For authorized use only”, OU=VeriSign Trust Network, O=”VeriSign, Inc.”, C=US
  • You have random certificate validation errors across a multitude of clients, typically mobile clients such as smartphones and tablets. However, some desktop clients and browsers may show issues as well.
  • When you view the validation chain for your site certificate on multiple devices, they are not consistent.

These can be very hard problems to diagnose and fix; the first time I ran across it, I had to get additional high-level Trace3 engineers on the call along with the customer and a Microsoft support representative to help figure out what the problem was and how to fix it.

The Diagnosis: Cross-chained certificates with an invalid root

So what’s causing this difficult problem? It’s your basic case of a cross-chained certificate with an invalid root certificate. “Oh, sure,” I hear you saying now. “That clears it right up then.” The cause sounds esoteric, but it’s actually not hard to understand when you remember how certificates work: through a chain of validation. Your Exchange server certificate is just one link in an entire chain. Each link is represented by an X.509v3 digital certificate that is the footprint of the underlying server it represents.

At the base of this chain (aka the root) is the root certificate authority (CA) server. This digital certificate is unique because it’s self-signed – no other CA server has signed this server’s certificate. Now, you can use a root CA server to issue certificates to customers, but that’s actually a bad idea for a lot of reasons. So instead, you have one or more intermediate CA servers added into the chain, and if you have multiple layers, the CA servers in the outermost layer are the ones that process customer requests. So a typical commercially generated certificate has a validation chain of 3-4 layers: the root CA, one or two intermediate CAs, and your server certificate.

Remember how I said there were reasons to not use root CAs to generate customer certificates? You can probably read up on the security rationales behind this design, but some of the practical reasons include:

  • The ability to offer different classes of service, signed by separate root servers. Instead of having to maintain separate farms of intermediate servers, you can have one pool of intermediate servers that issue certificates for different tiers of service.
  • The ability to retire root and intermediate CA servers without invalidating all of the certificates issued through that root chain, if the intermediate CA servers cross-chain from multiple roots. That is, the first layer intermediate CA servers’ certificates are signed by multiple root CA servers, and the second layer intermediate CA servers’ certificates are signed by multiple intermediate CA servers from the first layer.

So, cross-chaining is a valid practice that helps provide redundancy for certificate authorities and helps protect your investment in certificates. Imagine what a pain it would be if one of your intermediate CAs got revoked and nuked all of your certificates. I’m not terribly fond of having to redeploy certificates for my whole infrastructure without warning.

However, sometimes cross-chained certificates can cause problems, especially when they interact with another feature of the X.509v3 specification: the Authority Information Access (AIA) certificate extension. Imagine a situation where a client (such as a web browser trying to connect to OWA), presented with an X.509v3 certificate for an Exchange server, cannot validate the certificate chain because it doesn’t have the upstream intermediate CA certificate.

If the Exchange server certificate has the AIA extension, the client has the information it needs to try to retrieve the missing intermediate CA certificate – either retrieving it from the HTTPS server, or by contacting a URI from the CA to download it directly. This only works for intermediate CA certificates; you can’t retrieve the root CA certificate this way. So, if you are missing the entire certificate chain, AIA won’t allow you to validate it, but as long as you have the signing root CA certificate, you can fill in any missing intermediate CA certificates this way.

Here’s the catch: some client devices can only request missing certificates from the HTTPS server. This doesn’t sound so bad…but what happens if the server’s certificate is cross-chained, and the certificate chain on the server goes to a root certificate that the device doesn’t have…even if it does have another valid root to another possible chain? What happens is certificate validation failure, on a certificate that tested as validated when you installed it on the Exchange server.
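Incidentally, if you want to watch that chain-building behavior from a test machine, certutil will show you every chain it can build and where it fetched each certificate. Export your server certificate to a .cer file first; the filename below is just an example:

certutil -verify -urlfetch owa-server-cert.cer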

I want to note here that I’ve only personally seen this problem with Verisign certificates, but it’s a potential problem for any certificate authority.

The Fix: Disable the invalid root

We know the problem and we know why it happens. Now it’s time to fix it by disabling the invalid root.

Step #1 is to find the root. Fire up the Certificates MMC snap-in, find your Exchange server certificate, and view the certificate chain properties. This is what the incorrect chain has looked like on the servers I’ve seen it on:

image

The invalid root CA server circled in red

That’s a not very helpful friendly name on that certificate, so let’s take a look at the detailed properties:

image

Meet “VeriSign Class 3 Public Primary Certification Authority – G5”
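If you’d rather hunt for that root from PowerShell than scroll through the MMC, here’s a quick sketch (the subject filter is based on the friendly name shown above):

# Look for the offending VeriSign G5 root in the local machine certificate stores
Get-ChildItem Cert:\LocalMachine\Root, Cert:\LocalMachine\AuthRoot, Cert:\LocalMachine\CA |
    Where-Object { $_.Subject -like "*VeriSign Class 3 Public Primary*G5*" } |
    Format-List PSParentPath, Subject, Thumbprint, NotAfter

The Third-Party Root Certification Authorities node you’ll use in the next step corresponds to the AuthRoot store.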

Step #2 is also performed in the Certificates MMC snap-in. Navigate to the Third-Party Root Certification Authorities node and find your certificate. Match the attributes above to the certificate below:

image

Root CA certificate hide and seek

Right-click the certificate and select Properties (don’t just open the certificate) to get the following dialog, where you will want to select the option to disable the certificate for all purposes:

image

C’mon…you know you want to

Go back to the server certificate and view the validation chain again. This time, you should see the sweet, sweet sign of victory (if not, close down the MMC and open it up again):

image

Working on the chain gang

It’s a relatively easy process…so where do you need to do it? Great question!

The process I outlined is obviously for Windows servers, so you would think that you can fix this just on the Exchange CAS roles in your Internet-facing sites. However, you may have additional work to do depending on how you’re publishing Exchange:

  • If you’re using a hardware load balancer with the SSL certificate loaded, you may not have the ability to disable the invalid root CA certificate on the load balancer. You may simply need to remove the invalid chain, re-export the correct chain from your Exchange server, and reinstall the valid root and intermediate CA certificates.
  • If you’re publishing through ISA/TMG, perform the same process on the ISA/TMG servers. You may also want to re-export the correct chain from your Exchange server onto your reverse proxy servers to ensure they have all the intermediate CA certificates loaded locally.

The general rule is that the outermost server device needs to have the valid, complete certificate chain loaded locally to ensure AIA does its job for the various client devices.
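If you do need to re-export the certificate from Exchange so you can reinstall it (and its chain) on a load balancer or reverse proxy, the EMS version looks roughly like this; the thumbprint and file path are placeholders you’ll fill in yourself:

# Export the Exchange certificate as a password-protected PFX you can import elsewhere
$pfxPassword = Read-Host "PFX password" -AsSecureString
$export = Export-ExchangeCertificate -Thumbprint "<thumbprint>" -BinaryEncoded -Password $pfxPassword
[System.IO.File]::WriteAllBytes("C:\Temp\exchange-cert.pfx", $export.FileData)

As the bullets above say, make sure the valid root and intermediate CA certificates ride along to the target device as well.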

Let me know if this helps you out.

Exchange 2010 virtualization storage gotchas

There’s a lot of momentum for Exchange virtualization. At Trace3, we do a lot of work with VMware, so the majority of the customers I work with already have VMware deployed strategically into their production operation model. As a result, we see a lot of Exchange 2010 under VMware. With Exchange 2010 SP1 and lots of customer feedback, the Exchange product team has really stepped up to provide better support for virtual environments as well as more detailed guidance on planning for and deploying Exchange 2007 and 2010 in virtualization.

Last week, I was talking with a co-worker about Exchange’s design requirements in a virtual environment. I casually mentioned the “no file-level storage protocols” restriction for the underlying storage and suddenly, the conversation turned a bit more serious. Many people who deploy VMware create large data stores on their SAN and share them to the ESX cluster via the NFS protocol. There are a lot of advantages to doing it this way, and it’s a very flexible and relatively easy way to deploy VMs. However, it’s not supported for Exchange VMs.

The Heck You Say?

“But Devin,” I can hear some of you say, “what do you mean it’s not supported to run Exchange VMs on NFS-mounted data stores? I deploy all of my virtual machines using VMDKs on NFS-mounted data stores. I have my Exchange servers there. It all works.”

It probably does work. Whether or not it works, though, it’s not a supported configuration, and one thing Masters are trained to hate with a passion is letting people deploy Exchange in a way that gives them no safety net. It is an essential tool in your toolkit to have the benefit of Microsoft product support to walk you through the times when you get into a strange or deep problem.

Let’s take a look at Microsoft’s actual support statements. For Exchange 2010, Microsoft has the following to say in http://technet.microsoft.com/en-us/library/aa996719.aspx under virtualization (emphasis added):

The storage used by the Exchange guest machine for storage of Exchange data (for example, mailbox databases or Hub transport queues) can be virtual storage of a fixed size (for example, fixed virtual hard disks (VHDs) in a Hyper-V environment), SCSI pass-through storage, or Internet SCSI (iSCSI) storage. Pass-through storage is storage that’s configured at the host level and dedicated to one guest machine. All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2010 doesn’t support the use of network attached storage (NAS) volumes. Also, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported.

Exchange 2007 has pretty much the same restrictions as shown in the http://technet.microsoft.com/en-us/library/bb738146(EXCHG.80).aspx TechNet topic. What about Exchange 2003? Well, that’s trickier; Exchange 2003 was never officially supported under any virtualization environment other than Microsoft Virtual Server 2005 R2.

The gist of the message is this: it is not supported by Microsoft for Exchange virtual machines to use disk volumes that are on file-level storage such as NFS or CIFS/SMB, if those disk volumes hold Exchange data. I realize this is a huge statement, so let me unpack this a bit. I’m going to assume a VMware environment here, but these statements are equally true for Hyper-V or any other hypervisor supported under the Microsoft SVVP.

While the rest of the discussion will focus on VMware and NFS, all of the points made are equally valid for SMB/CIFS and other virtualization systems. (From a performance standpoint, I would not personally want to use SMB for backing virtual data stores; NFS, in my experience, is much better optimized for the kind of large-scale operations that virtualization clusters require. I know Microsoft is making great strides in improving the performance of SMB, but I don’t know if it’s there yet.)

It’s Just Microsoft, Right?

So is there any way to design around this? Could I, in theory, deploy Exchange this way and still get support from my virtualization vendor? A lot of people I talk to point to a whitepaper that VMware published in 2009 that showed the relative performance of Exchange 2007 over iSCSI, FC, and NFS. They use this paper as “proof” that Exchange over NFS is supported.

Not so much, at least not with VMware. The original restriction may come from the Exchange product group (other Microsoft workloads are supported in this configuration), but the other vendors certainly know the limitation and honor it in their guidance. Look at VMware’s Exchange 2010 best practices at http://www.vmware.com/files/pdf/Exchange_2010_on_VMware_-_Best_Practices_Guide.pdf on page 13:

It is important to note that there are several different shared-storage options available to ESX (iSCSI, Fibre Channel, NAS, etc.); however, Microsoft does not currently support NFS for the Mailbox Server role (clustered or standalone). For Mailbox servers that belong to a Database Availability Group, only Fibre Channel is currently supported; iSCSI can be used for standalone mailbox servers. To see the most recent list of compatibilities please consult the latest VMware Compatibility Guides.

According to this document, VMware is even slightly more restrictive! If you’re going to use RDMs (this section is talking about RDMs, so don’t take the iSCSI/FC statement as a limit on guest-level volume mounts), VMware is saying that you can’t use iSCSI RDMs, only FC RDMs.

Now, I believe – and there is good evidence to support me – that this guidance as written is actually slightly wrong:

  • The HT queue database is also an ESE database and is subject to the same limitations; this is pretty clear on a thorough read-through of the Exchange 2010 requirements in TechNet. Many people leave the HT queue database on the same volume they install Exchange to, which means that volume also cannot be presented via NFS. If you follow best practices, you move this queue database to a separate volume (which should be an RDM or guest-mounted iSCSI/FC LUN); see the sketch after this list.
  • NetApp, one of the big storage vendors that supports the NFS-mounted VMware data store configuration, only supports Exchange databases mounted via FC/iSCSI LUNs using SnapManager for Exchange (SME), as shown in NetApp TR-3845. Additionally, in the joint NetApp-VMware-Cisco performance whitepaper on virtualizing Microsoft workloads, the only configuration tested for Exchange 2010 is FC LUNs (TR-3785).
  • It is my understanding that the product group’s definition of Exchange files doesn’t just extend to ESE files and transaction logs, but to all of the Exchange binaries and associated files. I have not yet been able to find a published source to document this interpretation, but I am working on it.
  • I am not aware of any Microsoft-related restriction about iSCSI + DAG. This VMware Exchange 2010 best practices document (published in 2010) is the only source I’ve seen mention this restriction, and in fact, the latest VMware Microsoft clustering support matrix (published in June 2011) lists no such restriction. Microsoft’s guidelines seem to imply that block storage is block storage is block storage when it comes to “SCSI pass-through storage.” I have queries in to nail this one down, because I’ve been asking in various communities for well over a year with no clear resolution other than, “That’s the way VMware is doing it.”

Okay, So Now What?

When I’m designing layouts for customers who are used to deploying Windows VMs via NFS-mounted VMDKs, I have a couple of options. My preferred option, if they’re also using RDMs, is to just have them provision one more RDM for the system drive and avoid NFS entirely for Exchange servers. That way, if my customer does have to call Microsoft support, we don’t have to worry about the issue at all.

However, that’s not always possible. My customer may have strict VM provisioning processes in place, have limited non-NFS storage to provision, or have some other reason why they need to use NFS-based VMDKs. In this case, I have found the following base layout to work well:

Volume | Type | Notes
C: | VMDK or RDM | Can be on any type of supported data store. Should be sized to include a static page file of PhysicalRAM + 10 MB.
E: | RDM or guest iSCSI/FC | All Exchange binaries installed here. Move the IIS files here as well (scripts are available on the Internet to do this for you). Create an E:\Exchdata directory and use NTFS mount points to mount each of the data volumes the guest will mount.
Data volumes | RDM or guest iSCSI/FC | Any volume holding mailbox/PF database EDB or logs, or HT queue EDB or logs. Mount these separately, with NTFS mount points recommended. Format these NTFS volumes with a 64K block size, not the default.
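
As a concrete illustration of the data-volume row above, here is a rough sketch of the mount-and-format steps. The volume number, database name, and paths are all hypothetical (verify the volume number with diskpart’s "list volume" first), and this assumes the E: volume from the table already exists.

# Create the empty mount-point folder under E:\Exchdata for a hypothetical database volume.
New-Item -Path 'E:\Exchdata\DB01' -ItemType Directory | Out-Null

# Mount the data LUN to the folder instead of a drive letter (volume 3 is an assumption).
"select volume 3", "assign mount=E:\Exchdata\DB01" | Set-Content .\mount-db01.txt
diskpart /s .\mount-db01.txt

# Format it NTFS with 64K allocation units; format.com will prompt before proceeding.
format E:\Exchdata\DB01 /FS:NTFS /A:64K /V:DB01 /Q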

Note that we have several implicit best practices in use here:

  • Static page file, properly sized for a 64-bit operating system with a large amount of physical RAM; a sizing sketch follows this list. Doing this ensures that you have enough virtual memory for the Exchange memory profile AND that you can write a kernel memory crash dump to disk in the event of a blue screen. (If the page file is not sized properly, or is not on C:, the full dump cannot be written to disk.)
  • Exchange binaries not installed on the system drive. This makes restores much easier. Since Exchange uses IIS heavily, I recommend moving the IIS data files (the inetpub and children folders) off of the system drive and onto the Exchange volume. This helps reduce the rate of change on the system drive and offers other benefits such as making it easier to properly configure anti-virus exclusions.
  • The use of NTFS mount points (which mount the volume to a directory) instead of separate drive letters. For large DAGs, you can easily have a large number of volumes per MB role, making drive letters a limit on scalability. NTFS mount points work just like Unix mount points and work very well; they’ve been supported since Exchange 2003 and recommended since the late Exchange 2003 era for larger clusters. In Exchange 2007 and 2010 continuous replication environments (CCR, SCR, DAG), all copies must have the same pathnames.
  • Using NTFS 64K block allocations for any volumes that hold ESE databases. While not technically necessary for log partitions, doing so does not hurt performance.
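
As promised above, here is a minimal sizing sketch for the static page file, using WMI from PowerShell; the 16 GB of physical RAM is just an assumption for illustration, and a reboot is required afterward.

# Static page file on C: sized at PhysicalRAM + 10 MB (16 GB assumed here, so 16,394 MB).
$sizeMB = (16 * 1024) + 10
$cs = Get-WmiObject Win32_ComputerSystem
$cs.AutomaticManagedPagefile = $false   # turn off system-managed sizing
$cs.Put() | Out-Null
# Create or update the C: page file setting with equal initial and maximum sizes.
Set-WmiInstance -Class Win32_PageFileSetting -Arguments @{ Name = 'C:\pagefile.sys'; InitialSize = $sizeMB; MaximumSize = $sizeMB } | Out-Null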

So Why Is This Even A Problem?

This is the money question, isn’t it? Windows itself is supported under this configuration. Even SQL Server is. Why not Exchange?

At heart, it comes down to this: the Exchange ESE database engine is a very finely-tuned piece of software, honed for over 15 years. During that time, with only one exception (the Windows Storage Server 2003 Feature Pack 1, which allowed storage solutions running WSS 2003 + FP1 to host Exchange database files over NAS protocols), Exchange has never supported putting Exchange database files over file-level storage. I’m not enough of an expert on ESE to whip up a true detailed answer, but here is what I understand about it.

Unlike SQL Server, ESE is not a general purpose database engine. SQL is optimized to run relational databases of all types. The Exchange flavor of ESE is optimized for just one type of data: Exchange. As a result, ESE has far more intimate knowledge about the data than any SQL Server instance can. ESE provides a lot of performance boosts for I/O hungry Exchange databases and it can do so precisely because it can make certain assumptions. One of those assumptions is that it’s talking to block-level storage.

When a host process commits writes to storage, there’s a very real difference in the semantics of the write operation between block-level protocols and file-level protocols. Exchange, in particular, depends dramatically on precise control over block-level writes – which file protocols like NFS and SMB can mask. The cases under which this can cause data corruption for Exchange are admittedly corner cases, but they do exist and they can cause impressive damage.

Cleaning Up

What should we do about it if we have an Exchange deployment that is in violation of these support guidelines?

Ideally, we fix it. Microsoft’s support stance is very clear on this point, and in the unlikely event that data loss occurs in this configuration, Microsoft support is going to point at the virtualization/storage vendors and say, “Get them to fix it.” I am not personally aware of any cases of a configuration like this causing data loss or corruption, but I am not the Exchange Product Group – they get access to an amazing amount of data.

At the very least, you need to understand and document that you are in an unsupported configuration so that you can make appropriate plans to get back into support as you roll out new servers or upgrade to future versions of Exchange. This is where a good Exchange consultant performing an Exchange health check can help; we will document the issue in black and white and provide the outside validation you might need with your management to get things put right.

One request for the commenters: if all you’re going to do is say, “Well, we run this way and have no problems,” don’t bother. I know and stipulate that there are many environments out there running in violation of this support boundary that have not (yet) run into issues. I’ve never said it won’t work. There are a lot of things we can do, but that doesn’t mean we should do them. At the same time, at the end of the day, if you know the issues and potential risks, you have to make the design decision that’s right for your organization. Just make sure it’s an informed (and documented, and signed-off!) decision.

Devin’s Load Balancer for Exchange 2010

Overview

One of the biggest differences I’m seeing when deploying Exchange 2010 compared to previous versions is that for just about all of my customers, load balancing is becoming a critical part of the process. In Exchange 2003 FE/BE, load balancing was a luxury unheard of for all but the largest organizations with the deepest pockets. Only a handful of outfits offered load balancing products, and they were expensive. For Exchange 2007 and the dedicated CAS role, it started becoming more common.

For Exchange 2003 and 2007, you could get all the same benefits of load balancing (as far as Exchange was concerned) by deploying an ISA server or ISA server cluster using Windows Network Load Balancing (WNLB). ISA included the concept of a “web farm” so it would round-robin incoming HTTP connections to your available FE servers (and Exchange 2007 CAS servers). Generally, your internal clients would directly talk to their mailbox servers, so this worked well. Hardware load balancers were typically used as a replacement for publishing with an ISA reverse proxy (and more rarely to load balance the ISA array instead of WNLB). Load balancers could perform SSL offloading, pre-authentication, and many of the same tasks people were formerly using ISA for. Some small shops deployed WNLB for Exchange 2003 FEs and Exchange 2007 CAS roles.

In Exchange 2010, everything changes. Outlook RPC connections now go to the CAS servers in the site, not the MB server that hosts the active copy of the database. Mailbox databases now have an affiliation with either a specific CAS server or a site-specific RPC client access array, which you can see in the RpcClientAccessServer property returned by the Get-MailboxDatabase cmdlet. If you have two or more servers, I recommend you set up the RPC client access array as part of the initial deployment and get some sort of load balancer in place.
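
For example (the database and array names here are hypothetical), you can check and change the affiliation from EMS like this:

# See which CAS server or array each mailbox database is affiliated with.
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize

# Point a database at the site's RPC client access array instead of an individual CAS server.
Set-MailboxDatabase -Identity "MBX-DB01" -RpcClientAccessServer "outlook.contoso.com"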

Load Balancing Options

At Trace3, we’re an F5 reseller, and F5 is one of the few load balancer companies out there that has really made an effort to understand and optimize Exchange 2010 deployments. However, I’m not on the sales side; I have customers using a variety of load balancing solutions for their Exchange deployments. At the end of the day, we want the customer to do what’s right for them. For some customers, that’s an F5. Others require a different solution. In those cases, we have to get creative – sometimes they don’t have budget, sometimes the networking team has their own plans, and on some rare occasions, the plans we made going in turned out not to be a good fit after all and now we have to come up with something on the fly.

If you’re not in a position to use a high-end hardware load balancer like an F5 BIG-IP or a Cisco ACE solution, and can’t look at some of the lower-cost (and correspondingly lower-feature) solutions that are now on the market, you have only a few alternatives:

  • WNLB. To be honest, I have attempted to use this in several environments now and even when I spent time going over the pros and cons, it failed to meet expectations. If you’re virtualizing Exchange (like many of my customers) and are trying to avoid single points of failure, WNLB is so clearly not the way to go. I no longer recommend this to my customers.
  • DNS round robin. This method at least has the advantage of in theory driving traffic to all of the CAS instances. However, in practice it gets in the way of quickly resolving problems when they come up. It’s better than nothing, but not by much.
  • DAG cluster IP. Some clever people came up with this option for instances where you are deploying multi-role servers with MB+HT+CAS on all servers and configuring them in a DAG. DAG = cluster, these smart people think, and clusters have a cluster IP address. Why can’t we just use that as the IP address of the RPC client access array? Sure enough, this works, but it’s not tested or supported by Microsoft and it isn’t a perfect solution. It’s not load balancing at all; the server holding the cluster IP address gets all the CAS traffic. Server sizing is important!

The fact of the matter is, there are no great alternatives if you’re not going to use hardware load balancing. You’re going to have to compromise something.

Introducing Devin’s Load Balancer

For many of my customers, we end up looking something like this:

  • The CAS/HT roles are co-located on one set of servers, while MB (and the DAG) is on another. This rules out the DAG cluster IP option.
  • They don’t want users to complain excessively when something goes wrong with one of the CAS/HT servers. This rules out DNS round robin.
  • They don’t have the budget for a hardware solution yet, or one is already in the works but not ready because of schedule. They need a temporary, low-impact solution. This effectively rules out WNLB.

I’ve come up with a quick and dirty fix I call Devin’s Load Balancer or, more commonly, the DLB. It looks like this:

  1. Pick one CAS server that can handle all the traffic for the site. This is our target server.
  2. Pick an IP address for the RPC client access array for the site. Create the DNS A record for the RPC client access array FQDN, pointing to the IP address.
  3. Create the RPC client access array in EMS, setting the name, FQDN, and site (see the sketch after this list).
  4. On the main network interface of the target server, add the IP address. If this IP address is on the same subnet as the main IP address, there is no need to create a secondary interface! Just add it as a secondary IP address/subnet mask.
  5. Make sure the appropriate mailbox databases are associated with the RPC client access array.
  6. Optionally, point the internal HTTP load balance array DNS A record to this IP address as well (or publish this IP address using ISA).
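
Here is the sketch promised in step 3: a minimal EMS and command-line version of steps 2 through 5, with the array name, FQDN, site, DNS server, interface name, and IP address all invented for illustration.

# Step 2: create the A record for the array FQDN on your DNS server.
dnscmd DC01 /RecordAdd contoso.com outlook A 192.168.10.50

# Step 3: create the RPC client access array object for the site.
New-ClientAccessArray -Name "CASArray-HQ" -Fqdn "outlook.contoso.com" -Site "HQ"

# Step 4: on the target CAS server, add the array's IP as a secondary address on the existing NIC.
netsh interface ipv4 add address "Local Area Connection" 192.168.10.50 255.255.255.0

# Step 5: associate that site's mailbox databases with the array.
Get-MailboxDatabase -Server MBX01 | Set-MailboxDatabase -RpcClientAccessServer "outlook.contoso.com"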

You may have noticed that this sends all traffic to the target server; it doesn’t really load balance. DLB also stands for Doesn’t Load Balance!

This configuration, despite its flaws, gives me what I believe are several important benefits:

  • It’s extremely easy to switch over or fail over. If something happens to my target server, I simply add the RPC client access array IP address as a secondary IP address on my next CAS instance. There are no DNS cache entries waiting to expire. There are no switch configurations to modify. There are no DNS records I have to update. If this is a planned switchover, clients get disrupted but can immediately reconnect. I can make the update as soon as I get warning that something happened, and my clients can reconnect without any further action on their part.
  • It isolates what I do with the other CAS instances. Windows and Exchange no longer have any clue they’re in a load balanced pseudo-configuration. With WNLB, if I make any changes to the LB cluster (like add or remove a member), all connections to the cluster IP addresses are dropped!
  • It makes it very easy to upgrade to a true load balancing solution. I set the true solution up in parallel with an alternate, temporary IP address. I use local HOSTS file entries on my test machines while I’m getting everything tested and validated. And then I simply take the RPC client access array IP address off the target server and put it on the load balancer. Existing connections are dropped, but new ones immediately connect with no timeouts – and now we’re really load balancing.

Note that you do not need the CAS SSL certificate to contain the FQDN of the RPC client access array as a SAN entry. RPC doesn’t use SSL for encryption (it’s not based on HTTP).

Even in a deployment where the customer is putting all roles into a single-server configuration, if there’s any thought at all that they might want to expand to an HA configuration in the future, I am now in the habit of configuring this. The RPC client access array is configured and somewhat isolated from the CAS configuration, so future upgrades are easier and less disruptive.

Moving to Exchange Server 2010 Service Pack 1

Microsoft recently announced that Service Pack 1 (SP1) for Exchange Server 2010 had been released to web, prompting an immediate upgrade rush for all of us Exchange professionals. Most of us maintain at least one home/personal lab environment, the better to pre-break things before setting foot on a customer site. Before you go charging out to do this for production (especially if you’re one of my customers, or don’t want to run the risk of suddenly becoming one of my customers), take a few minutes to learn about some of the current issues with SP1.

Easy Installation and Upgrade Slipstreaming

One thing that I love about Exchange service packs is that from Exchange 2007 on, they’re full installations in their own right. Ready to deploy a brand new Exchange 2010 SP1 server? Just run setup from the SP1 binaries – no more fiddling around with the original binaries, then applying your service packs. Of course, the Update Rollups now take the place of that, but there’s a mechanism to slipstream them into the installer (and here is the Exchange 2007 version of this article).

Note: If you do make use of the slipstream capabilities, remember that Update Rollups are both version-dependent (tied to the particular RTM/SP release level) and are cumulative. SP1 UR4 is not the same thing as RTM UR4! However, RTM UR4 will include RTM UR3, RTM UR2, and RTM UR1…just as SP1 UR4 will contain SP1 UR3, SP1 UR2, and SP1 UR1.

The articles I linked to say not to slipstream the Update Rollups with a service pack, and I’ve heard some confusion about what this means. It’s simple: you can use the Updates folder mechanism to slipstream the Update Rollups when you are performing a clean install. You cannot use the slipstream mechanism when you are applying a service pack to an existing Exchange installation. In the latter situation, apply the service pack, then the latest Update Rollup.

It’s too early for any Update Rollups for Exchange 2010 SP1 to exist at the time of writing, but if there were (for the sake of illustration, let’s say that SP1 UR X just came out), consider these two scenarios:

  • You have an existing Exchange 2010 RTM UR4 environment and want to upgrade directly to SP1 UR X. You would do this in two steps on each machine: run the SP1 installer, then run the latest SP1 UR X installer.
  • You now want to add a new Exchange 2010 server into your environment and want it to be at the same patch level. You could perform the installation in a single step from the SP1 binaries by making sure the latest SP1 UR X installer was in the Updates folder (a sketch follows this list).
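
For the second scenario, the sketch mentioned above might look like this; the download path, the rollup file name, the installation source path, and the role list are all assumptions for illustration.

# Drop the latest SP1 rollup into the Updates folder of the SP1 installation source...
Copy-Item "C:\Downloads\Exchange2010-SP1-URx-x64.msp" "D:\Ex2010SP1\Updates\"

# ...then run a normal install from that source; setup picks up the rollup automatically.
# (The roles argument is quoted so PowerShell passes the comma-separated list through as-is.)
& 'D:\Ex2010SP1\Setup.com' /mode:Install '/roles:Mailbox,ClientAccess,HubTransport'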

If these scenarios seem overly complicated, just remember back to the Exchange 2003 days…and before.

Third Party Applications

This might surprise you, but in all of the current Exchange 2010 projects I’m working on, I’ve not even raised the question of upgrading to SP1 yet. Why would I not do that? Simple – all of these environments have dependencies on third-party software that is not yet certified for Exchange 2010 SP1. In some cases, the software has barely just been certified for Exchange 2010 RTM! If the customer brings it up, I always encourage them to start examining SP1 in the lab, but for most production environments, supportability is a key requirement.

Make sure you’re not going to break any applications you care about before you go applying service packs! Exchange service packs always make changes – some easy to see, some harder to spot. You may need to upgrade your third-party applications, or you may simply need to make configuration changes ahead of time – but if you blindly apply service packs, you’ll find these things out the hard way. If you have a critical issue or lack of functionality that the Exchange 2010 SP1 will address, get it tested in your lab and make sure things will work.

Key applications I encourage my customers to test are the ones that integrate directly with Exchange rather than simply handing it mail over SMTP.

Applications that use SMTP submission are typically pretty safe, and there are other applications that you might be okay living without if something does break. Figure out what you can live with, test them (or wait for certifications), and go from there.

Complications and Gotchas

Unfortunately, not every service pack goes smoothly. For Exchange 2010 SP1, one of the big gotchas that early adopters are giving strong feedback about is the number of hotfixes you must download and apply to Windows and the .NET Framework before applying SP1 (a variable number, depending on which base OS your Exchange 2010 server is running).

Having to install hotfixes wouldn’t be that bad if the installer told you, “Hey, click here and here and here to download and install the missing hotfixes.” Exchange has historically not done that (citing boundaries between Microsoft product groups) even though other Microsoft applications don’t seem to be quite as hobbled. However, this instance of (lack of) integration is particularly egregious because of two factors.

Factor #1: hotfix naming conventions. Back in the days of Windows 2000, you knew whether a hotfix was meant for your system, because whether you were running Workstation or Server, it was Windows 2000. Windows XP and Windows 2003 broke that naming link between desktop and server operating systems, often confusingly so once 64-bit versions of each were introduced (32-bit XP and 32-bit 2003 had their own patch versions, but 64-bit XP applied 64-bit 2003 hotfixes).

Then we got a few more twists to deal with. For example, did you know that Windows Vista and Windows Server 2008 are the same codebase under the hood? Or that Windows 7 and Windows Server 2008 R2, likewise, are BFFs? It’s true. Meanwhile, the logic behind the naming of Windows Server 2003 R2 and Windows Server 2008 R2 was very different; Windows Server 2003 R2 was basically Windows Server 2003 with a service pack and a few additional components, while Windows Server 2008 R2 has substantially different code under the hood than Windows Server 2008 with SP. (I would guess that Windows Server 2008 R2 got the R2 moniker to capitalize on Windows 2008’s success, while Windows 7 got a new name to differentiate itself from the perceived train wreck that Vista had become, but that’s speculation on my part.)

At any rate, figuring out which hotfixes you need – and which versions of those hotfixes – is less than easy. Just remember that you’re always downloading the 64-bit patch, and that Windows 2008=Vista while Windows 2008 R2=Windows 7 and you should be fine.

Factor #2: hotfix release channels. None of these hotfixes show up under Windows Update. There’s no easy installer or tool to run that gets them for you. In fact, at least two of the hotfixes must be obtained directly from Microsoft Customer Support Services. All of these hotfixes include scary legal boilerplate about not being fully regression tested and thereby not supported unless you were directly told to install them by CSS. This has caused quite a bit of angst out in the Exchange community, enough so that various people are collecting the hotfixes and making them available off their own websites in one easy package to download[1].

I know that these people mean well and are trying to save others from a frustrating experience, but in this case, the help offered is a bad idea. That same hotfix boilerplate means that everyone who downloads those hotfixes agrees not to redistribute them. There’s no exception for good intentions. If you think this is bogus, let me give you two things to think about:

  • You need to be able to verify that your hotfixes are legitimate and haven’t been tampered with. Do you really want to trust production mission-critical systems to hotfixes you scrounged from some random Exchange pro you only know through blog postings? Even if the pro is trustworthy, is their web site? Quite frankly, I trust Microsoft’s web security team to prevent, detect, and mitigate hotfix-affecting intrusions far more quickly and efficiently than some random Exchange professional’s web host. I’m not disparaging any of my colleagues out there, but let’s face it – we have a lot more things to stay focused on. Few of us (if any) have the time and resources the Microsoft security guys do.
  • Hotfixes in bundles grow stale. When you link to a KB article or Microsoft Download offering to get a hotfix, you’re getting the most recent version of that hotfix. Yes, hotfixes may be updated behind the scenes as issues are uncovered and testing results come in. In the case of the direct-from-CSS hotfixes, you can get them for free through a relatively simple process. As part of that process, Microsoft collects your contact info so they can alert you if any issues later come up with the hotfix that may affect you. Downloading a stale hotfix from a random bundle increases the chances of getting an old hotfix version that may cause issues in your environment, costing you a support incident. How many of these people are going to update their bundles as new hotfix versions become available? How quickly will they do it – and how will you know?

The Exchange product team has gotten an overwhelming amount of feedback on this issue, and they’ve responded on their blog. Not only do they give you a handy table rounding up links to get the hotfixes, they also collect a number of other potential gotchas and advice to learn from before beginning your SP1 deployment. Go check it out, then start deploying SP1 in your lab.

Good luck, and have fun! SP1 includes some killer new functionality, so take a look and enjoy!

[1] If you’re about to deploy a number of servers in a short period of time, of course you should cache these downloaded hotfixes for your team’s own use. Just make sure that you check back occasionally for updated versions of the hotfixes. The rule of thumb I’d use is about a week; if I’m hitting my own hotfix cache and it’s older than a week, it’s worth a couple of minutes to make sure it’s still current.

Manually creating a DAG FSW for Exchange 2010

I just had a comment from Chris on my Busting the Exchange Trusted Subsystem Myth post that boiled down to asking: what do you do when you have to create the FSW for an Exchange 2010 DAG manually?

In order for manual creation to be necessary, all of the following conditions have to be true:

  1. You have no other Exchange 2010 servers in the AD site. This implies that at least one of your DAG nodes is multi-role — remember that you need to have a CAS role and an HT role in the same site as your MB roles, preferably two or more of each for redundancy and load. If you have another Exchange 2010 server, then it’s already got the correct permissions — let Exchange manage the FSW automatically.
  2. If the site in question is part of a DAG that stretches sites, there are more DAG nodes in this site than in the second site. If you’re trying to place the FSW in the site with fewer members, you’re asking for trouble[1].
  3. You have no other Windows 2003 or 2008 servers in the site that you consider suitable for Exchange’s automatic FSW provisioning[2]. By this, I mean you’re not willing to add the Exchange Trusted Subsystem security group to the server’s local Administrators group so that Exchange can create, manage, and repair the FSW on its own. If your only other server in the site is a DC, I can understand not wanting to add the group to the Domain Admins group.

If that’s the case, and you’re dead set on doing it this way, you will have to manually create the FSW yourself. A FSW consists of two pieces: the directory, and the file share. The process for doing this is not documented anywhere on TechNet that I could find with a quick search, but happily, one Rune Bakkens blogs the following process:

To pre-create the FSW share you need the following:
– Create a folder, e.g. D:\FilesWitness\DAGNAME
– Give ownership of the folder to Exchange Trusted Subsystem
– Give the Exchange Trusted Subsystem Full Control (NTFS)
– Share the folder as DAGNAME.FQDN (if you try a different share name, it won’t work; this is somehow required)
– Give the DAGNAME$ computer account Full Control (Share)

When you’ve done this, you can run Set-DatabaseAvailabilityGroup -WitnessServer CLUSTERSERVER -WitnessDirectory D:\FilesWitness\DAGNAME

You’ll get the following warning message:

WARNING: Specified witness server Cluster.fqdn is not an Exchange server, or part of the Exchange Servers security group.
WARNING: Insufficient permission to access file shares on witness server Cluster.fqdn. Until this problem is corrected, the database availability group may be more vulnerable to failures. You can use the set-databaseavailabilitygroup cmdlet to try the operation again. Error: Access is denied

This is expected, since the cmdlet tries to create the folder and share but doesn’t have the permissions to do so.

When this is done, the FSW should be configured correctly. To verify this, the following files should be created:

– VerifyShareWriteAccess
– Witness

Just for the record, I have not tested this process yet. However, I’ve had to do some FSW troubleshooting lately, and this matches what I’ve seen for naming conventions and permissions, so I’m fairly confident this should get you most of the way there. Thank you, Rune!
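
Translated into commands, and equally untested, the pre-creation steps might look like the sketch below, run from PowerShell on the witness server; the domain, DAG name, witness server name, and path are all hypothetical. (The net share argument is single-quoted so PowerShell passes the $ and comma through untouched.)

# Create the witness directory, give Exchange Trusted Subsystem ownership and Full Control.
New-Item -Path 'D:\FilesWitness\DAG01' -ItemType Directory | Out-Null
icacls 'D:\FilesWitness\DAG01' /setowner 'CONTOSO\Exchange Trusted Subsystem'
icacls 'D:\FilesWitness\DAG01' /grant 'CONTOSO\Exchange Trusted Subsystem:(OI)(CI)F'

# Share it using the DAG FQDN as the share name, granting the DAG computer account Full Control.
net share 'DAG01.contoso.com=D:\FilesWitness\DAG01' '/GRANT:CONTOSO\DAG01$,FULL'

# Then, from EMS on an Exchange server, point the DAG at the witness.
Set-DatabaseAvailabilityGroup -Identity DAG01 -WitnessServer WITNESS01 -WitnessDirectory 'D:\FilesWitness\DAG01'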

Don’t worry, I haven’t forgotten the next installment of my Exchange 2010 storage series. It’s coming, honest!

[1] Consider the following two-site DAG scenarios:

  • If there’s an odd number of MB nodes, Exchange won’t use the FSW.
  • An even number (n) of nodes in each site. The FSW is necessary for there to even be a quorum (you have 2n+1 nodes so a simple majority is n+1). If you lose the FSW and one other node — no matter where that node is — you’ll lose quorum. If you lose the link between sites, you lose quorum no matter where the FSW is.
  • A number of nodes (n) in site A, with at least one fewer node (m) in site B. If n+m is odd, you have an odd number of nodes – our first case. Even if m is only 1 fewer than n, putting the FSW in site B is meaningless – if you lose site A, site B will never have quorum (in this case, m+1 = n, and n is only half of the total votes – one less than quorum).

I am confident in this case that if I’ve stuffed up the math here, someone will come along to correct me. I’m pretty sure I’m right, though, and now I’ll have to write up another post to show why. Yay for you!

[2] You do have at least one other Windows server in that site, though, right — like your DC? Exchange doesn’t like not having a DC in the local site — and that DC should also be a GC.

The Disk’s The Thing! Exchange 2010 Storage Essays, part 2

Greetings, readers! When I first posted From Whence Redundancy? (part 1 of this series of essays on Exchange 2010 storage) I’d intended to follow up with other posts a bit faster than I have been. So much for intentions; let us carry on.

In part 1, I began the process of talking about how I think the new Exchange 2010 storage options will play out in live Exchange deployments over the next several years. The first essay in this series discussed what I believe is the fundamental question at the heart of an Exchange 2010 storage design: at what level will you ensure the redundancy of your Exchange mailbox databases? The traditional approach has used RAID at the disk level, but Exchange 2010 DAGs allow you to deploy mailbox databases in JBOD configurations. While I firmly believe that’s the central question, answering it requires us to dig under the hood of storage.

With Exchange 2010, Microsoft specifically designed Exchange mailbox servers to be capable of using the lowest common denominator of server storage: a directly attached storage (DAS) array of 7200 RPM SATA disks in a Just a Box of Disks (JBOD) configuration (what I call DJS). Understanding why they’ve made this shift requires us to understand more about the disk drive technology. In this essay, part 2 of this series, let’s talk about disk technology and find out how Fibre Channel (FC), Serially Attached SCSI (SAS), and Serial Advanced Technology Attachment (SATA) disk drives are the same – and more importantly, what slight differences they have and what that means for your Exchange systems.

Exchange Storage SATA vs SAS

So here’s the first dirty little secret: for the most part, all disks are the same. Regardless of what type of bus they use, what form factor they are, what capacity they are, and what speed they rotate at, all modern disks use the same construction and principles:

  • They all have one or more thin rotating platters coated with magnetic media; the exact number varies by form factor and capacity. Platters look like mini CD-ROM disks, but unlike CDs, platters are typically double-sided. Platters have a rotational speed measured in revolutions per minute (RPMs).
  • Each side of a platter has an associated read-write head. The heads are mounted on a single actuator arm assembly that moves them in toward the hub of the platter or out toward the rim. The heads do not touch the platter, but float very close to the surface. It takes a measurable fraction of a second for a head to relocate from one track to another; this is called its seek time.
  • The circle described by the head’s position on the platter is called a track. In a multi-platter disk, the heads move in synchronization (there’s no independent tracking per platter or side). As a result, each head is on the same track at the same time, describing a cylinder.
  • Each drive unit has embedded electronics that implement the bus protocol, control the rotational speed of the platters, and translate I/O requests into the appropriate commands to the heads. Even though there are different flavors, they all perform the same basic functions.

If you would like a more in-depth primer on how disks work, I recommend starting with this article. I’ll wait for you.

Good? Great! So that’s how all drives are the same. It’s time to dig into the differences. They’re relatively small, but small differences have a way of piling up. Take a look at Table 1, which summarizes the differences between various FC, SATA, and SAS disks compared with legacy PATA 133 (commonly but mistakenly referred to as IDE) and SCSI Ultra 320 disks:

Table 1: Disk parameter differences by disk bus type

Type | Max wire bandwidth (Mbit/s) | Max data transfer (MB/s)
PATA 133 | 1,064 | 133.5
SCSI Ultra 320 | 2,560 | 320
SATA-I | 1,500 | 150
SATA-II | 3,000 | 300
SATA 6 Gb/s | 6,000 | 600
SAS 150 | 1,500 | 150
SAS 300 | 3,000 | 300
FC (copper) | 4,000 | 400
FC (optic) | 10,520 | 2,000

 

As of this writing, the most common drive types you’ll see for servers are SATA-II, SAS 300, and FC over copper. Note that while SCSI Ultra 320 drives in theory have a maximum data transfer higher than either SATA-II or SAS 300, in reality that bandwidth is shared among all the devices connected to the SCSI bus; both SATA and SAS have a one-to-one connection between disk and controller, removing contention. Also remember that SATA is only a half-duplex protocol, while SAS is a full-duplex protocol. SAS and FC disks use the full SCSI command set to allow better performance when multiple I/O requests are queued for the drive, whereas SATA uses the ATA command set. Both SAS and SATA implement tagged queuing, although they use two different standards (each of which has its pros and cons).

The second big difference is the average access time of the drive, which is the sum of multiple factors:

  • The average seek time of the heads. The actuator motors that move the heads from track to track are largely the same from drive to drive and thus the time contributed to the drive’s average seek time by just the head movements is roughly the same from drive to drive. What varies is the length of the head move; is it moving to a neighboring track, or is it moving across the entire surface? We can average out small track changes with large track changes to come up with idealized numbers.
  • The average latency of the platter. How fast the platters are spinning determines how quickly a given sector containing the data to be read (or where new data will be written) will move into position under the head once the head is on the proper track. This is a simple calculation based on the RPM of the platter: on average, a given sector will move into position in no more than half a rotation, so the average latency in milliseconds is 30,000 (half of the 60,000 ms in a minute of rotation) divided by the drive’s RPM.
  • The overhead caused by the various electronics and queuing mechanisms of the drive electronics, including any power saving measures such as reducing the spin rate of the drive platters. Although electricity is pretty fast and on-board electronics are relatively small circuits, there may be other factors (depending on the drive type) that may introduce delays into the process of fulfilling the I/O request received from the host server.

What has the biggest impact is how fast the platter is spinning, as shown in Table 2:

Table 2: Average latency caused by rotation speed

Platter RPM | Average latency (ms)
7,200 | 4.17
10,000 | 3
12,000 | 2.5
15,000 | 2

 

(As an exercise, do the same math on the disk speeds for the average laptop drives. This helps explain why laptop drives are so much slower than even low-end 7,200 RPM SATA desktop drives.)
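
If you want to check the math, or work through the laptop exercise, here is the formula as a quick one-liner; the 4,200 and 5,400 RPM entries are typical laptop spindle speeds.

# Average rotational latency in ms = 30,000 / RPM (half a rotation, on average).
4200, 5400, 7200, 10000, 15000 | ForEach-Object { '{0,6} RPM -> {1:N2} ms' -f $_, (30000 / $_) }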

Rather than painfully taking you through all of these tables and calculations step by step, I’m simply going to refer you to work that’s already been done. Once we know the various averages and performance metrics, we can figure out how many I/O operations per second (IOPS) a given drive can sustain on average, according to the type, RPMs, and nature of the I/O (sequential or random). Since Microsoft has already done that work for us as part of the Exchange 2010 Mailbox Role Calculator (version 6.3 as of this writing), I’m simply going to use the values there. Let’s take a look at how all this plays out in Table 3 by selecting some representative values.

Table 3: Drive IOPS by type and RPM

Size | Type | RPM | Average random IOPS
3.5” | SATA | 5,400 | 50
2.5” | SATA | 5,400 | 55
3.5” | SAS | 5,400 | 52.5
3.5” | SAS | 5,900 | 52.5
3.5” | SATA | 7,200 | 55
2.5” | SATA | 7,200 | 60
3.5” | SAS | 7,200 | 57.5
2.5” | SAS | 7,200 | 62.5
3.5” | FC/SCSI/SAS | 10,000 | 130
2.5” | SAS | 10,000 | 165
3.5” | FC/SCSI/SAS | 15,000 | 180
2.5” | SAS | 15,000 | 230

 

There are three things to note about Table 3.

  1. These numbers come from Microsoft’s Exchange 2010 Mailbox Role Calculator and are validated across vendors through extensive testing in an Exchange environment. While there may be minor variances between drive models and manufacturers, and these numbers may seem pessimistic compared to the calculated IOPS figures published for individual drives, they are good figures to use in the real world. Using calculated IOPS numbers can lead both to a range of figures, depending on the specific drive model and manufacturer, and to overestimating the IOPS the drive will actually provide to Exchange.
  2. For the most part, SAS and FC are indistinguishable from an IOPS point of view. Regardless of the differences between the electrical interfaces, the drive mechanisms and I/O behaviors are comparable.
  3. Sequential IOPS are not listed; they will be quite a bit higher than the random IOPS (that same 7,200 RPM SATA drive can provide 300+ IOPS for sequential operations). The reason is simple: although a lot of Exchange 2010 I/O has been converted from random to sequential, there’s still some random I/O going on, and that’s going to be the limiting factor.

The IOPS listed are per-drive IOPS. When you’re measuring your drive system, remember that the various RAID configurations have their own IOPS overhead factor that will consume a certain number of the raw spindle IOPS before Exchange ever sees them.
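
As a rough illustration of that overhead (my own back-of-the-envelope numbers, not figures from the calculator), here is what a hypothetical 12-disk group of 7,200 RPM SATA drives supports at a 60/40 read/write mix, using the commonly cited write penalties:

# Host-visible random IOPS = (disks x per-disk IOPS) / (read% + write% x write penalty).
$disks = 12; $perDisk = 55; $read = 0.6; $write = 0.4
$penalties = @{ 'JBOD' = 1; 'RAID-10' = 2; 'RAID-5' = 4 }
foreach ($p in $penalties.GetEnumerator()) {
    '{0,-8} ~{1:N0} host IOPS' -f $p.Key, (($disks * $perDisk) / ($read + $write * $p.Value))
}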

There are of course some other factors that we need to consider, such as form factor and storage capacity. We can address these according to some generalizations:

  • Since SAS and FC tend to have the same performance characteristics, the storage enclosure tends to differentiate between which technology is used. SAS enclosures can often be used for SATA drives as well, giving more flexibility to the operator. SAN vendors are increasingly offering SAS/SATA disk shelves for their systems because paying the FC toll can be a deal-breaker for new storage systems.
  • SATA disks tend to have a larger storage capacity than SAS or FC disks. There are reasons for this, but the easiest one to understand is that SATA, being traditionally a consumer technology, has a lower duty cycle and therefore less stringent quality control specifications that must be met.
  • SATA disks tend to be offered with lower RPMs than SAS and FC disks. Again, we can acknowledge that quality control plays a part here – the faster a platter spins, the more stringently the drive components need to meet their specifications for a longer period of time.
  • 2.5” drives tend to have lower capacity than their 3.5” counterparts. This makes sense – they have smaller platters (and may have fewer platters in the drive).
  • 2.5” drives tend to use less power and generate less heat than equivalent 3.5” drives. This too makes sense – the smaller platters have less mass, requiring less energy to sustain rotation.
  • 2.5” drives tend to permit a higher drive density in a given storage chassis while using only fractionally more power. Again, this makes sense based on the previous two points; I can physically fit more drives into a given space, sometimes dramatically so.

Let’s look at an example. A Supermicro SC826 chassis holds 12 3.5” drives with a minimum of 800W power while the equivalent Supermicro SC216 chassis holds 24 2.5” drives with a minimum of 900W of power in the same 2Us of rack space. Doubling the number of drives makes up for the capacity difference between the 2.5” and 3.5” drives, provides twice as many spindles and allows a greater aggregate IOPS for the array, and only requires 12.5% more power.

The careful reader has noted that I’ve had very little to say about capacity in this essay, other than the observation above that SATA drives tend to have larger capacities, and that 3.5” drives tend to be larger than 2.5” drives. From what I’ve seen in the field, the majority of shops are just now looking at 2.5” drive shelves, so it’s safe to assume 3.5” is the norm. As a result, the 3.5” 7,200 RPM SATA drive represents the lowest common denominator for server storage, and that’s why the Exchange product team chose that drive as the performance bar for DJS configurations.

Exchange has been limited by performance (IOPS) requirements for most of its lifetime; by going after DJS, the product team has been able to take advantage of the fact that the capacity of these drives is the first to grow. This is why I think that Microsoft is betting that you’re going to want to simplify your deployment, aim for big, cheap, slow disks, and let Exchange DAGs do the work of replicating your data.

Now that we’ve talked about RAID vs. JBOD and SATA vs. SAS/FC, we’ll need to examine the final topic: SAN vs. DAS. Look for that discussion in Part 3, which will be forthcoming.

What Exchange 2010 on Windows Datacenter Means

Exchange Server has historically come in two flavors for many versions – Standard Edition and Enterprise Edition. The main practical difference the edition choice makes is the maximum number of mailbox databases supported, as shown in Table 1:

Version | Standard Edition | Enterprise Edition
Exchange 2003 | 1 (75GB max) | 20
Exchange 2007 | 5 | 50
Exchange 2010 | 5 | 100

Table 1: Maximum databases per Exchange editions

However, the Exchange Server edition is not directly tied to the Windows Server edition:

  • For Exchange 2003 failover cluster mailbox servers, Exchange 2007 SCC/CCR environments [1], and  Exchange 2010 DAG environments, you need Windows Server Enterprise Edition in order to get the MSCS cluster component framework.
  • For Exchange 2003 servers running purely as bridgeheads or front-end servers, or Exchange 2007/2010 HT, CAS, ET, and UM servers, you only need Windows Server Standard Edition.

I’ve seen some discussion around the fact that Exchange 2010 will install on Windows Server 2008 Datacenter Edition and Windows Server 2008 R2 Datacenter Edition, even though it’s not supported there and is not listed in the Operating System requirements section of the TechNet documentation.

HOWEVER…if we look at the Prerequisites for Exchange 2010 Server section of the Exchange Server 2010 Licensing site, we now see that Datacenter edition is, in fact, listed, as shown in Figure 1:


Figure 1: Exchange 2010 server license comparison

This is pretty cool, and the appropriate TechNet documentation is in the process of being updated to reflect this. What this means is that you can deploy Exchange 2010 on Windows Server Datacenter Edition; the differences between editions of Windows Server 2008 R2 are found here.[2] If you take a quick scan through the various feature comparison charts in the sidebar, you might wonder why anyone would want to install Exchange 2010 on Windows Server Datacenter Edition; it’s more costly and seems to provide the same benefits. However, take a look at the technical specifications comparison; this is, I believe, the meat of the matter:

  • Both editions give you a maximum of 2 TB of RAM – more than you can realistically throw at Exchange 2010.
  • Enterprise Edition gives you support for a maximum eight (8) x64 CPU sockets, while Datacenter Edition gives you sixty-four (64). With quad-core CPUs, this means a total of 32 cores under Enterprise vs. 256 cores under Datacenter.
  • With the appropriate hardware, you can hot-add memory in Enterprise Edition. However, you can’t perform a hot-replace, nor can you hot-add or hot-replace CPUs under Enterprise. With Datacenter, you can hot-add and hot-remove both memory and CPUs.

These seem to be compelling in many scenarios at first glance, unless you’re familiar with the recommended maximum configurations for Exchange 2010 server sizing. IIRC, the maximum CPUs that are recommended for most Exchange 2010 server configurations (including multirole servers) would be 24 cores – which fits into the 8 socket limitation of Enterprise Edition while using quad core CPUs.

With both Intel and AMD now offering hexa-core (6 core) CPUs, you can move up to 48 cores in Enterprise Edition. This is more than enough for any practical deployment of Exchange Server 2010 I can think of at this time, unless future service packs drastically change the CPU performance factors. Both Enterprise and Datacenter give you a ceiling of 2TB of RAM, which is far greater than required by even the most aggressively gigantic mailbox load I’d want to place on a single server. I’m having a difficult time seeing how anyone could realistically build out an Exchange 2010 server that goes beyond the performance and scalability limits of Enterprise Edition in any meaningful way.

In fact, I can think of only three reasons someone would want to run Exchange 2010 on Windows Server Datacenter Edition:

  • You have spare Datacenter Edition licenses, aren’t going to use them, and don’t want to buy more Enterprise Edition licenses. This must be a tough place to be in, but it can happen under certain scenarios.
  • You have very high server availability requirements and need the hot-add/hot-replace capabilities. This will get costly – the server hardware that supports this isn’t cheap – but if you need it, you need it.
  • You’re already running a big beefy box with Datacenter and virtualization[3]. The box has spare capacity, so you want to make use of it.

The first two make sense. The last one, though, I’d be somewhat leery of doing. Seriously, think about this – I’m spending money on monstrous hardware with awesome fault tolerance capabilities, I’ve forked over for an OS license[4] that gives me the right to unlimited virtual machines, and now I’m going to clutter up my disaster recovery operations by mixing Exchange and other applications (including virtualization) in the same host OS instance? That may be great for a lab environment, but I’d have a long conversation with any customer who wanted to do this under production. Seriously, just spin up a new VM, use Windows Server Enterprise Edition, and go to town. The loss of hardware configuration flexibility I get from going virtual is less than I gain by compartmentalizing my Exchange server to its own machine, along with the ability to move that virtual machine to any virtualization host I have.

So, there you have it: Exchange 2010 can now be run on Windows Server Datacenter Edition, which means yay! for options. But in the end, I don’t expect this to make a difference for any of the deployments I’m likely to be working on. This is a great move for the small handful of customers who really need it.

[1] MSCS is not required for Exchange 2007 SCR, although manual target activation can be easier in some scenarios if your target is configured as a single passive node cluster.

[2] From what I can tell, the same specs seem to be valid for Windows Server 2008, with the caveat that Windows Server 2008 R2 doesn’t offer a 32-bit version so the chart doesn’t give that information. However, since Exchange 2010 is x64 only, this is a moot point.

[3] This is often an attractive option, since you can host an unlimited number of Windows Server virtual machines without having to buy further Windows Server licenses for them.

[4] Remember that Datacenter is not licensed at a flat cost per server like Enterprise is; it’s licensed per socket. The beefier the machine you run it on, the more you pay.

Poor Google? Not.

Since yesterday, the Net has been abuzz because of Google’s blog posting about their discovery that they were being hacked by China. Almost every response I’ve seen has focused on the attempted hacking of the mailboxes of Chinese human rights activists.

That’s exactly where Google wants you to focus.

Let’s take a closer look at their blog post.

Paragraph 1:

In mid-December, we detected a highly sophisticated and targeted attack on our corporate infrastructure originating from China that resulted in the theft of intellectual property from Google.

Paragraph 2:

As part of our investigation we have discovered that at least twenty other large companies from a wide range of businesses–including the Internet, finance, technology, media and chemical sectors–have been similarly targeted.

Whoa. That’s some heavy-league stuff right there. Coordinated, targeted commercial espionage across a variety of vertical industries. Google first accuses China of stealing its intellectual property, then says that they weren’t the only ones. Mind you, industry experts – including the United States government – have been saying the same thing for years. Cries of “China hacked us!” happen relatively frequently in the IT security industry, enough so that it blends into the background noise after a while.

My question is why, exactly, Google thought this wouldn’t happen to them? They’re a big fat juicy target on many levels. Gmail with thousands upon thousands of juicy mailboxes? Check! Search engine code and data that allows sophisticated monitoring and manipulation of Internet queries? Check! Cloud-based office documents that just might contain some competitive value? Check!

My second question is, why, exactly, is Google trying to shift the focus of the story from the IP theft (which by their own press report was successful) and cloak their actions in the “oh, noes, China tried to grab dissidents’ email” moral veil they’re using?

Paragraph 3:

Second, we have evidence to suggest that a primary goal of the attackers was accessing the Gmail accounts of Chinese human rights activists. Based on our investigation to date we believe their attack did not achieve that objective. Only two Gmail accounts appear to have been accessed, and that activity was limited to account information (such as the date the account was created) and subject line, rather than the content of emails themselves.

Two accounts, people, and the attempt wasn’t even fully successful. And the moral outrage shimmering from the screen in Paragraph 4, when Google says that “dozens” of accounts were accessed by third parties not through any sort of security flaw in Google, but rather through what is probably malware, is enough to knock you over.

Really, Google? You’re just now tumbling to the fact that people’s GMail accounts are getting hacked through malware?

I don’t buy the moral outrage. I think the meat of the matter is back in paragraph 1. I believe that the rest of the outrage is a smokescreen to repaint Google into the moral high ground for their actions, when from the sidelines here it certainly looks like Google chose knowingly to play with fire and is now suddenly outraged that they, too, got burned.

Google, you have enough people willing to play along with your attempt to be the victim. I’m not one of them. You compromised human rights principles in 2006 and knowingly put your users into harm’s way. “Do no evil,” my ass.

From Whence Redundancy? Exchange 2010 Storage Essays, part 1

Updated 4/13 with improved reseed time data provided by item #4 in the Top 10 Exchange Storage Myths blog post from the Exchange team.

Over the next couple of months, I’d like to slowly sketch out some of the thoughts and impressions that I’ve been gathering about Exchange 2010 storage over the last year or so and combine them with the specific insights that I’m gaining at my new job. In this inaugural post, I want to tackle what I have come to view as the fundamental question that will drive the heart of your Exchange 2010 storage strategy: will you use a RAID configuration or will you use a JBOD configuration?

In the interests of full disclosure, the company I work for now is a strong NetApp reseller, so of course my work environment is conducive to designing Exchange in ways that make it easy to sell the strengths of NetApp kit. However, part of the reason I picked this job is precisely because I agree with how they address Exchange storage and how I think the Exchange storage paradigm is going to shake out in the next 3-5 years as more people start deploying Exchange 2010.

In Exchange 2010, Microsoft re-designed the Exchange storage system to target what we can now consider to be the lowest common denominator of server storage: a directly attached storage (DAS) array of 7200 RPM SATA disks in a Just a Box of Disks (JBOD) configuration. This DAS/JBOD/SATA (what I will now call DJS) configuration has been an unworkable configuration for Exchange for almost its entire lifetime:

  • The DAS piece certainly worked for the initial versions of Exchange; that’s what almost all storage was back then. Big centralized SANs weren’t part of the commodity IT server world, reserved instead for the mainframe world. Server administrators managed server storage. The question was what kind of bus you used to attach the array to the server. However, as Exchange moved to clustering, it required some sort of shared storage. While a shared SCSI bus was possible, it not only felt like a hack, but also didn’t scale well beyond two nodes.
  • SATA, of course, wasn’t around back in 1996; you had either IDE or SCSI. SCSI was the serious server administrator’s choice, providing better I/O performance for server applications, as well as faster bus speeds. SATA, and its big brother SAS, both are derived from the lessons that years of SCSI deployments have provided. Even for Exchange 2007, though, SATA’s poor random I/O performance made it unsuitable for Exchange storage. You had to use either SAS or FC drives.
  • RAID has been a requirement for Exchange deployments, historically, for two reasons: to combine enough drive spindles together for acceptable I/O performance (back when disks were smaller than mailbox databases), and to ensure basic data redundancy. Redundancy was especially important once Exchange began supporting shared storage clustering and required both aggregate I/O performance only achievable with expensive disks and interfaces as well as the reduced chance of a storage failure being a single point of failure.

If you look at the marketing material for Exchange 2010, you would certainly be forgiven for thinking that DJS is the only smart way to deploy Exchange 2010, with SAN, RAID, and non-SATA systems supported only for those companies caught in the mire of legacy deployments. However, this isn’t at all true. There are a growing number of Exchange experts (and not just those of us who either work for storage vendors or resell their products) who think that while DJS is certainly an interesting option, it’s not one that’s a good match for every customer.

In order to understand why DJS is truly possible in Exchange 2010, and more importantly begin to understand where DJS configurations are a good fit and what underlying conditions and assumptions you need to meet in order to get the most value from DJS, we need to separate these three dimensions and discuss them separately.

JBOD vs RAID

While I will go into more detail on all three dimensions at a later date, I want to focus on the JBOD vs. RAID question now. If you need some summaries, then check out fellow Exchange MVP (and NetApp consultant) John Fullbright’s post on the economics of DAS vs. SAN, as well as Microsoft’s Matt Gossage and his TechEd 2009 session on Exchange 2010 storage. Although there are good arguments for diving into drive technology or storage connection debates, I’ve come to believe that the central philosophical question you must answer in your Exchange 2010 design is at what level you will keep your data redundant. Until Exchange 2007, you had only one option: keeping your data redundant at the disk controller level. Using RAID technologies, you had two copies of your data[1]. Because you had a second copy of the data, shared storage clustering solutions could be used to provide availability for the mailbox service.

With Exchange 2007’s continuous replication features, you could add data redundancy at the application level and avoid the dependency on shared storage; CCR creates two copies, and SCR can be used to create one or more additional copies off-site. However, given the realities of Exchange storage, for all but the smallest deployments you had to use RAID to provide the required number of disk spindles for performance. With CCR, this really meant you were creating four copies; with SCR, you were creating an additional two copies for each target replica you created.
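To make that copy math concrete, here’s a trivial back-of-the-envelope sketch (my own arithmetic, assuming mirrored RAID-1/RAID-10 for the multiplier and a single off-site SCR target):

    # Rough physical-copy math for Exchange 2007 continuous replication on mirrored storage.
    # Assumptions (mine): mirrored RAID gives 2 physical copies per logical copy,
    # and there is exactly one off-site SCR target in this example.
    $raidMultiplier = 2      # physical copies per logical copy on RAID-1/RAID-10
    $ccrCopies      = 2      # active copy + CCR passive copy
    $scrTargets     = 1      # off-site SCR target replicas

    $physicalCopies = ($ccrCopies + $scrTargets) * $raidMultiplier
    "CCR plus $scrTargets SCR target(s) on mirrored RAID = $physicalCopies physical copies"

That’s the six-copy floor I refer to below; drop the RAID multiplier to 1 and the same arithmetic gives you the JBOD numbers.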

This is where Exchange 2010 throws a wrench into the works. By virtue of a re-architected storage engine, it’s possible under specific circumstances to design a mailbox database that will fit on a single drive while still providing acceptable performance. The reworked continuous replication options, now simplified into the DAG functionality, create additional copies at the application level. If you hit that sweet spot of the 1:1 database-to-disk ratio, then you have only a single copy of the data per replica and get an n-1 level of redundancy, where n is the number of replicas you have. This is clearly far more efficient for disk usage…or is it? The full answer is complex; the simple answer is, “In some cases.”

In order to get the 1:1 database-to-disk ratio, you have to follow several guidelines (there’s a quick EMS sanity check after the list):

  1. Have at least three replicas of the database in the DAG, regardless of which sites they are in. Doing so allows you to place both the EDB and transaction log files on the same physical drive, rather than separating them as you did in previous versions of Exchange.
  2. Ensure that you have at least two replicas per site. The reason is that, unlike Exchange 2007, Exchange 2010 can reseed a failed replica from another passive copy. This allows you to avoid reseeding over your WAN, which is something you do not want to do.
  3. Size your mailbox databases to include no more users than will fit in the drive’s performance envelope. Although Exchange 2010 converts many of the random I/O patterns to sequential, giving better performance, not all of them have been converted, so you still have to plan against the random I/O specs.
  4. Ensure that write transactions can get written successfully to disk. Use a battery-backed caching controller for your storage array to ensure the best possible performance from the disks. Use write caching for the physical disks, which means ensuring each server hosting a replica has a UPS.

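Here’s that EMS sanity check for the first two guidelines: a minimal sketch using standard Exchange 2010 cmdlets that counts each database’s copies (check the per-site distribution yourself from the server list):

    # For each mailbox database, count the copies and list the servers hosting them.
    # Get-MailboxDatabaseCopyStatus names each copy as "Database\Server".
    Get-MailboxDatabase | Sort-Object Name | ForEach-Object {
        $copies = @(Get-MailboxDatabaseCopyStatus -Identity $_.Name)
        New-Object PSObject -Property @{
            Database  = $_.Name
            CopyCount = $copies.Count
            Servers   = ($copies | ForEach-Object { ($_.Name -split '\\')[1] }) -join ', '
        }
    } | Format-Table Database, CopyCount, Servers -AutoSize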
At this point, you probably have disk capacity to spare, which is why Exchange 2010 allows the creation of archive mailboxes in the same mailbox database. All of the user’s data is kept at the same level of redundancy, and the archived data – which is less frequently accessed than the mainline data – is stored without a significant additional disk or I/O penalty. This all seems to indicate that JBOD is the way to go, yes? Two copies in the main site, two off-site DR copies, and I’m using cheaper storage with larger mailboxes and only four copies of my data instead of the minimum of six I’d have with CCR+SCR (or the equivalent DAG setup) on RAID configurations.

Not so fast. Microsoft’s claims around DJS configurations usually talk about the up-front capital expenditures. There’s more to a solid design than just the up-front storage price tag, and even if the DJS solution does provide savings in your situation, that is only the start. You also need to think about the lifetime of your storage and all the operational costs. For instance, what happens when one of those 1:1 drives fails?

Well, if you bought a really cheap DAS array, your first indication will be when Exchange starts throwing errors and the active copy moves to one of the other replicas. (You are monitoring your Exchange servers, right?) More expensive DAS arrays will usually tell you directly that a disk has failed. Either way, you have to replace the disk. Again, with a cheap white-box array, you’re on your own to buy replacement disks, while a good DAS vendor will provide replacements within the warranty/maintenance period. Once the disk is replaced, you have to re-establish the database replica. This brings us to the wonderful process known as database reseeding, which is not only a manual task but can also take a significant amount of time – especially if you made use of archive mailboxes and stuffed that DJS configuration full of data. Let’s take a closer look at what this means to you.
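Before we do, for reference: the reseed itself is driven from EMS. A minimal sketch (hypothetical database and server names) of reseeding a failed copy from another local passive copy, rather than hammering the active one, looks like this:

    # Reseed the failed copy of DB01 on MBX3, pulling data from the passive copy on MBX2
    # so the active copy doesn't take the I/O hit.
    Suspend-MailboxDatabaseCopy -Identity "DB01\MBX3" -SuspendComment "Reseed after disk replacement" -Confirm:$false

    Update-MailboxDatabaseCopy -Identity "DB01\MBX3" `
        -SourceServer MBX2 `
        -DeleteExistingFiles `
        -Confirm:$false

The commands take seconds to type; it’s the hours of copying afterward that the rest of this post is about.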

[Begin 4/13 update]

There’s a dearth of hard information out there about what types of reseed throughputs we can achieve in the real world, and my initial version of this post, where I assumed 20GB/hour as an “educated guess”, earned me a bit of ribbing in some quarters. In my initial example, I said that if we can reseed 20GB of data per hour (from a local passive copy to avoid the I/O hit to the active copy), that’s 10 hours for a 200GB database, 30 hours for a 600GB database, or 60 hours – two and a half days! – for a 1.2TB database[2].

According to the Top 10 Exchange Storage Myths post on the Exchange team blog, 20GB/hour is way too low; in their internal deployments, they’re seeing between 35 and 70GB per hour. How would these speeds affect the reseed times in my examples above? Well, let’s look at Table 1:

Table 1: Example Exchange 2010 Mailbox Database reseed times

Database Size    Reseed Throughput    Reseed Time
200GB            20GB/hr              10 hours
200GB            35GB/hr              6 hours
200GB            50GB/hr              4 hours
200GB            70GB/hr              3 hours
600GB            20GB/hr              30 hours
600GB            35GB/hr              18 hours
600GB            50GB/hr              12 hours
600GB            70GB/hr              9 hours
1.2TB            20GB/hr              60 hours
1.2TB            35GB/hr              35 hours
1.2TB            50GB/hr              24 hours
1.2TB            70GB/hr              18 hours

As you can see, reseed time can be a key variable in a DJS design. In some cases, depending on your business needs, these times could make or break whether this is a good design. I’ve been asking around, and the reseed rates people see in the field are all over the map. I had several people talk to me at the MVP Summit and ask under what conditions I’d seen 20GB/hour, as that was too high. Astrid McClean and Matt Gossage of Microsoft had a great discussion with me and obviously felt that 20GB/hour is way too low.

Since then, I’ve received a lot of feedback and like I said, it’s all over the map. However, I’ve yet to hear anyone outside of Microsoft publicly state a reseed throughput higher than 20GB/hour. What this says to me is that getting the proper network design in place to support a good reseed rate hasn’t been a big point in deployments so far, and that in order to make a DJS design work, this may need to be an additional consideration.

If your replication network is designed to handle the amount of traffic required for normal DAG replication and doesn’t have sufficient throughput to handle reseed operations, you may be hurting yourself in the unlikely event of suffering multiple simultaneous replica failures on the same mailbox database.

This is a bigger concern for shops that have a small tolerance for any given drive failure. In most environments, one of the unspoken effects of a DJS DAG design is that you are trading the number of replicas – and database-level failover – against replica rebuild time. If dropping from four replicas down to three, or from three down to two, for the time it takes to detect the disk failure, replace the disk, and complete the reseed still leaves you with sufficient replicas, you’ll probably be okay with the rebuild taking longer.

All during the reseed time, you have one fewer replica of that database to protect you. If your business processes and requirements don’t give you that amount of leeway, you either have to design smaller databases (and waste the disk capacity, which brings us right back to the bad old days of Exchange 2000/2003 storage design) or use RAID.
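If you want to run this reseed math against your own numbers, it’s trivial; here’s a quick sketch that reproduces Table 1 (the sizes and rates are just the ones from the table, rounded up to whole hours):

    # Reseed time is simply database size divided by sustained reseed throughput.
    # Swap in your own database sizes and measured rates.
    $databaseSizesGB = 200, 600, 1200
    $throughputsGBHr = 20, 35, 50, 70

    foreach ($size in $databaseSizesGB) {
        foreach ($rate in $throughputsGBHr) {
            $hours = [Math]::Ceiling([double]$size / $rate)
            "{0,5}GB at {1,2}GB/hr : {2,3} hours" -f $size, $rate, $hours
        }
    }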

[End 4/13 update]

Now, with a RAID solution, we don’t have that same problem. We still have a RAID volume rebuild penalty, but that’s happening inside the disk shelf at the controller, not across our network between Exchange servers. And with a well-designed RAID solution such as generic RAID 10 (1+0) or NetApp’s RAID-DP, you can actually survive the simultaneous loss of multiple disks. Plus, a RAID solution gives me the flexibility to populate my databases with smaller or larger mailboxes as I need, and to aggregate capacity and performance across my disks and databases. Sure, I don’t get that nice 1:1 disk-to-database ratio, but I have a lot more administrative flexibility and can survive disk loss without automatically having to begin the reseed dance.

Don’t get me wrong – I’m wildly enthusiastic that I, as an Exchange architect, have the option of designing to JBOD configurations. I like having choices, because that helps me make the right decisions to meet my customers’ needs. And that, in the end, is the point of a well-designed Exchange deployment – to meet your needs. Not the needs of Microsoft, and not the needs of your storage or server vendors. While I’m fairly confident that starting with a default NetApp storage solution is the right choice for many of the environments I’ll be facing, I also know how to ask the questions that lead me to consider DJS instead. There’s still a place for RAID at the Exchange storage table.

In further installments over the next few months, I’ll begin to address the SATA vs. SAS/FC and DAS vs. SAN arguments as well. I’ll then try to wrap it up with a practical and realistic set of design examples that pull all the pieces together.

[1] RAID-1 (mirroring) and RAID-10 (striping and mirroring) both create two physical copies of the data. RAID-5 does not, but it tolerates the loss of a single drive — effectively giving you a virtual second copy of the data.

[2] Curious why I picked these database sizes? 200GB is the recommended maximum size for Exchange 2007 (due to backup limitations), and 600GB/1.2TB are the realistic recommended maximums you can get from 1TB and 2TB disks today in a DJS replica-per-disk deployment; you need to leave room for the content index, transaction logs, and free space.

A Virtualization Metaphor

This is a rare kind of blog post for me, because I’m basically copying a discussion that rose from one of my Twitter/Facebook status updates earlier today:

I wish I could change the RAM, CPU configuration on running VMs in #VMWare and have the changes apply on next reboot.

This prompted one of my nieces, a lovely and intelligent young lady in high school, to ask me to say that in English.

I pondered just hand-waving it, but I was loath to do so. Like I said, she’s intelligent. I firmly believe that kids live up to your expectations; if you talk down to them and treat them like they’re dumb because that’s what you expect, they’re happy to be that way. On the other hand, if you expect them to be able to understand concepts when given the proper explanations, even if they may not immediately grasp the fine points, I’ve found that kids are actually quite able to do so – better than many adults, truth be told.

So, this is my answer:

The physical machinery of computers is called hardware. The programs that run on them (Windows, games, etc.) are software.
VMware is software that allows you to create virtual machines. That is, instead of buying (for example) 10 computers to do different tasks and having most of them sit with unused memory and processor power, you buy one or two really beefy computers and run VMware. That allows you to create virtual machines in software, so those two computers become 10. I don’t have to buy quite as much hardware because each virtual machine only uses the resources it needs, leaving the rest for the other virtual machines.

However, one of the problems with VMWare currently is that if you find you’ve given a virtual machine too much memory or processor (or not enough), you have to shut it down, make the change, then start it back up. I want the software to be smart enough to take the change *now* and automatically apply it when it can, such as when the virtual machine is rebooting. For a physical computer, it makes sense — I have to power it down, crack the case open, put memory in, etc. — but for a virtual computer, it should be able to be done in software.

Think of it this way: hardware is like a closet. You can build a big closet or a small closet or a medium closet, but each closet holds a finite amount of stuff. Software is the stuff you put in the closet — clothes, shoes, linens, etc. You can dump a bunch of stuff into a big closet, but doing so makes it cluttered and hard to use. On the other hand, if you use multiple smaller closets, you’re wasting space because you probably won’t fill every one exactly.

In this metaphor, virtualization is like a closet organizer system. You can add a clothing rod here to hang dresses and blouses on, and underneath that add a shelf or two for shoes, while to the side you have more shelves for pants and towels and other stuff. You give up a little bit of closet space for the organizer, but you keep everything organized and clutter-free, which means you’re better off and spend less time keeping everything in order.

Of course, this metaphor fails on my original point, because it totally makes sense that you have to take all the stuff off the shelves before moving those shelves around. In the world of software, though, that restriction doesn’t necessarily make sense — it’s just that the right people didn’t think of it at the right time.

Clear?

I came close to busting out Visio and starting to diagram some of this. I decided not to.

Edit: I don’t have to diagram it! Thank you, Ikea, and your lovely KOMPLEMENT wardrobe organizer line!

Ikea KOMPLEMENT organizer as virtualization software

Busting the Exchange Trusted Subsystem Myth

It’s amazing what kind of disruption leaving your job, looking for a new job, and starting to get settled in to a new job can have on your routines. Like blogging. Who knew?

At any rate, I’m back with some cool Exchange blogging. I’ve been getting a chance to dive into an “All-Devin, All-Exchange, All The Time” groove and it’s been a lot of fun, some of the details of which I hope to be able to share with you in upcoming months. In the process, I’ve been building a brand new Exchange 2010 lab environment and ran smack into a myth that seems to be making the rounds among people who are deploying Exchange 2010. This myth gives bum advice to those of you who are deploying an Exchange 2010 DAG and not using an Exchange 2010 Hub Transport as your File Share Witness (FSW). I call it the Exchange Trusted Subsystem Myth, and the first hint of it I can find seems to be this blog post. However, that same advice seems to have gotten around the net, as evidenced by this almost word-for-word copy or this posting that links to the first one. Like many myths, this one is pernicious not because it’s completely wrong, but because it works even though it’s wrong.

If you follow the Exchange product group’s deployment assumptions, you’ll never run into the circumstance this myth addresses; the FSW is placed on an Exchange 2010 HT role in the organization. Although you can specify the FSW location (server and directory) or let Exchange pick a server and directory for you, the FSW share isn’t created during the initial configuration of the DAG (as documented by fellow Exchange MVP Elan Shudnow and the “Witness Server Requirements” section of the Planning for High Availability and Site Resilience TechNet topic). Since the share is created on an Exchange server as the second member of the DAG is joined, Exchange has all the permissions it needs on that system to create it. If you elect to put the share on a non-Exchange server, Exchange doesn’t have permissions to do it. Hence the myth:

  1. Add the FSW server’s machine account to the Exchange Trusted Subsystem group.
  2. Add the Exchange Trusted Subsystem group to the FSW server’s local Administrators group.

The sad part is, only the second action is necessary. True, doing the above will make the FSW work, but it will also open a much wider hole in your security than you need or want. Let me show you from my shiny new lab! In this configuration, I have three Exchange systems: EX10MB01, EX10MB02, and EX10MB03. All three systems have the Mailbox, Client Access, and Hub Transport roles. Because of this, I want to put the FSW on a separate machine. I could have used a generic member server, but I specifically wanted to debunk the myth, so I picked my DC EX10DC01 with malice aforethought.
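For the record, that one necessary action on an ordinary (non-DC) member server is a single command, run on the FSW server itself (the domain name below is a placeholder for yours):

    # Grant the Exchange Trusted Subsystem group local admin rights on the FSW server.
    # Do NOT add the FSW server's machine account to Exchange Trusted Subsystem.
    net localgroup Administrators "CONTOSO\Exchange Trusted Subsystem" /add

A domain controller has no local Administrators group, which is where the tinkering in my lab setup below comes in.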

  • In Figure 1, I show adding the Exchange Trusted Subsystem group to the Builtin/Administrators group on EX10DC01. If this weren’t a domain controller, I could add it to the local Administrators group instead, but DCs require tinkering. [1]

ExTrSubSys-DC-AdminsGroup
Figure 1: Membership of the Builtin/Administrators group on EX10DC01

  • In Figure 2, I show the membership of the Exchange Trusted Subsystem group. No funny business up my sleeve!

ExTrSubSys-Members
Figure 2: Membership of the Exchange Trusted Subsystem group

  • I now create the DAG object, specifying EX10DC01 as my FSW server and the C:\EX10DAG01 directory so we can see if it ever gets created (and when). The EMS commands for this step are sketched below, after Figure 3.
  • In Figure 3, I show the root of the C:\ drive on EX10DC01 after adding the second Exchange 2010 server to the DAG. Now, the directory and share are created, without requiring the server’s machine account to be added to the Exchange Trusted Subsystem group.

ExTrSubSys-FSWCreated
Figure 3: The FSW created on EX10DC01
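For reference, here’s roughly what that sequence looks like in EMS (a minimal sketch using the lab names above; the DAG name itself is arbitrary):

    # Create the DAG, pointing the witness at the non-Exchange server EX10DC01.
    # Nothing is created on EX10DC01 at this point.
    New-DatabaseAvailabilityGroup -Name EX10DAG01 `
        -WitnessServer EX10DC01 `
        -WitnessDirectory C:\EX10DAG01

    # Add the members; the witness directory and share appear on EX10DC01
    # as the second member joins.
    Add-DatabaseAvailabilityGroupServer -Identity EX10DAG01 -MailboxServer EX10MB01
    Add-DatabaseAvailabilityGroupServer -Identity EX10DAG01 -MailboxServer EX10MB02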

I suspect that this bad advice came about through a combination of circumstances, including an improper understanding of Exchange caching of Active Directory information and of when the FSW is actually created. However it came about, though, it needs to be stopped, because any administrator who configures their Exchange organization this way is opening a big fat hole in the Exchange security model.

So, why is adding the machine account to the Exchange Trusted Subsystem group a security hole? The answer lies in Exchange 2010’s shift to Role Based Access Control (RBAC). In previous versions of Exchange, you delegated permissions directly to Active Directory and Exchange objects, allowing users to perform actions directly from their security context. If they had the appropriate permissions, their actions succeeded.

In Exchange 2010 RBAC, this model goes away; you now delegate permissions by telling RBAC what operations given groups, policies, or users can perform, then assigning group memberships or policies as needed. When the EMS cmdlets run, they do so as the local machine account; since the local machine is an Exchange 2010 server, this account has been added to the Exchange Trusted Subsystem group. This group has been delegated the appropriate access entries on Active Directory and Exchange database objects, as described in the Understanding Split Permissions TechNet topic. For a comprehensive overview of RBAC and how all the pieces fit together, read the Understanding Role Based Access Control TechNet topic.

By improperly adding a non-Exchange server to this group, you’re now giving that server account the ability to read and change any Exchange-related object or property in Active Directory or Exchange databases. Obviously, this is a hole, especially given the relative ease with which one local administrator can get a command line prompt running as one of the local system accounts.
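If you want to check whether someone has already “fixed” an FSW this way in your environment, auditing the group’s membership takes seconds (a sketch assuming the Active Directory module for Windows PowerShell is available):

    # List every member of Exchange Trusted Subsystem.
    # You should see your Exchange servers' computer accounts here;
    # any other machine account is a red flag worth investigating.
    Import-Module ActiveDirectory
    Get-ADGroupMember -Identity "Exchange Trusted Subsystem" |
        Sort-Object objectClass, Name |
        Select-Object Name, objectClass, distinguishedName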

So please, do us all a favor: if you ever hear or see someone passing around this myth, please, link them here.

ExTrSubSys-Busted
Busted!

[1] Yes, this is also granting much broader permissions than necessary to make a DC the FSW node: the Exchange Trusted Subsystem group is now a member of the domain’s Builtin\Administrators group, which effectively gives it administrative control of the domain. This is probably not what you want to do, so really, don’t do this outside of a demo lab.

Windows 7 RC: The Switch

This weekend, I finally finished getting our desktop computers replaced. They were older systems that had been running Windows XP for a long time. I’d gotten newer hardware and had started building new systems, intending to put Vista Ultimate SP1 on them (so we could take advantage of domain memberships and Windows Media Center goodness with our Xboxes), but one thing led to another and they’d been sitting forlornly on a shelf.

I must confess – I’m not a Vista fan. I grudgingly used it as the main OS on my work MacBook Pro for a while, but I never really warmed up to it. SP1, in my opinion, made it barely useable. There were some features about it I grew to like, but those were offset by a continued annoyance at how many clicks useful features had gotten buried behind.

So when I finally got busy getting these systems ready – thanks to Steph’s system suddenly forgetting how to talk to USB devices – I decided to use Windows 7 RC instead. What I’d seen of Windows 7 already made me believe that we’d have a much happier time with it. So far, I’d have to say that’s correct. Steph’s new machine was slightly tricky to install – the built-in network interface on the motherboard wasn’t recognized so I had to bootstrap with XP drivers – but otherwise, the whole experience has been flawless.

Want to try Windows 7 for yourself? Get it here.

One of my favorite experiences was migrating our files and settings from the old machines. Windows 7, like Vista and Server 2008 before it, includes the Easy Transfer Wizard. This wizard is the offspring of XP’s Files and Settings Transfer Wizard but has a lot more smarts built in. As a result, I was able to quickly and easily get all our files and settings moved over without a hitch. With the exception of a laptop, we’re now XP free in my house.

Today, I ran across this blog post detailing Seven Windows 7 Tips. There were a couple of them I had already figured out (2, 4, and part of 3), but I’ll be trying out the rest this evening!

And now, after the long break

Okay, okay…so updating my blog server took longer than I’d anticipated. Getting the old material out of Community Server into BlogML format turned out to be a lot easier than I’d thought and finding the time to get it all imported into WordPress wasn’t much harder. What tripped me up was getting all of the redirection for the old, legacy URLs working.

Community Server and WordPress store their content in very different ways, and so they generate the URLs for blog posts using different algorithms. I know there are a fairish number of links out there in blog land to various posts I’ve done, and for vanity’s sake, I’d rather not orphan those links to the dreaded 404 Not Found error. The solution was to find the time to buy the latest edition of O’Reilly’s Apache Cookbook and bone up on the Apache web server directives.

So, I think all the relevant old URLs should now automatically redirect to their proper new places — there’s not much point in keeping all the old posts if you don’t do this. The nice thing, for those of you who are web geeks, is that I’m issuing permanent redirections, so Google and other search engines will update their links as they re-crawl my web site, thus pointing to the new URLs. For those of you who are humans, you might want to take a minute to check your bookmarks and make sure they’re updated to the new links.

One note: some commenter data didn’t make the import successfully. I could probably dig into it and find out why, but frankly, at this point, it’s more important to get the site (and Steph’s blog) back up and running. So, sorry — if you were one of those commenters, I apologize. Future comments should be preserved properly, and I really don’t see moving away from WordPress anytime soon.

If you’re reading this, then the necessary DNS updates have finished rolling out and we’re back live to the world. Thanks for your patience!

Wanted: Your broken Mac mini

Life is full of synchronicity; most of the time, this is through the workings of chance, but every now and then we get to help it along. Two ships may pass in the night, but how often does the helmsman take a hand?

You’re the owner of a no-longer-working original PowerPC Mac mini. This awesome little piece of technology once rocked your world, but slowly, you moved on to bigger and better things. Maybe you upgraded; maybe it stopped working. This Mac mini, though, still hangs around, complete with a working SuperDrive. You may feel a bit of guilt over not passing it on or getting it refurbished.

I’m the owner of a proud original PowerPC Mac mini that is having problems with its SuperDrive. My mini wants to be a member of the OS X 10.4 generation but can’t boot from the internal drive, nor can it seem to find an external USB drive as a boot device.

If you’ve got a spare original Mac mini (or a drive that fits) and you’re willing to part with it inexpensively, please drop me a line. No piña coladas or getting caught in the rain required.

The Facebook Experiment

Warning: the following post may not make much sense. If it does, it may sound bitter and arrogant. I apologize in advance; that’s not my goal here.

I finally got a critical mass of people dragging me into Facebook, so I’ve been doing it over the last couple of months. I entered into it with a simple rule: as long as I knew someone or could figure out what context we shared, I’d accept friend requests. I only send friend requests to people I want to be in contact with, but if someone wants to keep up with me, I’ll happily approve the request. (Remember, Asperger’s Syndrome; I may be able to fake looking like I’m socially adjusted, but underneath, I’m not.)

This resolve has been sorely tested by a number of requests I’ve gotten from people from my high school days. I am not one of those people who thinks that high school was the best time of my life. Far from it, actually. Now that I understand about Asperger’s, I have been able to go back and identify what I was doing to contribute to my misery during those years — and boy was I — but I also know that there were a bunch of people who were happy to help. I was happy to leave that town, happy to never go back, and happy — for the most part — to not try to get back into some mythical BFF state with these people that I never shared in the first place. There are some exceptions; you should know who you are. If you aren’t sure and want to know, send me a private message and ask. Don’t ask, though, unless you’re ready to be told that you’re not.

Does this mean I want people to stop requesting? No. We’re adults. (At least, we should be.) Life moves on. I’m not that same person, and I’m willing to bet you’re not either. Let’s try to get to know one another as we are now, without presuming some deeper level of friendship than really exists. It’ll be a lot easier for everyone that way, and probably a lot more fun.

This is what I do for fun???

For the last three weeks, I’ve been on vacation.

Much of that vacation has consisted of quality Xbox 360 time, both by myself (Call of Duty: World at War for Christmas) and with Steph and Chris. (Alaric had a friend over today and we had a nice six-way Halo 3 match; the adults totally dominated the kids in team deathmatch, I might add.) However, I’d also slated doing some much-needed rebuilds on my network infrastructure here at home: migrating off of Exchange to a hosted email solution (still Exchange, just not a server *I* have to maintain), decommissioning old servers, renumbering my network, building a new firewall that can gracefully handle multiple Xbox 360s, building some new servers, and sorting through the tons of computer crap I have. All of this activity was aimed at reducing my footprint in the back room so we can unbury my desk and move Alaric’s turtle into the back room where she should have a quieter and warmer existence.

Yeah, well. Best laid plans. I’ve gotten a surprising amount of stuff done, even if I have taken over the dining room table for the week. (Gotta have room to sort out all that computer gear, y’know. Who knew I had that much cool stuff?) My progress, however, has slowed quite a bit the last couple of days as I ran into some unexpected network issues I had to work my butt off to resolve.

Except that now I think I just figured out the two causes. Combined, they made my “new” network totally unusable and masked each other in all sorts of weird and wonderful ways. It was rather reminiscent, actually, of the MCM hands-on lab. I guess I’ve been practicing for my retest.

Ah, well. I still have one day of freedom left before I head back to work. I might actually be ready to go.

So long, Exchange!

This holiday weekend, I finally accomplished a task I’ve been meaning to do for a while: I got rid of my email.

More precisely, I’m no longer hosting my email domains on my own server here in the house like I have been for the past eight years. I’ve finally made the switch to hosted email. With all of the free email domains out there, this may have been an easy choice, but Steph and I are not your run-of-the-mill email consumers. We’ve gotten used to having the calendar and scheduling features of Exchange and Outlook here at the house, so it was pretty clear I needed a hosted Exchange solution.

Last night, I flipped the switch — I double-checked that all of our domains and email addresses were configured and then changed our MX records to point at the new service. (An MX record (Mail eXchanger) is an entry in DNS that tells the rest of the Internet where to send your email. Almost all email systems use at least one of these records.) As a result, some time early this morning all mail to us started going to our mailboxes at the new provider. Over the next couple of days, we’ll be transferring our existing messages up to the new mailboxes and shutting down my trusty Exchange 2007 server here at the house.

Actually, I’ll be recycling the hardware — it’s one of my beefier servers, and I can use it to do some other tasks around here and upgrade some of my low-end machines. This helps me consolidate servers, shut down more boxes, get rid of more clutter, and lower my bills. It also means that we no longer need to have our current DSL line and static IP address; we can explore newer, faster options that will better fit a household with multiple Xbox consoles. It also helps de-clutter my time; running a healthy email server takes time that I wasn’t putting into it here. (I shamefully confess that I went to run backups on my email a couple of weeks ago and discovered, rather to my horror, that it had been over a year since I’d last done so!) Now I don’t have to worry about those tasks. I also don’t have to worry about spam; the hosted service includes a really decent anti-spam service (the same one we use at work).

Still, after a decade of being responsible for managing my own email services (eight years running thecabal.org here at home, plus another couple of years being a sysadmin at an ISP), it feels rather strange to no longer be able to put my hands on the physical box hosting my email.

After slight technical difficulties and a simple but complicated operation, the patient has recovered

I don’t remember which day it was two weeks ago that I discovered that my web server was no longer accepting queries, but I do remember the distinct annoyance I felt when I got home from work, made my way through the back room to the computer rack, logged on to the management console, and saw that the server was powered off.

That’s nothing to the irritation I felt ten seconds later when it wouldn’t power back up.

This server is an oddity in my collection; it’s not the standard desktop/server size for motherboards and power supplies. As it turned out through some testing a couple of days later (when I found some free moments), the power supply had given up the ghost. Unfortunately, I didn’t have another power supply that would fit into that particular case, nor could I locate one in the local area.

So, today I got to do a motherboard-ectomy. For the uninitiated, that’s where you take the contents of one computer (in this case, the motherboard and the disk drive) and transplant them into a new case. It’s a relatively straightforward process, just long and (usually) cramped in a couple of places. This process was actually simpler than normal — the web server motherboard is so much smaller than a regular one that the usual cramped space problems didn’t happen — but was complicated in other ways by the need to jury-rig a couple of things in place (very minor tweaks; the delay was more from finding the right pieces to do things as close to The Right Way as possible).

However, it all went well — and as you can see, the web server is now back up and running. With some of the changes coming in my network in the next month, this is a temporary measure, but at least Steph and I can blog again.

Silly

Steph alerted me to the existence of a very cool product — a dock for MacBook Pro laptops that stands them vertically. This has two advantages: it saves desktop space and promotes better cooling. MacBook Pro machines are industrious heat generators, and you have to be really careful about what kind of surface you leave them on. I’ve found that mine will shut down or have BSODs (when running Windows) if I have it flat on a plastic or Formica surface; wood seems to be okay. The best bet, though, is to put it on a little stand that elevates the rear of the laptop and allows air to cool the underside. So, yes, I was really keen to check this out.


What I found, however, was this dreck. I have to admit that it’s a very sweet piece of machined aluminum; very pretty, and it matches the look of the MacBook Pro. However, this is not a dock; it’s a stand with delusions of $305 grandeur. This is a dock; note the integrated plugs. That handy little lever at the top moves all of the plugs into matching position on the sides of the MacBook, allowing you to quickly and easily put the laptop in place and make connections (and here’s the important bit) without having to manually plug and unplug all of the various cables you’re using every time you put the laptop in or take it out. It’s not as sexy, and it takes up more desk room (quite a bit more, which is one of the reasons I don’t have mine actually in use yet), but it’s functional.


That other block of aluminum? That’s designer silliness. Sadly, I bet far too many Mac folks will fall for it.


Edit: okay, looking closely at the Balmuda page shows that they never call it a dock, simply a stand. Kudos to them — however, it makes the price tag even more mind-boggling. Razzes to Apartment Therapy Unplugged for the mis-identification.

Am I hot or not?

Stupid website, but it gives me a chance to taunt my co-worker Kevin. This morning I got a puzzled e-mail from him, asking me why this picture of me in Sydney from February (yes, that’s Sydney, Australia; we were there for training for work) was the most-viewed picture in his online galleries (warning, probably not a worksafe gallery). I have no clue, but I think it’s damned funny.


Kevin’s a hard-core picture nerd; he’s got a wireless card for his digital cameras that will automatically use any nearby open WiFi connection to upload pictures to his Web gallery. This means that on a trip he’s usually got pictures uploaded before he gets back to his hotel, let alone before he gets home. That’s pretty cool, even if (like me) you aren’t inclined to take gigabytes of pictures everywhere you go.

A few thoughts on email

Email clients need to be more intelligent. For example, I can appreciate the Request Read Receipt feature that Outlook/Exchange and other email systems offer; it makes sense in a corporate environment, or when sending correspondence with business partners. However, all bets are off once you start emailing the Internet in general. Why, oh why, do Outlook and Exchange continue to be so clueless about these wonderful things we call mailing lists?

It wouldn’t be very hard at all for Outlook to notice when a message I’ve received comes from a real mailing list; list messages carry all sorts of wonderful headers (at least, they do if they’re compliant with the RFCs) that easily distinguish them (there’s a sample after the list below). Outlook should then automatically change its behavior in several key ways:

  1. Stop sending read receipt requests to that address. It’s really bloody annoying to be reading along on a mailing list and suddenly get the read receipt request dialog in my face, and all it does is make me think that the sender is an idiot.
  2. Stop sending OOF (out of office/out of facility) messages to that address. That looks even dumber.
  3. Offer to automatically create a new folder and rule to manage future messages from this list.
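For the curious, here’s roughly what those headers look like on a typical list message (an illustrative sample; List-Id comes from RFC 2919, the other List-* headers from RFC 2369, Precedence is an older de facto convention, and the addresses are made up):

    List-Id: Exchange administration discussion <exchange-admins.example.org>
    List-Post: <mailto:exchange-admins@example.org>
    List-Help: <mailto:exchange-admins-request@example.org?subject=help>
    List-Unsubscribe: <mailto:exchange-admins-request@example.org?subject=unsubscribe>
    Precedence: list

Any client that bothered to look for even one of these could implement all three behaviors above.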

Oh, and email users who set “request read receipt” as their default? Should. Be. Shot. 

Dear iPod

Dear iPod,


Over the years that I’ve had you (as your second owner), we’ve had our rocky times. You’ve worked well with both my Windows and Mac workstations — that’s a plus. Your battery life is damn near useless (and I understand that’s not really your fault), but with the appropriate adapter therapy we’ve been able to work around that. I hardly ever use you with headphones, but that iTrip is a righteous score that allows you to rock the car, the house, and any other FM radio within distance. True, you’re only a 3G classic model, but you’ve got 40GB and I’ve never even come close to running you out of space. All in all, we’ve been good for each other. Today, however, was something entirely different.


I now, of course, realize that you picking Bon Jovi’s It’s My Life when I was driving home through Woodinville was really a message. But I didn’t get that message until after we got on to 522 through the funeral procession and slowly drove by the column of funeral-goers. Just as we drew even with the hearse, you switched to Chumbawumba’s Tubthumping. Specifically, you blared the following line out the open windows:


I get knocked down, but I get up again
You’re never going to keep me down.


That, dear iPod? Total awesome.


I was thinking about getting a newer model, but now? Now we’ll see what we can do to replace that no-good battery of yours. You’ve still got years of life left in you with just a little TLC from me. You, iPod, rock.


Love,


Devin.