I’m not quite sure how to start this post, so I’ll start with the punch line and work back from it: I’ve taken a job with MSN in Shanghai and I am moving to China at the end of this year! This probably isn’t news to a lot of my readers, but, yep, I’m moving. I’m leaving San Francisco and the Mountain View Hotmail team to go work in China. I’ll still be working on Hotmail but in a different role, helping to build up the growing team that we already have there. We’re doing lots of really interesting things in Shanghai and I’m excited to be able to take part in them. I wanted to work overseas for a few years, and being able to do it in our own team and with Microsoft made a lot of sense for me. It’s going to be a huge change to my life, lifestyle and career, but one I’m looking forward to and excited about.
Late last year, when we were developing FireAnt, after some time thinking through the plumbing, the programming model and the development paradigm, we came to an interesting dilemma: what would the over-the-wire format be? Our basic goal was to send only data (or as close to only data as possible) over the wire to the end user and make the development model easy (e.g. no hand-written spaghetti code). While that sounds simple enough, the requirement leads to a bunch of possibilities for how to serialize the data for transport via XmlHttp.
We also knew, from lots of performance work we’d done over the course of 2004, what kind of connections, speeds, networks, and bandwidth were available to our customers all over the world. We’d done field tests in China, India, and other parts of Asia and Europe, so we had a pretty good idea of how bad internet connections out there are (1s+ latency, 25%+ packet loss and worse). And we knew that our current product (called Wave 10) didn’t fare too well in those scenarios due to the raw amount of data we were sending across the wire, so a goal of Kahuna and FireAnt was to reduce that amount.
The first, and most obvious, option was to send the data in XML. It’s called XmlHttp after all, right? The second was to just use SOAP. ASP.NET makes SOAP services easy to create. The third and last was to create our own over-the-wire protocol. If you look under the hood, you’ll see we’ve done the last of these, and I’ll try to explain why here.
Before getting into a discussion of each format, one distinction that I want to make is the difference between the upstream (client to server) and downstream (server to client) formats. The two have very different requirements. You have a (fairly) limited programming model on the client that’s cooperatively multithreaded, so doing a lot of processing on it can get expensive. You also have a sophisticated server with a sophisticated programming language and (for all intents and purposes) unlimited processing resources.
All of the on-the-wire formats above will work perfectly well, but in the context of the goals that we had, building our own worked the best.
As you can see, if we had chosen XML or SOAP we would have had to do a lot of work to make it fit seamlessly with our development model. Further, SOAP is a pretty heavy protocol when dealing with lightweight data (e.g. a GUID, or a command acknowledgement). In building FPP, we basically decided to split the representation of data on the way up versus the way down to hit the goal of reducing the amount of data over the wire. We then generate all the glue that holds the client and the server together to make the development model straightforward.
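To put a rough number on the SOAP overhead for small payloads, here is a minimal Python sketch that wraps a single GUID in a bare-bones SOAP 1.1 envelope and compares its size with the GUID sent on its own. The operation name `AckCommand`, the `urn:example` namespace and the `id` element are made up for illustration, and none of this is meant to represent the format FireAnt actually uses.

```python
import uuid

# SOAP 1.1 envelope namespace (real); everything inside the body is hypothetical.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_wrap(guid: str) -> str:
    """Wrap a GUID in a minimal SOAP 1.1 envelope for a made-up operation."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_ENV_NS}">'
        "<soap:Body>"
        '<AckCommand xmlns="urn:example">'
        f"<id>{guid}</id>"
        "</AckCommand>"
        "</soap:Body>"
        "</soap:Envelope>"
    )


def overhead_ratio(guid: str) -> float:
    """Bytes for the SOAP message divided by bytes for the bare GUID."""
    raw = len(guid.encode("utf-8"))
    wrapped = len(soap_wrap(guid).encode("utf-8"))
    return wrapped / raw


if __name__ == "__main__":
    guid = str(uuid.uuid4())
    # Prints the bare size, the wrapped size, and the ratio between them.
    print(len(guid), len(soap_wrap(guid)), round(overhead_ratio(guid), 1))
```

Even with no SOAP header and no HTTP headers counted, the envelope is several times larger than the 36 characters of data it carries, which is the kind of cost that adds up on slow, lossy connections.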
I have a bunch of friends who blog on Xanga, and they’ve finally (not exactly sure when) started returning properly formatted RSS. Before they made the change, any time a user would put a character like a “?” in their blog post, my feed reader would blow up on the feed and for the next 10 posts I wouldn’t be able to read their content. Now it’s all good and I’m up to date with everybody who’s on Xanga.
Any user of Microsoft software has seen that now ubiquitous prompt when something goes wrong, “Software X encountered a problem, would you like to report this to Microsoft?” That prompt is part of a system called Watson, named after Dr. Watson of Sherlock Holmes fame. When a user clicks “Yes” on the report dialog, information identifying the fault is sent back to Microsoft, where the product teams can use the data to analyze which bugs are the most common and the most important to fix.
In the M3 release of the Kahuna Mail Beta, we launched Watson support for web applications. It has already paid huge dividends in helping us identify which bugs to fix first in our M4 release. When our servers or our clients have an error, we display an inline message to the user that says we had an issue with their request and asks whether they’d like to report the error.
If the error happens on the server, we encrypt the error (to prevent any personal data from leaking) and send it back down to the user to allow them to click “Report it”. If the error was on the client, we encrypt it on the client and then allow the user to click “Report it”. If the user decides to report the error, that encrypted blob is sent back up to the server, where we decrypt the data, remove any personally identifiable data from it, and then send it to the Watson data warehouse.
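As a rough illustration of the last server-side step only (removing personal data from a decrypted report before it is forwarded), here is a small Python sketch. It assumes, purely for the example, that email addresses are the only personal data in the report text; the `scrub` function, the `[removed]` placeholder and the regular expression are hypothetical and are not how Kahuna actually does it. Encryption and decryption are deliberately left out.

```python
import re

# Assumption for this sketch: email addresses are the only personal data present.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(report: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[removed]", report)


if __name__ == "__main__":
    print(scrub("Parse failed for alice@example.com at line 3"))
```

A real scrubber would need to cover many more kinds of data (names, subject lines, message bodies), so treat this as the shape of the step rather than a complete filter.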
Once it’s in Watson, we’re able to mine the data for specific sections of code that cause issues. The dumps provide us with the source file that failed along with the line and stack trace. Watson builds a bucket for each unique combination of those items and then tracks hits in each bucket. Basically, when we look to fix bugs we can find which ones our users hit the most and attack them first.
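The bucketing idea can be sketched in a few lines of Python: treat each report as a (source file, line, stack trace) triple, count how many times each unique triple shows up, and sort by hit count. The `Fault` type, the function names and the sample file names below are invented for the example; Watson’s real schema and tooling are not shown here.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Fault(NamedTuple):
    """One reported error; the three fields together identify its bucket."""
    source_file: str
    line: int
    stack_trace: str


def bucket_hits(faults: Iterable[Fault]) -> Counter:
    """Count reports per unique (source file, line, stack trace) combination."""
    return Counter(faults)


def top_buckets(faults: Iterable[Fault], n: int = 3) -> List[Tuple[Fault, int]]:
    """Return the n most-hit buckets, highest hit count first."""
    return bucket_hits(faults).most_common(n)


if __name__ == "__main__":
    # Hypothetical reports: two land in the same bucket, one in another.
    reports = [
        Fault("mime_parser.cs", 120, "Render>Parse>Decode"),
        Fault("folder_list.cs", 45, "Load>Bind"),
        Fault("mime_parser.cs", 120, "Render>Parse>Decode"),
    ]
    for fault, hits in top_buckets(reports):
        print(hits, fault.source_file, fault.line)
```

Because the key is the full triple, two crashes on the same line reached through different call paths end up in separate buckets, matching the "unique combination" description above.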
To give you an example of the kind of bugs we fixed: when we first launched M3, we started to receive a number of hits in our MIME parser when it was being used to render a message. We were able to track down the lines of code that were causing the issue (we weren’t handling mis-encoded MIME messages correctly) and release an update of Kahuna, which in turn made that issue disappear from the hit list.
I’m testing Omar’s Thread Killer Outlook add-in and it’s making a world of difference in my inbox triage. Normally I spend some good percentage of the day dealing with threads that I’m only tangentially interested in. I’d either delete them or file them as new mails on the thread flowed into my inbox.
Now when I see a thread that I know I’m not interested in or needed on, I simply click on any message in the thread and click the “Kill Thread” button in my toolbar and, like magic, I never see it again. Since I started using it yesterday I’ve noticed the amount of mail in my inbox drop, and I’m seeing much more relevant mail in my Inbox folder now. I’ve killed about 25 threads, which has saved me from 72 pieces of mail ever hitting my inbox in about 24 hours.
As Windows Live Mail Beta comes to life today, there’s a lot of coordination that has to happen across all the live properties. To help make our small slice of it happen, our team has been gathered in our Flame conference room all morning and into the afternoon, laptops in tow, to do the work required to ship this beta. We call this type of meeting a war room: it’s basically a place where all stakeholders gather and are available during the day to ensure quick action and that things happen on time.
On the projector on the wall is the list of events that happen today, scheduled down to the 5-minute level, with owner and current status. In the room are members of the QA team, dev team, release engineering team, operations team, and program management team. Each team has some job or task that’s required to execute on the plan (from pushing configuration, to verifying code, to debugging machines, etc). Also in the middle of the room is a Polycom phone bridge that’s set up to connect the war room with the Service Operations Center (SOC) in Redmond, along with anybody else who wants to call in during the day.
Also as the picture shows, there’s plenty of food, drinks and candy to fuel the forces through the day. 🙂
As the press has started covering, we’re launching the next evolution of Microsoft services under the umbrella of “Live” services. Today you’ll be able to preview the beta of Windows Live by checking out some of the following services:
- www.live.com – our customizable portal, may remind you a bit of start.com
- ideas.live.com – a central site where you can participate and register for Windows Live Betas
- safety.live.com – a free full service tool to ensure the health of your PC
- mail.live.com – users of the Mail Beta may recognize this as our Kahuna release of web mail
- favorites.live.com – our roaming favorites service (which happens to use FireAnt, the underlying technology of Kahuna)
And this is only the start! During the press event today, we demoed these very cool services, all of which will be in your hands over the next days, weeks and months.