Glass Houses
At Microsoft we have a bunch of servers from which we can watch on-demand recordings of various conferences, presentations, and so on. This makes it convenient to see what a visiting researcher said or to share lessons learned from major projects. For example, the Office team has given some talks about running software development projects that are still useful years later.

One topic that has appeared in recent years is running a software service. Just like everyone else in the industry, Microsoft has been learning how to run online “mega services”. Some people will recognize that the telephone and cable television companies already have a lot of experience providing services that are based on software. But even telephone and cable service is fairly unreliable. The idea that people see damage and “route around it” is about as Internet as it gets. “My DSL line doesn’t work? Oh well, I’ll just get some food and it will be working when I get back.”

But the really disturbing question is why there is so much damage to route around in the first place. Why do our DSL, cable, telephone, eBay, MSN Messenger, and other services break so often? We all know at least one crufty old-timer who will claim that his operating system or tool is the only way to build a service that runs. But software is just one small part of the puzzle, and no software is free from association with high-profile outages. Hardware is another small part of the equation. The rest comes down to planning, discipline, and organization. So why do we suck so bad at this? And what can companies do to improve?