Binary XML

Miguel comments on the “Binary XML” postings from Omri and Dare, pointing out that only two standards would probably be needed (one for size, one for speed) to cover the majority of scenarios. I think this is correct, but in my opinion it’s not the number of encodings that is a problem, but simply the existence of any “standard” encoding beyond XML 1.0.


If you can remember back just five short years, it was once a major decision for IT developers to choose which encoding to use to persist and send their data:



  • Should it be fixed-width or delimited?

  • Should it be delimited with tabs or commas? What about quotes?

  • Should it be binary or text? ASN.1? DXF? IGES?

Every system used a different encoding technique, and every time you wanted to interop you had to write a parser. Most of us have written at least a few parsers for formats like IGES, W3C Log File, and so on. How much money was wasted by people writing parsers?


Now fast-forward to 2003. When a system developer thinks about persisting and sharing data, she automatically thinks “XML”. In 90% of cases, XML is the obvious choice and no debate occurs. Do you think that this happens because XML is a superior format based on size, speed, or any other technical criteria compared to the options available in 1998? Of course not! XML is the obvious choice because programmers are lazy, many parsers are freely available, and it’s “good enough” for most uses. The fact that XML is ubiquitous leads to plenty of parsing options being available, and more parsing options and tools lead to greater ubiquity. Developers can use XML in most cases and be confident that everyone else in the world will be able to parse out their data with trivial effort. Developers can argue about data schemas now instead of wasting time bickering about parser code and syntaxes. This is a huge contribution!


The thing that many people fail to understand, though, is that none of this virtuous cycle could exist if XML parsers were not trustworthy. XML depends on the fact that well-formed XML can be processed by any parser, and XML that is not well-formed can be processed by none. People deploy XML because they know it will “just work” no matter which parser is being used. People deploy XML because they know it will work no matter whether it is IBM or Microsoft in favor that week. Nothing about XML matters more than this promise.
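To make the promise concrete, here is a minimal sketch using two independent parser front ends from the Python standard library. The helper name `accepted_by_all` and the sample documents are made up for illustration; the point is only that both parsers agree on what is and is not XML.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
from xml.parsers.expat import ExpatError


def accepted_by_all(document: str) -> bool:
    """Return True only if both standard-library parsers accept the document."""
    try:
        ET.fromstring(document)                # ElementTree front end
        xml.dom.minidom.parseString(document)  # DOM front end
    except (ET.ParseError, ExpatError):
        # Either parser rejecting the input means it is not usable as XML.
        return False
    return True


# Hypothetical sample documents, not taken from any real system.
well_formed = "<order><item>widget</item></order>"
broken = "<order><item>widget</order>"  # <item> is never closed

print(accepted_by_all(well_formed))
print(accepted_by_all(broken))
```

Neither parser tries to guess what the broken document meant. That refusal is exactly what lets the sender trust that a document accepted on one machine will be accepted on every other.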


So, consider what happens when we introduce some new encodings which are not well-formed XML 1.0, but we call them “XML” anyway. When Jane in the IT department configures her EDI software to send an “XML” file to a partner, and the partner’s machine rejects it, who is to blame? Jane will claim that “my vendor says that XML 1.0bin is a W3C spec, so your vendor is non-standard”, while the partner will claim “my vendor accepts XML 1.0 so your vendor is non-standard”. In fact, it is quite likely that vendors with multiple XML-enabled products would end up in situations where their own products failed to communicate with one another. Note that this danger exists with any variation from XML 1.0, and not just “binary XML”.


Reasonable people might argue that this is OK, and that IT pros will simply have to learn to distinguish between the four different incompatible types of XML (XML 1.0, XML 1.1, XMLfast, XMLsmall) and will have to manage the compatibility mismatches between all of their systems. But that starts to look a lot like 1998 to me. Developers will bicker about which XML to use, and will have to switch parsers based on the choice of data format. Systems will have to offer and consume multiple formats and negotiate formats between one another. I have a good memory, and I remember how badly things used to suck. Having a solid, reliable “obvious choice” like XML 1.0 means freedom from pain for millions of developers. Please, let’s not mess with that too hastily.
