Naked XML
If you follow Tim Ewald’s blog, you’ll know that he is religious about XML and run-time typing. If Dare Obasanjo is the Zen priest of XML, Tim Ewald is the Pentecostal evangelist (I am, naturally, an XML Zionist: software is Microsoft’s birthright, and XML her manifest destiny :-)).
Tim is a harbinger, IMO, of where the whole Object-Oriented community is moving. And he is proof that XML is the great peacemaker, stimulating proponents of competing data models and programming models to learn from one another. And the lesson that Object-Oriented (and related) programmers are learning from XML is that semistructured data and programming are beautiful.
Now, professed reverence toward XML is not proof that someone has apprehended the true beauty of the semistructured data model. I have a litmus test of sorts that I use to determine if someone has “got it”. I show them an XPath like “//contact[.//fax]” and watch their faces. Of the people who understand what it does, most will have no reaction, and most of the rest (the experts) will raise their brows skeptically and say “only a stupid person would write such an inefficient query!”. Yet there are a precious few who exclaim “that is how things should be!” as their faces light up.
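For anyone who wants to see what that XPath actually asks for, here is a minimal sketch in Python. The address book document and the helper name contacts_with_fax are made up for illustration, and the predicate is spelled out by hand with the standard library rather than handed to a full XPath engine. The point is that the query finds every contact at any depth that has a fax somewhere beneath it, without knowing or caring how each contact happens to be structured.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: each contact keeps its fax number somewhere different.
DOC = """
<addressbook>
  <contact><name>Ann</name><fax>555-0100</fax></contact>
  <contact><name>Bob</name><phones><work><fax>555-0101</fax></work></phones></contact>
  <contact><name>Cy</name><email>cy@example.org</email></contact>
  <group><contact><name>Di</name><office><fax>555-0102</fax></office></contact></group>
</addressbook>
"""

def contacts_with_fax(root):
    """Hand-rolled equivalent of //contact[.//fax] (illustrative helper)."""
    # //contact : every <contact> element at any depth, in document order
    # [.//fax]  : keep only those with a <fax> descendant at any depth
    return [c for c in root.iter("contact") if c.find(".//fax") is not None]

root = ET.fromstring(DOC)
names = [c.findtext("name") for c in contacts_with_fax(root)]
print(names)
```

Nothing in the query says where the fax element lives or what else a contact contains, which is exactly why the experts wince at the cost and the believers smile.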
The lesson, of course, is that real-world information is chaotic. In any but the smallest “proof of concept” systems, the best that one can hope for is to be able to recognize small pockets of structure within a sea of otherwise unstructured information. People in the VLDB, data warehousing, and ETL communities have long realized that it is folly to tightly bind data tuples and relationships into restrictive schemas. Boyce-Codd normalization rules maintained flexibility by preserving a distinction between relations and tuples, but even these rules are too binding for many VLDB applications.
But while flexibility is important for complex systems, a complete lack of semantics is useless. The real goal is somewhere between strongly typed and untyped: to provide structure when and where you need it, while protecting your right to ignore the rest. The first paper that clarified this idea for me was Peter Buneman’s discussion of dynamic typing for semistructured data.
The last decade has seen a great deal of research into semistructured data access, some of it quite pragmatic and immediately useful in real-world data management problems (e.g. Florescu). Simultaneously, others researched programming models based on semistructured data (e.g. Meijer). Researchers like Meijer and Florescu (and the rest of the dominant diaspora of researchers originally from the University of Bucharest) did not start with XML, to be sure, but they quickly recognized XML as the first semistructured data format with the power to go mainstream.
On the other hand, XML has been pulled in many directions from the start, and has failed to provide a clean and consistent data model. The nattering data model issues have severely slowed adoption of XML for use in semistructured data access and even object serialization. So while XML has become mainstream as an easy-to-parse text format for interop scenarios, the programming and data access models have not really been able to take advantage of this trend in the ways that Tim Ewald envisions.
But there is reason for optimism. The industry appears to be coalescing around a common understanding of what the XML data model is. My grandmother will probably never care about this, but to me it makes the future seem rosier.