Precis

XML, like Mozart, is too easy for beginners and too difficult for experts. There is a strong temptation toward overspecification.

Exposition

XML makes it easy to design standards that overspecify data. Complex systems are inherently prone to failure when subjected to unforeseen stimuli, and overspecified complex systems can fail all by themselves. Creating a data model that describes every eventuality requires you either to be omniscient or to have unlimited time for programming. It is better to lower your standards than to believe, falsely, that you will achieve perfection.

Example

How would you mark up an address? Here's how successful companies do it:
<address>1200 E California Blvd, Pasadena CA 91125</address>

For a person, this is a completely legitimate address, and MapQuest can find it.

Invariably, the novice comes up with something like this:
<address>
<number>1200</number>
<street>E California Blvd</street>
<city>Pasadena</city>
<state>CA</state>
<zip>91125</zip>
</address>

OK, that looks fine. But what happens when you want to mail something to me at Caltech? My address is "Caltech 103-33, Pasadena CA 91125". That 103-33 isn't a street number or an apartment number. I constantly run into problems when somebody's software can't understand what it means. The post office has no such trouble, because it uses a hierarchical sorting system: all it has to do is get the letter to 91125, and a local expert will know what to do with it. The overconstrained data model, on the other hand, requires the software to know about every delivery system everywhere.
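To make the hierarchical idea concrete, here is a minimal sketch of how a sorter could work with the free-form <address> element. It assumes (an assumption, not a postal standard) that a US ZIP code, optionally in ZIP+4 form, ends the string; the helper name split_for_routing is hypothetical. Only the ZIP is interpreted, and everything before it is passed through untouched for the local expert.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumption: a US ZIP code (5 digits, optionally ZIP+4) ends the address string.
ZIP_AT_END = re.compile(r"\b(\d{5})(?:-\d{4})?\s*$")

def split_for_routing(address_xml: str) -> Tuple[str, Optional[str]]:
    """Split a free-form <address> into (local part, ZIP).

    The ZIP is all a hierarchical sorter needs; the local part is kept
    verbatim for the local delivery expert to interpret.
    """
    text = (ET.fromstring(address_xml).text or "").strip()
    match = ZIP_AT_END.search(text)
    if match is None:
        return text, None  # no ZIP found: leave routing to a human
    return text[: match.start()].rstrip(), match.group(1)

print(split_for_routing("<address>Caltech 103-33, Pasadena CA 91125</address>"))
# ('Caltech 103-33, Pasadena CA', '91125')
```

Note that "103-33" never has to be classified as a street, a building, or a mail stop; the code does not need to know what it is.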

But that's a small problem, you say. It wasn't until version 6 that the Great Plains accounting software had a "Country" field. Before that, companies had to go to great lengths to work around their billing systems to be able to sell internationally. That seems really stupid to me.

Oh, so we'll just add a Country field. But what if you want to ship to France? There, the postal code precedes the city. Fine, you say, we'll check to see if the country is France, and handle it there. But do you want to do that for every nation in the world? What if you want to print an ad-hoc mailing list? Will you have to reimplement the algorithm each time? And what if you make a mistake? Then someone will enter data to make it print right on one system, which will ensure that it won't print right on another.
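Here is a sketch of the per-country logic this paragraph warns about. The StructuredAddress record and the format_label function are hypothetical, modeled loosely on the novice <address> above; they do not come from any real product. Only two national conventions are covered, and every other country falls through to an error, which is exactly the problem.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StructuredAddress:
    # Hypothetical schema in the spirit of the "novice" <address> above.
    street: str
    city: str
    region: str        # state or province; empty where unused
    postal_code: str
    country: str       # two-letter country code, e.g. "US", "FR"

def format_label(addr: StructuredAddress) -> List[str]:
    """Return printed label lines (country line omitted for brevity)."""
    # One branch per national convention; every new country means new code.
    if addr.country == "US":
        last = f"{addr.city} {addr.region} {addr.postal_code}"
    elif addr.country == "FR":
        # In France the postal code precedes the city.
        last = f"{addr.postal_code} {addr.city}"
    else:
        raise NotImplementedError(f"no label rule for country {addr.country!r}")
    return [addr.street, last]

print(format_label(StructuredAddress("1 Rue Exemple", "Paris", "", "75001", "FR")))
# ['1 Rue Exemple', '75001 Paris']
```

Any other program that prints labels, such as the ad-hoc mailing list, would need its own copy of these rules, kept in sync with this one by hand.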

Overspecification is a recipe for disaster!