OOXML: Who Cares?

If you are one of the unfortunate few who have just watched Comcast’s vision of a tiered Internet turn into reality, you might want to proactively take note of what else is happening in the world of technology. The infamous Microsoft Office file formats (“.doc, .ppt, .xls, .mdb”) were created in 1996. These formats are over a decade old and are still shaping the technology landscape at this very minute. OOXML is the successor to the file format throne, and it will probably have a major impact for the next decade.

My interest was recently piqued when I looked at some real information instead of the FUD from both sides. Consider some of the points made by Stéphane Rodriguez below:

  1. OOXML files have to be completely restructured whenever any change is made. This is the way that the current binary formats work, not the way XML is supposed to. Failure to do so results in “catastrophic failure” when reopening the file.
  2. Number values are not stored in the XML as entered. Instead, the OS’s treatment of floating-point numbers dictates how the data is written to the file. This is inconsistent from platform to platform. Microsoft has not documented how this is to be handled at this point in time.
  3. VML is a subset of the OOXML file format, but is not defined anywhere in the specifications. VML is proprietary and closed source. Its use in Office 2007 can be seen in comments and charting, despite Microsoft’s repeated claims that it is long dead and deprecated.
  4. The packaging of the file’s inner parts seems to be obfuscated. There is no relation between the physical hierarchy and its logical layout.
  5. All locale-specific data (e.g., date formats) is stored in the US locale. The OOXML file format does not disclose how to translate this into other locales because that is considered outside the scope of the file format; it relies on Office to do the conversion instead.
  6. OOXML has to be backwards-compatible with over 10 years of legacy code. Instead of leaving this baggage at the door, it has become incorporated into the OOXML file format and has to be accounted for by competing applications. This is a high cost for reading and writing even simple files.
  7. Encryption of OOXML documents changes the file structure from a ZIP format to an undocumented OLE format. No documentation is provided as to how the document is encrypted or decrypted. (Yes, an encryption system is still safe even when it is documented.)
  8. There is no documentation for either the Escher library (pre-Office 2007) used for chart drawing or the newer VML-based chart drawing library in Office 2007. Also, read #3 again.
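
If you want to poke at points 2 and 7 yourself, here is a rough Python sketch. It is not taken from Rodriguez's site, and the helper names (container_kind, list_parts, make_sample_package, stored_text) are made up for illustration. It relies on two well-known file signatures: ZIP archives begin with "PK\x03\x04" and OLE compound files begin with D0 CF 11 E0 A1 B1 1A E1. The 17-digit formatting is only an assumption used to show how a binary floating-point value can differ from what was typed; it is not a statement about what Office actually writes.

```python
import io
import zipfile

# Well-known leading bytes of the two container types.
ZIP_MAGIC = b"PK\x03\x04"
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def container_kind(data: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE_MAGIC):
        return "ole"
    return "unknown"


def list_parts(data: bytes) -> list:
    """List the physical parts inside a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


def make_sample_package() -> bytes:
    """Build a tiny stand-in package in memory (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("xl/workbook.xml", "<workbook/>")
    return buffer.getvalue()


def stored_text(value: float, digits: int = 17) -> str:
    """Format a double with a fixed number of significant digits.

    Assumption: a writer that dumps the raw double this way, which is
    one way 'entered' and 'stored' values can end up looking different.
    """
    return format(value, f".{digits}g")


if __name__ == "__main__":
    sample = make_sample_package()
    print(container_kind(sample), list_parts(sample))
    print(stored_text(0.1))  # the user typed 0.1
```

To try it on a real document, read the file in binary mode (for example with open(path, "rb").read()) and pass the bytes to container_kind. If point 7 holds, a password-protected file should report "ole" where the unprotected one reports "zip".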

These claims are verifiable, and his website has instructions for reproducing each scenario and conclusion. My advice?

Follow the Money

Microsoft built Office 2007 around this new file format, so its cost of development was the lowest of anyone's: it made the format up as it went. How much would it cost other companies to duplicate the results of Microsoft Office by retooling existing software? The answer is too much to be worthwhile, which is what Microsoft wants.

Technology is still in its infancy, with worlds of change seen between Office 97 and Office 2007. There is so much to document that Microsoft’s own specifications take up a whopping 6,000 pages. Now, at the exponential rate at which technology is growing, how hard will it be to pull out of a proprietary mess 10 years from now? Will it take 35,000+ pages to document what is happening behind the scenes? It is certainly possible. Now that I have some concrete information, it’s time to put up the obligatory open source propaganda.

ODF button

