Dave Thomas (the programmer, not the hamburger guy) has a knack for expressing great ideas in very few words. I constantly refer back to “The Pragmatic Programmer”, which he wrote with Andy Hunt, to sharpen my skill set. Below is a short excerpt of an idea from Chapter 3 that might change the way you think. I am paraphrasing a lot of the book’s content below:
“As [Programmers], our base material isn’t wood or iron, it’s knowledge. … And we believe that the best format for storing knowledge persistently is plain text. With plain text we give ourselves the ability to manipulate knowledge, both manually and programmatically, using virtually every tool at our disposal…
Suppose you want to store a property called uses_menus that can be either TRUE or FALSE. Using text you might write this as
… contrast this with
… The problem with binary formats is that the context necessary to understand the data is separate from the data itself. You are artificially divorcing the data from its meaning. The data may as well be encrypted; it is absolutely meaningless without the application logic to parse it. With plain text, however, you can achieve a self-describing data stream that is independent of the application that created it…
There are two major drawbacks to using plain text: (1) It may take more space to store than a compressed binary format, and (2) it may be computationally more expensive to interpret and process a plain text file.”
Now, think about the direction that computers are heading: faster processing and more storage space for less money. Is either of these two disadvantages crippling? I think we are looking at the death of compiled (binary) data; it simply will serve no purpose within a decade or so.
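To make the trade-off concrete, here is a minimal Python sketch of my own (it is not from the book). The property names, the `name=VALUE` layout, and the helper functions are all assumptions made up for illustration. It stores the same two flags once as plain text and once as packed bytes, then compares the sizes.

```python
import struct

# Hypothetical settings, invented for illustration.
settings = {"uses_menus": False, "show_toolbar": True}

def to_text(props):
    """Write one self-describing name=VALUE line per property."""
    return "".join(f"{name}={str(value).upper()}\n" for name, value in props.items())

def from_text(text):
    """Recover the properties using nothing but the text itself."""
    props = {}
    for line in text.splitlines():
        name, _, value = line.partition("=")
        props[name] = (value == "TRUE")
    return props

def to_binary(props):
    """Pack the flags as raw bytes; their names and order live only in this code."""
    return struct.pack(f"{len(props)}?", *props.values())

text_form = to_text(settings)
binary_form = to_binary(settings)

print(text_form, end="")
print(binary_form)
print(len(text_form.encode("utf-8")), "bytes as text vs", len(binary_form), "bytes as binary")
```

The text form is many times larger, but anyone can read it back without this program. The two packed bytes mean nothing unless you also have the code that knows which flag comes first.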
“What are the benefits?
- Insurance against obsolescence
- Easier testing…
Human-readable forms of data, and self-describing data, will outlive all other forms of data and the applications that created them. Period. As long as the data survives, you will have a chance to be able to use it, potentially long after the original application that wrote it is defunct.
Consider a data file from some legacy system (all software is legacy as soon as it’s written) that you are given. You know little about the original application; all that’s important to you is that it maintained a list of clients’ Social Security numbers, which you need to find and extract. Among the data you see
But imagine if the file had been formatted this way instead:
You may not have recognized the significance of the numbers quite as easily.”
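Here is a rough Python sketch of that kind of extraction. The sample records, the tag names, and the packed string are all hypothetical, invented by me for illustration (the numbers use the never-issued 000 prefix). The only assumption about the file is that Social Security numbers appear in the familiar 3-2-4 digit pattern.

```python
import re

# Hypothetical legacy data, invented for illustration; these are not real numbers.
legacy_text = """\
<record><name>Pat Example</name><id>000-12-3456</id></record>
<record><name>Sam Sample</name><id>000-65-4321</id></record>
"""

# The same kind of number buried in a packed, undelimited field (also invented).
packed_text = "ZX9000123456Q7"

# Three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def extract_ssns(text):
    """Return every substring of text that looks like a Social Security number."""
    return SSN_PATTERN.findall(text)

print(extract_ssns(legacy_text))
print(extract_ssns(packed_text))
```

The readable file gives up its numbers to a one-line pattern, even though we know nothing else about the format. The packed string yields nothing, because the shape that made the numbers recognizable is gone.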
This echoes the heated debate over OOXML, and whether or not the data format is truly open. People have a right to their own data without the worries of vendor lock-in, which really is quite evil. Think of Microsoft Office as a software platform that takes your ideas, then encrypts them so that you have to pay money to access them again. While this isn’t the product description that Microsoft is likely to adopt, it is, with a different spin, exactly what is occurring. So think twice about OOXML before you opt in. Dave really pulls this together below:
“Unix is famous for being designed around the philosophy of small, sharp tools, each intended to do one thing well. This philosophy is enabled by using a common underlying format: the line-oriented plain text file…
If you use plain text to create synthetic [data], then it is a simple matter to add, update, or modify the [data] without having to [use] any special tools to do so…
Even in the future of XML-based intelligent agents that travel the wild and dangerous Internet autonomously, negotiating data interchange among themselves, the ubiquitous text file will still be there. In fact, in heterogeneous environments the advantages of plain text can outweigh all of the drawbacks. You need to ensure that all parties can communicate using a common standard. Plain text is that standard.”
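The “small, sharp tools” idea carries over to any language that can read lines of text. Below is a loose Python sketch of my own, not anything from the book. The function names borrow from the Unix tools they imitate, and the `name:shell` sample lines are made up. Each function does one thing to a stream of lines, and they chain together because they all agree on the same plain text format.

```python
def grep(pattern, lines):
    """Keep only the lines that contain pattern."""
    return (line for line in lines if pattern in line)

def cut(field, lines, sep=":"):
    """Keep only one separator-delimited field from each line."""
    return (line.split(sep)[field] for line in lines)

def sort_unique(lines):
    """Sort the lines and drop duplicates."""
    return sorted(set(lines))

# Hypothetical name:shell records, invented for illustration.
accounts = [
    "carol:/bin/bash",
    "bob:/bin/zsh",
    "alice:/bin/bash",
    "daemon:/usr/sbin/nologin",
]

# Which users have bash as their shell?
bash_users = sort_unique(cut(0, grep("/bin/bash", accounts)))
print(bash_users)
```

None of the three functions knows anything about the others or about where the data came from. Swap the list for the lines of a real file and the pipeline works unchanged.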