Sunday, November 21, 2004

Million Monkey Manifesto

Is the war over? Have we sufficiently codified the method and the medium? Has the software “client” become so ubiquitous and capable that we can finally detach ourselves from the complicated mess of software design and instead focus on using our software to better communicate? I recently read Adam Bosworth’s talk from ISCOC04 and felt a bit underrepresented. To grossly paraphrase: “it’s the data, stupid.” This soliloquy seems to have always echoed throughout the halls of edificial corporate computing but to me, it never really rang true, mostly because I was too busy writing code to manipulate the data to be afforded the luxury of actually thinking about the data. Adam’s comments caused me to pause, consider my livelihood as a software engineer, and question its relevancy, or rather, the business model upon which its relevancy is based.

To clarify, the DC Comics version of the battle has Bizarro, your everyday internet user (hopefully not to be confused with the “tragedy of the commons,” but a shaping force nonetheless), pitted against our archetypical protagonist Superman, the systems architect positioned at your local custodial internet transformation factory. Bizarro doesn’t really care how he gets things done; “… is big crime to make anything perfect in Bizarro World.” Superman seeks order and justice in society. They battle, mostly in the real world, over order and chaos, with the result in question being Mosaic, the first highly visible internet browser. This invention could acquire data from the vast entropy of interconnected computers and, with the aid of some human-readable, self-describing metadata, would render the data in a window on the computer screen however the publisher meant the data to be seen. Simple, ordered, tangible, and linked by reference to related information. Problem solved. Or is it? I mean, Bizarro still seems to burn Superman’s biscuits every now and again.

Let’s attempt to put this in perspective. When we talk about the internet and computing we are talking mostly about the “…need to store information and transmit information outside of human memory and over time and over space.” That statement was made by Dr. Holly Pittman, the director of the Center for Ancient Studies at the University of Pennsylvania, in the context of the origins of the first written word. What’s most ironic to me is not that we can directly correlate the crux of the modern internet with the origins of the written word, but that, according to Dr. Peter Damerow of the Max Planck Institute for the History of Science, the first proto-cuneiform writings were “severely restricted”: not really sentences and narrative, but more lists and categories. Why? Because the first written texts were auditing records for the purposes of economic administration. Alas, the body corporate invented written communication, and to this day it continues to prescribe its attendant faculty on its evolution through the digital medium.

Which brings me to the title of this post. There is a parable used to illustrate probability which suggests that, given a million monkeys (actually, I think the term was “infinite monkeys,” but a million has commas and suggests possibility) and a million typewriters, they will eventually produce the works of Shakespeare. In economic terms, if the works of Shakespeare are valuable in the marketplace, and monkeys are cheap, and typewriters are cheap, then I’d say we’ve got a mission statement. To be more concrete, technologies such as SOAP, XML-RPC, DCOM, DCE, IIOP, CORBA, etc., have been solving corporate problems for a long time because large corporations have the manpower, the capital, and the motivation to solve such problems. Efficiency, clarity, and malleability are collateral damage, not essential to the ultimate goal. This is not interesting to me.

Why did HTML succeed? I propose that its success is insignificant, and that it probably would have been replaced by any one of a number of competing document description languages had the timing been slightly different. The real jewels of the internet crown are TCP/IP, DNS, and the W3 consortium. Without the simple ability to translate a human-readable resource identifier, using a guaranteed persistent name database, into some unambiguous network host address and then be routed, in an efficient way, to that host computer, you would not be reading this. Before this technology the most popular forms of network communication consisted of tossing all the data into the air (or rather “ether”) and hoping it would land on your coworker’s desk. Either that, or you put your data on the corner of your desk and waited patiently for the mail boy to make his rounds of every employee in the company, pick up your data, and hand deliver it. Obviously, neither was a scalable solution. I suggest that the inventors of the internet were not so much concerned about markup standards like HTML, aggregation standards from T-SQL to RSS, or even standards governing how voice, music, and video could be transmitted; rather, they were mostly concerned with solving the scalability problem. This raises the question of what problems we are really trying to solve.
  • Why is it so easy to interact with a web log but so difficult to write your own web logging system? Specifically, Tim Berners-Lee originally spec’d the web to be both readable and writable yet, to date, it is barely possible. (WebDAV / DeltaV are trying).

  • Why do the state-of-the-art development tools continue to push Neolithic user interface controls which barely resemble those used in the modern web browser? (automatic layout, arbitrary clipping, and embedded frames come to mind, with thoughts of XUL, XAML, and WHATWG to ponder)

  • Why do we readily accept the declarative nature of spreadsheet design as the ultimate modern expression of the human-machine interface, yet throw the baby out with the bathwater in DHTML and completely imperative systems? (Laszlo and Margaret Burnett seem to be looking into this)

  • If it is all about content and aggregation, why do the majority of web designers need to understand JavaScript? And since when did typing in a “URL,” clicking a “combobox,” and selecting a “radio button” become rooted in the lingua franca of Amazon-surfing grandmothers across the planet? The inverse (and more interesting) question is how the way in which shared data is rendered in a browser affects the authoring process, and even the content itself.
In the prosaic words* of someone I respect a great deal, who commented recently on the state of the internet and computing: “pretty much everything that’s believed is bullshit.”
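To make the naming-and-routing jewel concrete: the step of translating a human-readable name into an unambiguous host address is exposed as a one-line resolver call in most languages today. A minimal sketch in Python (the standard library’s resolver; “localhost” is chosen only so the example runs without network access, and the helper name `resolve` is mine, not part of any standard):

```python
import socket

def resolve(name):
    """Return the distinct host addresses a human-readable name maps to,
    by asking the system resolver (which consults DNS, hosts files, etc.)."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})

addresses = resolve("localhost")
```

On a typical machine this yields the loopback addresses (IPv4 `127.0.0.1`, and `::1` where IPv6 is configured); everything after this lookup is routing, which the stack handles for you.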

In closing, I should return to the relevancy of software engineering and consider that, when proposing a new state of the art, perhaps we are defined by tragedy. Some may pick the stock market collapse and choose to build more robust information systems. Some may answer the nasty oppressiveness and historical revisionism of the totalitarian state by building efficient centralized information cross-indexing systems (along with a dash of capitalistic ad placement). For my tragedy I choose the destruction of the Library of Alexandria. This library contained the collected works of Euclid, Ptolemy, Aristotle, Sophocles, and Pythagoras, to name a few. Because information could only be recorded by well-educated scribes, and because the printing press was centuries away when the library burned down, almost all of the information was lost forever. Some say that only about ten percent of the ancient works survived, and that these became the foundation for modern philosophy, geometry, and medicine. I sometimes try to imagine how much time we are making up. Recently, a set of server-based tools has become popular in easing the pain of web log creation. The title chosen for this tool is Movable Type. I understand where they’re coming from. The effect is not lost.