Using a great snowclone, I hereby declare that the semantic web is dead. In the same breath, I also declare that it is alive, well, and much further underway than you might imagine.
To understand what I’m talking about, let’s first take a look at a definition of “semantic web”:
The semantic web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. It derives from W3C director Sir Tim Berners-Lee’s vision of the Web as a universal medium for data, information, and knowledge exchange.
I’m sure Sir Tim didn’t pen the paragraph above, but this definition is a prime example of why we don’t let engineers be marketers. The definition is, in short, crap. Just ask the average Joe to define “semantic” and see how long he stutters.
The most important aspect of the semantic web mentioned above is that web content can be read by two audiences: humans and computers. Humans are fairly adept at parsing unstructured information, while computers are dumber than a box of straw without careful instruction and relatively clean source data. When you walk into a store, you know to look for items on shelves with small tags below them denoting the purchase price in local currency. On the web, a computer can’t see the shelves, items, or prices. It’s all a jumble of shapes and colors.
The source data is the main problem with current HTML. Originally a method for adding markup to text documents, HTML is now used to project a mess of information onto the web as a virtual brochure for human eyes. Sometimes the prices of products are preceded by a “$”; sometimes they aren’t. Comparing the source HTML of eBay and Amazon will show you how impossible it would be to write rules that reliably extract anything useful. What’s missing is structure and standards. Without structure, our fast-but-dumb computers stare back blankly at us, as if we could solve the problem faster without their help. Alas, all web content is formatted differently, so what are we to do? If only we had some type of structured interface for the web that separated content into recognizable chunks. These chunks could then be processed, consumed and, most importantly, displayed for humans at some point down the road.
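To make the scraping problem concrete, here’s a minimal Python sketch. The HTML snippets and the structured record are hypothetical, invented for illustration; the point is only that a hand-written extraction rule works for one site’s formatting and silently fails on another, while structured data needs no guessing:

```python
import re

# A brittle, hand-written rule for one site's price format.
PRICE = re.compile(r"\$\s*(\d+\.\d{2})")

def extract_price(html):
    """Scrape a price from raw HTML -- works only while the format holds."""
    match = PRICE.search(html)
    return match.group(1) if match else None

# Two hypothetical listings for the same item, marked up differently,
# the way real sites are:
print(extract_price('<span class="prc">US $12.99</span>'))  # works by luck
print(extract_price('<b>Price</b> 12.99 USD'))              # silently fails

# Structured data makes the same fact unambiguous -- no rules to guess:
structured = {"item": "Widget", "price": {"amount": "12.99", "currency": "USD"}}
print(structured["price"]["amount"])
```

The regex isn’t wrong so much as doomed: every site (and every redesign) demands a new one, while the structured record hands the computer exactly the chunks it needs.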
But wait, we do!
You may have noticed small, pretty orange-and-white logos popping up on sites over the past few years, and the feeds behind them are exactly the type of structured data we’re thinking about. While RSS feeds are aimed at delivering “news” content, thousands of websites are already delivering data through this structured interface. Each RSS reader only needs to implement a handful of processors to understand the information provided by many different sites.
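As a rough illustration of why a handful of processors suffices, here’s a short Python sketch using the standard library’s XML parser. The feed below is a hypothetical, minimal RSS 2.0 document, not taken from any real site; the same few lines of code would understand any site that speaks the format:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed -- the structured "chunks"
# a site publishes alongside its human-readable pages.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """One small processor understands every site that speaks RSS."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in read_items(FEED):
    print(title, link)
```

No per-site rules, no regexes: the structure itself tells the reader where the titles and links live.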
RSS isn’t necessarily the delivery method for content in the future, but the history of RSS adoption can shed some light on the adoption of the semantic web. Initially, RSS was plagued by multiple competing definitions and by the simple fact that nobody wants to write their content twice: once for the web and again for a feed reader. RSS only became popular as it was added as a default “option” in popular blogging software. Why write my content twice if my software creates the feed for “free”?
We are in the baby, baby steps of the internet, where each site is essentially a one-off creation with different ideas about navigation, submission, shopping carts, searching, and results. Why must I re-learn each website’s interaction model? Why don’t I have a toolbar built into my browser for each of these things so I can find them in a consistent and known way? Desktop applications embrace this consistency to reduce cognitive load on the user, so why can’t we do this on the web?
The real answer is that we can. Stay tuned over the next few days as we explore various paths toward this structured future, leaving dirty words like “semantic” behind.