I believe the next big leap for the Web is what I am calling "The World Wide Database." The World Wide Database (WWDB) is a globally distributed network of data records, residing on millions of nodes, that collectively behave as a giant virtual, decentralized database system. Google Base is an attempt to build such a database on a single node. But I don't think that approach will ultimately become the WWDB. At best it will be a huge data silo, or many silos in one place.
I think that for the WWDB to emerge it has to be distributed, just like the Web itself. Think about it. Would the Web have spread as it did in 1995-1996 if all Web sites had to be hosted on Yahoo? I don't think so. Not only would such a restriction have stifled innovation and competition, it simply would not have scaled. There is no way that today's Web could live on a single node! This means that Google Base -- whatever it intends to be -- is not a candidate for becoming the WWDB -- at least not if Google intends to host the whole thing. Ultimately what we are really going to need is a system that enables anyone to run their own node in the WWDB as easily as they can run their own Web server today.
I think the WWDB will have to evolve through several levels:
- Level 1: The Document Web. This is sometimes called "Web 1.0." It is a Web of HTML formatted documents connected by hyperlinks. We have this already. The content on the Document Web is unstructured or semi-structured. It is mostly flat text and images.
- Level 2: The Data Web. This is a Web of structured data, defined and expressed in XML. XML does for content structure what HTML does for content formatting. OK, for the purists out there this analogy is simplistic, but I still think it's useful. The content on the Data Web is mostly structured data records of one form or another. The Data Web is one component of "Web 2.0" but not all of the story (Web 2.0 also includes other technologies and methods besides just XML). The Data Web makes it possible to publish and consume data on the Web, but it doesn't solve the problem of data interoperability. The data created on the Data Web is largely non-interoperable. Applications must be explicitly coded to work with each data schema.
- Level 3: The Semantic Web. The Semantic Web -- what we might call "Web 3.0" -- takes the Data Web one step further by providing formal languages (RDF and OWL) for defining the semantics of data structures, mapping between them, publishing data records, and searching across them (using SPARQL, a new query language). The Semantic Web solves the problem of data interoperability by providing open standards for defining and integrating data schemas using formal ontologies. Ontologies may be used to define top-level schemas, and/or to map between lower-level schemas, making it possible to integrate data schemas at a meta-level.
- Level 4: The World Wide Database. This is when it all comes together. The Semantic Web combined with the Data Web and the Document Web enables the Web to function as a vast, decentralized database. A core set of upper and mid-level ontologies defines common concepts, data types and relationships. These ontologies in turn are used to map between thousands of lower-level domain ontologies about specific subject areas. On the basis of this ontological fabric, all data is integrated and accessible. Applications can add records to this database at any node on the Web, because it has no center. Agents roam autonomously within it, discovering knowledge, adding content, and making inferences and links. Search engines syndicate distributed queries across millions of nodes in order to scan billions of data records at once. Within this network, services aggregate, remix, and organize subsets of the data into virtual databases about various subjects, so that the same data records can be referenced in multiple different applications and contexts.
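To make the jump from Level 2 to Level 4 a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a design for a real WWDB node: the `Node` class, the `federated_query` function, and the `core:name` / `core:price` terms are all hypothetical names invented for this example. Each node keeps records in its own local schema, publishes a mapping from its field names to shared core terms, and a query expressed in core terms is fanned out to every node.

```python
from dataclasses import dataclass, field

# Hypothetical shared "core ontology" terms (not taken from any real vocabulary).
CORE_NAME = "core:name"
CORE_PRICE = "core:price"


@dataclass
class Node:
    """One hypothetical node: local records plus a mapping to core terms."""
    name: str
    mapping: dict                      # local field name -> core term
    records: list = field(default_factory=list)

    def add(self, record):
        """Store a record exactly as the local application wrote it."""
        self.records.append(record)

    def query(self, term, predicate):
        """Yield local records, translated to core terms, that match."""
        for rec in self.records:
            # Fields without a mapping are simply not visible to the shared layer.
            translated = {self.mapping[k]: v for k, v in rec.items()
                          if k in self.mapping}
            if term in translated and predicate(translated[term]):
                yield {"node": self.name, **translated}


def federated_query(nodes, term, predicate):
    """Send the same core-term query to every node and merge the answers."""
    results = []
    for node in nodes:
        results.extend(node.query(term, predicate))
    return results


# Two nodes with different, mutually unaware schemas.
shop_a = Node("shop-a", {"title": CORE_NAME, "cost": CORE_PRICE})
shop_b = Node("shop-b", {"label": CORE_NAME, "price_usd": CORE_PRICE})
shop_a.add({"title": "Bicycle", "cost": 120})
shop_a.add({"title": "Helmet", "cost": 35})
shop_b.add({"label": "Tent", "price_usd": 90, "sku": "T-1"})

# One query, phrased only in core terms, spans both schemas.
cheap = federated_query([shop_a, shop_b], CORE_PRICE, lambda p: p < 100)
for row in cheap:
    print(row)
```

Without the `mapping` on each node, the querying application would have to be coded against `cost` and `price_usd` separately, which is the Data Web situation described in Level 2. A real system would express these mappings in RDF and OWL and run the queries with SPARQL rather than with Python dictionaries.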
The WWDB cannot function with the Data Web alone -- it requires the Semantic Web. Without the Semantic Web, the data on the Data Web is still siloed -- it cannot behave as a single database. By adding the Semantic Web layer to the Data Web we can dissolve these silos, making data and applications more interoperable. Only once this happens will it be possible to treat the entire Web as a single virtual database. Until we have the Semantic Web, the Data Web will continue to be a complex system of thousands or millions of databases at best.
I think Google Base is an attempt to create a large, centralized Data Web. But even within Google Base itself, I see huge potential obstacles to data interoperability, and I wonder whether it will behave as a single database or as millions of little database silos that don't work together. It doesn't seem to be a candidate for becoming the WWDB. But who knows, maybe Google will gradually embrace semantics over time (although their past statements have been very opposed to the Semantic Web).
For the WWDB to emerge, we need a more decentralized approach, and we probably also need a new kind of server for hosting WWDB nodes. In addition, we will probably need a core ontology, or a set of core ontologies, that everyone can start using for high-level data interoperability. It's very difficult (probably impossible, in fact) to come up with one ontology that covers everyone's perspectives and needs -- but I think we can do a pretty good job of coming up with a simple ontology that covers common concepts, if we carefully restrict the domain and purpose of this ontology. According to my own research there are really only a few core concepts that we all need to share in order to achieve very high degrees of data interoperability for most of our data. Once we agree on these, branch ontologies can be developed by special interest groups for particular vertical domains of data, and mappings can be made from these to the common upper and middle ontology layers, as well as laterally to alternative mappings within their own domains, and to vertical ontologies in other related domains. This is a fair amount of work and won't happen overnight. I think it will take place in both a top-down and bottom-up manner simultaneously. Gradually, islands will emerge and form bridges to one another. Meanwhile, here at Radar Networks we are working on this problem from several angles, and hopefully in the future we will be able to make a useful contribution to the evolution of the WWDB.
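The layering of branch ontologies under a small shared core can be sketched in a few lines of Python. This is a toy illustration, assuming a single "is a kind of" relation; every term below (`wine:Merlot`, `auto:Sedan`, `core:Product`, and so on) and every function name is made up for the example and does not come from any published ontology. The point is only that two domain groups who never coordinated with each other become interoperable once each maps its own terms upward to a common concept.

```python
# Hypothetical "is a kind of" links: child term -> parent term.
# Each vertical group maintains its own branch and maps it up to the core.
SUBCLASS_OF = {
    "wine:Merlot": "wine:RedWine",
    "wine:RedWine": "food:Beverage",
    "food:Beverage": "core:Product",
    "auto:Sedan": "auto:Car",
    "auto:Car": "core:Product",
    "people:Author": "core:Person",
}


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    chain = []
    seen = {term}
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        if term in seen:          # guard against accidental cycles
            break
        seen.add(term)
        chain.append(term)
    return chain


def core_concept(term):
    """Find the first core-layer concept a term maps to, or None."""
    if term.startswith("core:"):
        return term
    for parent in ancestors(term):
        if parent.startswith("core:"):
            return parent
    return None


def interoperable(a, b):
    """Two terms can be treated alike if they share a core concept."""
    ca, cb = core_concept(a), core_concept(b)
    return ca is not None and ca == cb


print(core_concept("wine:Merlot"))
print(interoperable("wine:Merlot", "auto:Sedan"))
print(interoperable("wine:Merlot", "people:Author"))
```

An application that only understands `core:Product` can now handle both wine and car records at a coarse level, while specialized applications keep using the richer branch terms. The lateral mappings between branch ontologies mentioned above would be additional links of the same kind, which this sketch leaves out.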