It's been a while since I posted about what my stealth venture, Radar Networks, is working on. Lately I've been seeing growing buzz in the industry around the "semantics" meme -- for example at the recent DEMO conference, several companies used the word "semantics" in their pitches. And of course there have been some fundings in this area in the last year, including Radar Networks and other companies.
Clearly the "semantic" sector is starting to heat up. As a result, I've been getting a lot of questions from reporters and VCs about how what we are doing compares to other companies such as Powerset, TextDigger, and Metaweb. There was even a rumor that we had already closed our Series B round! (That rumor is not true; in fact the round hasn't started yet, although I am getting very strong VC interest and we will start the round fairly soon.)
In light of all this I thought it might be helpful to clarify what we are doing, how we understand what other leading players in this space are doing, and how we look at this sector.
Indexing the Decades of the Web
First of all, before we get started, there is one thing to clear up. The Semantic Web is part of what some are calling "Web 3.0," but in my opinion it is really just one of several converging technologies and trends that will define this coming era of the Web. I've written here about a proposed definition of Web 3.0 in more detail.
For those of you who don't like terms like Web 2.0 and Web 3.0, I want to mention that I agree -- we all want to avoid a rapid series of such labels, or an arms race of companies claiming to be the next x.0. So I have a practical proposal: let's use these terms to index decades since the Web began. This is objective -- we can all agree on when decades begin and end, and if we look at history, each decade is characterized by various trends.
I think this is a reasonable proposal and actually useful (it also avoids endless new x.0's being announced every year). Web 1.0 was therefore the first decade of the Web: 1990 - 2000. Web 2.0 is the second decade, 2000 - 2010. Web 3.0 is the coming third decade, 2010 - 2020, and so on. Each of these decades is (or will be) characterized by particular technology movements, themes and trends, and these indices -- 1.0, 2.0, etc. -- are just a convenient way of referencing them. This is a useful way to discuss history, and it's not without precedent: various dynasties and historical periods are also given names, and this provides a shorthand way of referring to those periods and their unique flavors. To see my timeline of these decades, click here.
So with that said, what is Radar Networks actually working on? First of all, Radar Networks is still in stealth, although we are planning to go beta in 2007. Until we get closer to launch, what I can say without an NDA is still limited. But at least I can give some helpful hints for those who are interested. This article provides some hints, as well as what I hope is a helpful tutorial about natural language search and the Semantic Web, and how they differ. I'll also discuss how Radar Networks compares to some of the key startup ventures working with semantics in various ways today (there are many other companies in this sector -- if you know of any interesting ones, please let me know in the comments; I'm starting to compile a list).
Semantic Social Software: The Semantic Web for Consumers
Here at Radar Networks, we are building a next-generation Web-based online service that will bring the Semantic Web to consumers and professionals across the Web. This application is focused on enabling the next generation of social software (note that social software is not necessarily social networking -- that is a subset of social software). It is an example of what "the Intelligent Web" will be like. We are very excited about this service and what it already does, but there's still more to do before we release it.
Our app is based on the Semantic Web. It will enrich and facilitate more intelligent online relationships, community, content, collaboration and even commerce. It will help to bring the Semantic Web from research to reality by making it user-friendly, accessible and most of all, directly useful and valuable, to ordinary people. We are focused on providing value to consumers -- not just developers or early-adopters. But like I said, I can't really provide more details until we get closer to launch.
Our Web 3.0 Applications Platform
In order to build our product we had to first build a new platform to support the kinds of features and capabilities we designed -- we could not find any existing platform that could do what we wanted to do. Existing platforms for the Semantic Web were too research-oriented and did not provide the levels of scalability, performance and ease-of-use that we required.
We have been working on this platform over several years and several generations of our codebase. It is now very robust and sophisticated. We believe it is also significantly more scalable and performant than any platform we've seen in the Semantic Web space to-date.
Our platform is a comprehensive, Java-based framework for semantic web applications and services that has some similarities to Ruby on Rails (although it is also very different from RoR, and we are not going after the platform market -- we're really more focused on our application right now). Our platform also includes a lot of other technology, such as our extremely fast and scalable storage layer for semantic data tuples, powerful semantic query capabilities, and a range of algorithms for analyzing data and doing intelligent things for users.
The platform could be called a "Web 3.0" applications platform because it is inherently based around RDF/OWL and the emerging Semantic Web. In addition to the "Web 3.0" aspects of what we are doing, our platform also makes heavy use of "Web 2.0" methods and technologies such as AJAX, REST, widgets, and RSS/ATOM, to name a few.
What We are Not Doing: Natural Language Search
First of all, we at Radar Networks are NOT building a new search engine to compete with Google, as Powerset and TextDigger are -- so we're not competing with them. Companies like Powerset and TextDigger are working on natural language search, which is not equivalent to the Semantic Web, although the Semantic Web can certainly help that process.
Companies working specifically on natural language search are making use of semantics, but at the word-level only. They use networks of words that are linked to synonyms, antonyms, homonyms and other variations. These are sometimes called semantic networks. Based on these networks of word meanings, they can understand the meaning of various words and expressions.
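To make the word-level approach concrete, here is a minimal sketch of synonym-network query expansion in plain Python. The tiny hand-built network and the documents are invented for illustration; real systems use large lexical resources rather than a three-word dictionary.

```python
# Toy sketch of word-level semantics: a tiny hand-built synonym
# network (real systems use large lexical resources) used to expand
# a search query so that documents using different words still match.
SYNONYMS = {
    "car": {"car", "automobile", "auto"},
    "automobile": {"car", "automobile", "auto"},
    "auto": {"car", "automobile", "auto"},
}

def expand_query(word):
    """Return the word plus any synonyms known to the network."""
    return SYNONYMS.get(word, {word})

def search(query_word, documents):
    """Match documents containing the query word or any synonym."""
    terms = expand_query(query_word)
    return [doc for doc in documents if terms & set(doc.lower().split())]

docs = ["I bought a new automobile", "The weather is nice"]
print(search("car", docs))  # matches the first document via the synonym link
```

Note that this is purely lexical: the program still has no idea what a "car" is, only which words tend to substitute for one another.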
More sophisticated natural language search algorithms don't just look at the words alone, they look at them in context, by analyzing the grammar and the rest of the content around them. The point of natural language search is ultimately to try to match the meaning of words in search queries to the content of various documents -- and to do this better than Google, which basically just matches keywords without paying attention to the meaning of the words.
Essentially natural language search requires at least some level of artificial intelligence. Machine understanding of natural language is a difficult problem and there has been a lot of work on this over the last few decades. Today there are many technologies that focus on this but the majority of them are based on the assumption that software should do all the work to figure out the meaning of information.
What We Are Doing: Semantic Web
In contrast to natural language search which focuses on trying to derive the meanings of words, the approach of the emerging Semantic Web makes use of metadata to encode the meaning of information.
In this approach, the meaning of the information can be explicitly coded into the information just as HTML codes are added into content today -- and this can be done by people or software, and even by communities. Once this meaning -- or semantics -- is explicitly encoded into content, it can then be re-used by other applications to make sense of the content. It's worth noting that explicit semantics in content can also help natural language processing apps, as well as apps that don't understand natural language.
In the Semantic Web approach, the meaning of the information is encoded using markup languages such as RDF and OWL, which are W3C open standards. Words and concepts in the content of documents and data records can be marked up with RDF/OWL expressions to indicate what they mean -- does a certain word or phrase such as "Lotus" for example, mean a software company, a software product, an exotic sportscar brand, or some other kind of concept? Without sophisticated natural language processing it is often difficult for software to determine this on its own. The Semantic Web provides markup codes that explicitly indicate the intended meaning of information in an unambiguous, machine-readable format.
Marking up content with additional metadata was possible before the Semantic Web using XML: you could just say <sportscar>Lotus</sportscar>, but the problem is that the meaning of "sportscar" still had to be coded into applications in order for them to know what it implies. With RDF/OWL that meaning can be formally encoded outside of applications, in a set of definitions called an ontology. An ontology defines facts such as "a sportscar is a kind of car," "a car is a ground vehicle," "a car is a product," "a car is a device," "a sportscar is a recreational or competitive vehicle," etc.
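The "is a kind of" facts above can be sketched in a few lines of plain Python. This is only a toy stand-in for what RDF/OWL expresses formally: the concept names and hierarchy are taken from the example in the text, and the code just walks the definitions upward.

```python
# Toy sketch of an ontology as machine-readable "is a kind of" facts,
# standing in for what RDF/OWL expresses formally. An application that
# only sees the tag "sportscar" can still conclude the item is a car,
# a ground vehicle, and a product by walking the definitions.
ONTOLOGY = {
    "sportscar": {"car"},
    "car": {"ground vehicle", "product", "device"},
    "ground vehicle": {"vehicle"},
}

def all_kinds(concept):
    """Collect every broader concept reachable from the starting one."""
    kinds, frontier = set(), {concept}
    while frontier:
        current = frontier.pop()
        for parent in ONTOLOGY.get(current, set()):
            if parent not in kinds:
                kinds.add(parent)
                frontier.add(parent)
    return kinds

print(all_kinds("sportscar"))
# includes "car", "ground vehicle", "product", "device", "vehicle"
```

The key design point is that the hierarchy lives in data, outside any one application, so every application that reads it draws the same conclusions.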
By marking up content with OWL to indicate that it refers to a sportscar, the markup points to the appropriate definitions in an ontology, from which any application that can read the ontology can then infer these various specific intended meanings. The point here is that the semantics are less ambiguous -- they are explicitly encoded in the ontology, which functions as a kind of more advanced data schema.
But this is really an oversimplification -- OWL and ontologies can actually go a lot further than just defining the meaning of concepts -- they can also define their logical relations. For example, how exactly are two things connected, and are there any special restrictions on that connection? For instance, an ontology can define that a person's sister must be female, or that a person can have only one biological mother, etc.
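Those two restrictions can be sketched as a simple validator in plain Python. The names (Sue, Jane, Mary, Ann) and the fact format are invented for illustration; OWL states such constraints declaratively, while this toy checks them procedurally.

```python
# Toy sketch of ontology restrictions, approximating what OWL states
# declaratively: a "sister" must be female, and a person has at most
# one biological mother. A validator can flag data breaking the rules.
facts = [
    ("Sue", "sister_of", "Jane"),
    ("Sue", "gender", "female"),
    ("Jane", "biological_mother", "Mary"),
    ("Jane", "biological_mother", "Ann"),   # violates the cardinality rule
]

def check(facts):
    problems = []
    genders = {s: o for s, p, o in facts if p == "gender"}
    # Rule 1: anyone who is a sister_of someone must be female.
    for s, p, o in facts:
        if p == "sister_of" and genders.get(s) != "female":
            problems.append(f"{s} is a sister but not known to be female")
    # Rule 2: at most one biological mother per person.
    mothers = {}
    for s, p, o in facts:
        if p == "biological_mother":
            mothers.setdefault(s, set()).add(o)
    for person, ms in mothers.items():
        if len(ms) > 1:
            problems.append(f"{person} has {len(ms)} biological mothers")
    return problems

print(check(facts))  # flags Jane's two biological mothers
```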
All kinds of apps can benefit from the extra hints about the meaning of the information that can be provided by Semantic Web metadata around content. For example, even a natural language search engine could do less analysis and would need less intelligence, if it could leverage existing semantic metadata that was already in content.
It's important to note that applications and people don't necessarily ever have to look at RDF or OWL code (thank heavens!) -- they can just work with objects and forms like they already do on the Web, and the underlying markup can be created automatically for them. Nobody should have to look at raw RDF and OWL (unless they really want to), and the Semantic Web doesn't force anyone to. For example, most of us don't write HTML or XML or CSS by hand -- but if we are using blogs or wikis, or even posting listings on sites like job boards and auctions, we are doing things that result in HTML, XML and CSS being created.
It should be clear from the above section that natural language search is a specific process that makes use of word-level semantics, but the Semantic Web is a broad set of technologies for defining the meaning of any kind of information (including, but not limited to words). The Semantic Web can help improve the process of natural language search, but today many natural language search algorithms do not make use of the Semantic Web or RDF/OWL data structures. However, as these technologies begin to converge (as they are here at Radar Networks, in fact) we will see new levels of accuracy become possible -- the combination of traditional natural language processing and the richer semantics of RDF/OWL markup enables even more powerful machine-understanding and processing of text. That said, once again, I want to be clear that Radar Networks is not a search company -- although we do use next-generation semantic search quite extensively in our application and platform.
Any application that can understand RDF/OWL can correctly interpret the meaning of any content that is marked up with RDF/OWL metadata. If a news article that mentions "Paris" many times is marked up with RDF/OWL metadata, then any app that can understand that metadata can, for example, correctly determine that the article is about the place Paris, Texas -- not the place Paris, France, and not the person Paris Hilton either. The application doesn't have to do any fancy natural language processing to know this. Even a relatively "dumb" application that has no ability to do natural language processing can still make sense of a document if it can at least understand RDF/OWL.
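The "Paris" scenario can be sketched in a few lines. The article text and the URI-style identifier are invented for illustration; the point is simply that the consuming app reads the attached metadata rather than guessing from the ambiguous word.

```python
# Toy sketch of how explicit metadata removes ambiguity: the word
# "Paris" in the text is ambiguous, but a metadata record attached to
# the article pins it to one specific entity, so even a "dumb" app
# needs no language understanding to get it right.
article = {
    "text": "Paris saw record attendance at its annual festival...",
    "metadata": {
        # A URI-style identifier standing in for an RDF/OWL annotation.
        "mentions": ["http://example.org/place/Paris_Texas"],
    },
}

def subject_of(article):
    """Read the subject from metadata instead of guessing from text."""
    return article["metadata"]["mentions"][0]

print(subject_of(article))  # the Texas town, not France or the celebrity
```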
So how does this explicit semantic markup in the form of RDF/OWL metadata get into the document in the first place? It could have been added automatically by some other software app that did natural language processing on it, or it could have resulted from newspaper editors or even readers categorizing and tagging the document with tags for places, people, etc., in a manner not unlike how they tag content in services like Flickr today.
The main point here is that adding the semantic metadata does not require the apps that create or consume the content to understand natural language, nor does it require people to be XML coders -- even regular end-users can help to define the semantics of content by simply tagging it. The Semantic Web provides a much richer and more expressive framework for doing this than is currently available in Web 2.0 "tags," but it's not that far off either.
The Semantic Web can enhance word-level understanding and processing of text in many ways, but note that it is not limited only to word-level applications. The Semantic Web provides a way to make any information more understandable to other applications -- including data records in databases, documents on the desktop and the Web, enterprise data, photos, videos, music, and even Web services and software code.
Simple Examples of Semantics
Today, for example, there is a big problem in integrating data across applications. In the enterprise, one application might define a record called a "Customer" while another might call that concept by the term "Client." If a user then searches for "Customers" they won't necessarily also find records for Clients. But using the Semantic Web, the data records for Customers and Clients can be mapped together so that applications can treat them as equivalent. Any search for one will return the other as well. Not only can records be mapped to each other, but the fields of those records can be mapped together too. For instance, the Customer record might have a field named "Referred by" while the Client record might have a field called "Introduced by" -- these can be mapped together as well.
A similar example applies to a consumer use-case -- for example, shopping: different stores describe the same product differently, with different terms. In one store a laptop is called a "laptop computer," in another it is called a "portable computer," while another calls it a "desktop replacement." A search for any of these terms should return products that use any of them. Within a single commerce site this is not so hard, but what about searching across many commerce sites (which isn't really even easy to do at all today)? If different commerce sites used the same underlying semantic metadata definitions to mark up their various products, then users could search across their products with less trial-and-error, and they would get better results.
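The shopping example above can be sketched with a shared equivalence table in plain Python. The store names and product listings are invented; the table stands in for the shared semantic definitions the stores would agree on.

```python
# Toy sketch of semantic mapping for search: a shared table declares
# that different store vocabularies mean the same concept, so one
# query matches products regardless of which term each store used.
EQUIVALENT = {
    "laptop": "laptop",
    "laptop computer": "laptop",
    "portable computer": "laptop",
    "desktop replacement": "laptop",
}

products = [
    {"store": "A", "category": "laptop computer"},
    {"store": "B", "category": "portable computer"},
    {"store": "C", "category": "desk lamp"},
]

def search_products(term, products):
    """Normalize both query and listings to a shared concept, then match."""
    concept = EQUIVALENT.get(term, term)
    return [p for p in products
            if EQUIVALENT.get(p["category"], p["category"]) == concept]

print(len(search_products("laptop", products)))  # 2: stores A and B match
```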
Of course the technology for mapping between databases is not new -- there are many ways to do this -- but the Semantic Web provides a way to do it that may be more open and efficient in the long-run. Central to this approach is that an organization or online service can use ontologies that centrally define key concepts in a rigorous way. So instead of every different app and data record having to be individually mapped to every other, they can potentially all just map to the central ontology, which functions as a kind of semantic switchboard. All applications and queries can use a common ontology (or set of them) to unify access to data records across many different online services and databases. In a sense ontologies provide a way to define and share common languages for data, content, relationships and applications.
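The "switchboard" idea can be sketched as follows: each application declares one mapping from its own field names to a shared concept, rather than N-squared pairwise mappings between applications. The app names, field names, and "ont:" concept identifiers are invented for illustration.

```python
# Toy sketch of a central ontology as a semantic switchboard: each
# application maps its own vocabulary once to a shared concept, and
# any two fields are interchangeable if they map to the same concept.
CENTRAL = {
    # (application, field name) -> shared ontology concept
    ("crm_app", "Customer"): "ont:Client",
    ("billing_app", "Client"): "ont:Client",
    ("crm_app", "Referred by"): "ont:referredBy",
    ("billing_app", "Introduced by"): "ont:referredBy",
}

def same_concept(app1, field1, app2, field2):
    """Two fields are equivalent if both map to the same known concept."""
    c1 = CENTRAL.get((app1, field1))
    c2 = CENTRAL.get((app2, field2))
    return c1 is not None and c1 == c2

print(same_concept("crm_app", "Customer", "billing_app", "Client"))  # True
```

With N applications this needs only N mappings to the central ontology instead of one mapping per pair of applications, which is the efficiency argument made above.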
SPARQL and the Emerging Data-Web
More recently a new Semantic Web technology called SPARQL has also started to emerge. SPARQL provides a common query language, like SQL, for querying data that is stored in RDF. Any site or database that has RDF data and that provides a SPARQL interface can be searched by any application that speaks SPARQL. This means that the dream of "deep web search" is finally going to become a reality. There is a huge amount of interest in SPARQL at the moment and there are already a growing number of SPARQL endpoints popping up around the Web. These new SPARQL endpoints are to data what websites were to documents. It's the beginning of what some call "The Data Web" -- which is the first step to the full-blown Semantic Web. SPARQL is also a big piece of what we are doing.
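To give a feel for the triple-pattern idea at the heart of SPARQL, here is a minimal pattern matcher in plain Python. The data and the "?car" variable convention are invented for illustration; real SPARQL is far richer (joins, filters, optional patterns), and this only shows matching one pattern against a store of (subject, predicate, object) triples.

```python
# Toy sketch of a SPARQL-style query: data is stored as RDF-like
# (subject, predicate, object) triples, and a query is a pattern with
# variables (marked with "?") matched against the store.
triples = [
    ("Lotus Elise", "type", "sportscar"),
    ("Lotus Elise", "madeBy", "Lotus Cars"),
    ("Miata", "type", "sportscar"),
]

def match(pattern, triples):
    """Return one dict of variable bindings per matching triple."""
    results = []
    for triple in triples:
        bindings = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                bindings[pat] = val      # variable: bind it
            elif pat != val:
                break                    # constant mismatch: no match
        else:
            results.append(bindings)
    return results

# Roughly: SELECT ?car WHERE { ?car type sportscar }
print(match(("?car", "type", "sportscar"), triples))
```

Any data store that exposes this kind of pattern-matching interface over its triples is, in miniature, what a SPARQL endpoint offers over the Web.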
Reasoning: The Next Frontier After Search
Another key benefit of using RDF/OWL is that these languages are designed to support formal logical reasoning. Once information is marked up with RDF/OWL, sophisticated search and inferencing can take place around it. For example, by marking up various people and their social connections, it is then possible to infer that Sue is Jane's cousin, that Bob and Dave are colleagues, that product A is incompatible with product B, etc.
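The cousin example can be sketched as a single inference rule over explicit facts. The names and relationships are invented for illustration; OWL reasoners apply such rules declaratively and at scale, while this toy applies one rule by hand.

```python
# Toy sketch of rule-based inference over marked-up facts: from
# parent and sibling relationships the program derives a "cousin"
# fact that was never stated explicitly -- the kind of reasoning
# RDF/OWL data is designed to support.
facts = {
    ("Sue", "parent", "Ann"),
    ("Jane", "parent", "Beth"),
    ("Ann", "sibling", "Beth"),
}

def cousins(facts):
    """Rule: children of siblings are cousins."""
    parents = [(c, p) for c, rel, p in facts if rel == "parent"]
    siblings = {(a, b) for a, rel, b in facts if rel == "sibling"}
    siblings |= {(b, a) for a, b in siblings}  # sibling is symmetric
    return {(c1, c2) for c1, p1 in parents for c2, p2 in parents
            if (p1, p2) in siblings}

print(cousins(facts))  # infers that Sue and Jane are cousins
```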
This kind of logical reasoning and inference is essential to enable the next-generation of the Web -- an Intelligent Web -- where software and online services start to help people work, communicate, socialize and shop more productively. For example it will enable something beyond search -- it will enable services that provide answers or suggestions. This is not necessarily important for all applications today, but it will become increasingly important in the future. Content that exists in RDF/OWL essentially has a longer shelf-life and will be easier to reuse, integrate and reason across in the future.
Differentiating The Players
The Semantic Web provides a comprehensive and growing framework of technologies that enable the next evolution of the Web -- it is therefore a much broader and farther-reaching vision than natural language search, even though that is certainly one area that it will benefit. Natural language search is really just about matching search queries to documents, by analyzing the meaning of words. The Semantic Web is about defining the meaning of data -- any data -- words, data records, documents, social relationships, product listings, etc. -- and providing a way to query that data, integrate it, and reason across it.
In our own application and platform we make use of a lot of natural-language processing (NLP) and we also provide semantic search capabilities, but our focus is on something quite different from searching the Web -- yet equally useful and important to everyone. Frankly, I'm glad we are not working on search, as big an opportunity as that is -- I think competing directly with Google is a daunting task and not one I would want to take on! Instead, we are providing a new environment in which people can start to benefit from the power of the Semantic Web in areas where Google is very weak today, or not present at all; it's really quite orthogonal to Google and other search engines.
So from the above discussion it should be clear that we are working on the Semantic Web, not just natural language search, and so we are quite different from companies like Powerset, TextDigger and others who are working on word-level semantic understanding of text. But what about Metaweb -- how do we differ from them? From what we can glean so far, what we are doing is also very different, though perhaps not as different as we are from Powerset.
Radar Networks and Metaweb are frequently cited as the two main startups working to bring semantically-driven Web 3.0 online services to consumers. My guess is that there will be some similarities but even more differences. There may even be opportunities for us to work together someday. But we're all still in stealth, so it's hard to get very specific about our similarities and differences today. One thing is for sure, 2007 is going to be an exciting year for both our companies, and for the emerging Web 3.0 generation of companies and products.
Web 3.0 is just beginning
In any case the next evolution of the Web -- what we call "The Intelligent Web" (and what many are also calling "Web 3.0") -- is still in the very early stages, and I don't think it will really hit big until 2010 (for a graphical timeline of how I think this will unfold, click here). In the meantime we are all putting the pieces in place.
Fortunately Web 3.0 is a big space with a lot of opportunity, and there is room for many different players and business models to co-exist and compete. The fact that there are now several ventures in this space is a good thing for all of us, for as one person said to me the other day, "a rising tide lifts all boats." I'm happy that there is enough action for there to actually be some confusion for me to clear up! Only a year ago it felt like we were the only commercial voice in a wilderness of academic research. Today VCs are lining up to speak to us and the other companies in the space, and we are literally having to keep them at bay until we start our B round.
Solving Information Overload
The key realization behind all this recent interest in semantics is that keyword search and traditional content and data representations are declining in productivity. As the Web gets vaster and more complex, and as consumers must work with a growing array of content and services, productivity is seriously being threatened -- not only in search, but also in every other area of our digital lives. Most of us who work intensively with knowledge and information already have a direct and intuitive experience of how information overload has grown, even in the last decade. Clearly something must be done about this or in another few years we will all be buried in our own information.
The Semantic Web provides the best (and really the only) long-term solution to information overload and complexity. By starting to add richer semantics to data, and by enabling applications to start leveraging this, it will make it possible to help people regain more of their productivity and to make software smarter -- without having to attempt to create super-duper science fiction artificial intelligence.
It's very important to keep in mind that The Semantic Web does not require that machines understand or reason as well as people -- the semantics of the Semantic Web can be created by people and/or machines, and it doesn't have to be perfect, it simply has to add hints that make content less ambiguous and more structured. By contrast, both the keyword approach of Google and the natural language search approach of companies like Powerset -- if they are to keep up with the growing complexity of the Web -- will require increasingly intelligent software, because basically in such systems the software has to do all the work by itself.
The Semantic Web actually is really more about leveraging the collective intelligence of people and applications to enrich content -- rather than trying to make applications do all the work on their own -- but this will be a lot clearer later in the process, when there are several Semantic Web apps that demonstrate this.
Here at Radar Networks we have been working towards this vision steadily -- and we're proud of the fact that we started working with semantics long before it was "cool" -- we know this space inside out and we think that our first application on our platform will be an "Aha experience" for users.
It certainly has taken some time to bring the Semantic Web to fruition, but when you think about it, Web 1.0 took about 5 years to really get started, so it's not without precedent. A new generation of the Web is a big undertaking. For now, all of us working on anything having to do with "semantics" or Web 3.0 need to work together to start mapping out this space and educating the marketplace so that people (including the press and VC's, and early-adopters) can understand the companies and technologies more clearly. The rather humorous irony for all of us, is that the meaning of the term "semantic" is still so ambiguous today!