
May 13, 2008

Associative Search and the Semantic Web: The Next Step Beyond Natural Language Search

Our present day search engines are a poor match for the way that our brains actually think and search for answers. Our brains search associatively along networks of relationships. We search for things that are related to things we know, and things that are related to those things. Our brains not only search along these networks, they sense when networks intersect, and that is how we find things.

In fact our memory often works by "homing in" on what we are looking for, rather than finding exact matches. Keyword searching provides a very weak form of "homing in" -- by choosing our keywords carefully we can limit the set of things which match. The problem is that we can only find things which contain those literal keywords. Our brains, on the other hand, use a much more sophisticated form of "homing in" on answers. Instead of literal matches, our brains look for things which are associatively connected to things we remember, in order to find what we are ultimately looking for.

For example, consider the case where you cannot remember someone's name. How do you remember it? Usually we start by trying to remember various facts about that person. By doing this our brains then start networking from those facts to other facts and finally to other memories that they intersect.  Ultimately through this process of "free association" or "associative memory" we home in on things which eventually trigger a memory of the person's name.

Keyword search is a very weak approximation of associative search because there really is no concept of a relationship at all. By entering keywords into a search engine like Google we are simulating an associative search, but without the real power of actual relationships between things to help us. Google does not know how various concepts are related and it doesn't take that into account when helping us find things. Instead, Google just looks for documents that contain exact matches to the terms we are looking for and weights them statistically. It makes some use of relationships between Web pages to rank the results, but it does not actually search along relationships to find new results.

Google does not work the way our brains think. This difference creates an inefficiency for searchers: We have to do the work of translating our associative way of thinking into "keywordese" that is likely to return results we want. Often this requires a bit of trial and error and reiteration of our searches before we get result sets that match our needs.

Natural language search engines are slightly closer because they at least attempt to understand the meaning of a query and the meaning of result documents in order to make a better match between the question and potential answers. But this is still not true associative search.

A natural language search can understand the meaning of a query like "books about Harry Potter" and it knows this is not the same as "books by Harry Potter." But ultimately what is happening is that a linguistic expression is being converted into a more sophisticated keyword search. The language in the query is being mapped to documents that contain text that answers a question, or to data objects that match the thing being asked for. This is certainly better than keyword search but it is still a form of literal matching. It is not really making use of associative search along relationships in the data (other than linguistic relationships between words in the query) or any sort of sophisticated reasoning.

Associative search doesn't merely understand the meaning of the query; it understands and can reason about relationships in the data. This is an important distinction. An associative search returns documents that represent things that are related via various forms of associations (semantic links) to the things in the query. An associative search looks through a network of associations for the things that are most connected to the items in the query. By specifying more specific starting points, the set of things which are connected to all those starting points is narrowed.

Associative search is a very different approach to search from keyword search (which merely looks for things with the keywords in them) and natural language search (which merely looks for things that contain content that matches the meaning of the question). It also happens to be more similar to how our brains actually think.

Associative search is the basis for reasoning, but in its simplest form it does not require reasoning. In its simplest form it is just applying statistics to networks of relationships to narrow down on things which are highly related to items in a query. By adding reasoning to the mix it becomes vastly more powerful however. Reasoning adds the ability to generalize or get more specific, and to weight various paths through the network of relationships in more sophisticated ways.
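
To make the simplest, purely statistical form concrete, here is a minimal Python sketch. It reuses the idea of recalling a person from facts about them. The facts, the people, and the function name `associative_search` are all invented for illustration, and the "statistics" are nothing more than a count of how many remembered facts link to each candidate.

```python
from collections import Counter

# Hypothetical network of associations: each remembered fact links to the
# people it is associated with. All names are made up for this sketch.
ASSOCIATIONS = {
    "plays tennis": {"Alice", "Bob"},
    "lives in Boston": {"Bob", "Carol"},
    "works in publishing": {"Bob", "Alice"},
}

def associative_search(associations, remembered_facts):
    """Rank candidates by how many of the remembered facts link to them."""
    scores = Counter()
    for fact in remembered_facts:
        for person in associations.get(fact, ()):
            scores[person] += 1
    # Highest count first: the most connected candidates lead the list.
    return scores.most_common()

ranked = associative_search(
    ASSOCIATIONS, ["plays tennis", "lives in Boston", "works in publishing"]
)
print(ranked[0])  # the candidate linked to the most remembered facts
```

Adding more remembered facts narrows the field, because fewer candidates stay connected to all of them. A reasoner could later replace the plain count with weighted paths.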

Our current search tools -- whether they are keyword based or natural language based -- do not support true associative search. But we do see associative search starting to appear in a very different breed of application: social networks. A search in LinkedIn, for example, is an associative search: it finds people by way of their connections to you and to the people you know.

As we begin to merge our social networks with our search engines we may start to see associative search engines appearing on the Web. In fact, I would venture that this is how Facebook could give Google some serious competition. But they have to hurry if they are going to do this -- Google has clearly realized the power of "social search" and is rapidly moving to leverage it in their own search results.

Ultimately associative search is more than just social search however. To be really effective, associative search engines need to understand and leverage the full spectrum of relationships between things, not just social relationships. In order to accomplish this, associative search engines need the Semantic Web. They need to see and understand more types of relationships between more types of things.

With that in mind, here is an example of how Semantic Web enabled associative search will work in the future.

PROBLEM: I am trying to remember the name of the organizer of a conference I once attended.

WHAT I ALREADY KNOW:

  • I know this person and have corresponded with them in the past.
  • The conference was related to government and the Internet.
  • It took place in a town near Big Sur, but I can't remember the name of the town.
  • The organizer of the conference once introduced me to a male celebrity, but I can't remember the celebrity's name.
  • I gave a talk at the conference about Web 3.0.
  • My friend, Sue Smith, also spoke at the conference.
  • The conference I attended took place in the Spring, but I am not sure if it was last year or two years ago.

In the above example, I cannot remember the specific keywords that will help me generate a query to find the answer. Instead, I remember a number of relationships and generalizations about the answer. Present day search engines cannot see these relationships, and they have no ability to understand a generalization and look at things it contains.

The ability to intersect the sets formed by relationships and generalizations is a fundamental feature of human memory and search. But our present day tools don't have these capabilities. Thus we have to spend time translating our questions into keywordese, rather than just asking our questions in the actual language of human thought.

There are two ways to approach solving this.

  1. The first way is to create artificial intelligence which, given a question in natural language English, can understand it and reason about the question as well as understand and reason about the information in the set of documents being searched, in order to intelligently arrive at candidate answers. This is computationally intensive, and very hard to program. This is why AI hasn't quite happened yet on this scale.
  2. A perhaps easier approach is to use the Semantic Web. In the Semantic Web approach, metadata is embedded into content to describe the meaning of the content, its various important properties, and its relationships to other concepts. On the basis of this metadata, the problem becomes much simpler to solve. Instead of doing high-level AI it becomes essentially a statistical search.
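
As a rough sketch of why embedded metadata simplifies the problem, the Python below stores metadata as (subject, predicate, object) statements and answers a question by pattern matching. The identifiers (`conf:42`, `person:7`), the predicate names, and the `match` helper are hypothetical. They are not taken from any real Semantic Web vocabulary or library.

```python
# Hypothetical metadata, written as (subject, predicate, object) statements.
# The identifiers and predicate names are illustrative, not a real vocabulary.
TRIPLES = [
    ("conf:42", "type", "Conference"),
    ("conf:42", "topic", "Government"),
    ("conf:42", "organizer", "person:7"),
    ("conf:99", "type", "Conference"),
    ("conf:99", "topic", "Gardening"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return every statement that agrees with the given parts (None = any)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "Which things have the topic Government?" needs no language understanding,
# only a lookup over explicit statements.
government_confs = [s for (s, _, _) in match(TRIPLES, predicate="topic", obj="Government")]
print(government_confs)
```

Once the meaning is stated explicitly, the hard step of understanding the content has already been done by whoever published the metadata.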

Now let's look at how using the Semantic Web could help us solve the above problem via an associative search:

Items are connected to more general or specific concepts by virtue of semantic linkages between concepts. For example, the conference I am looking for is related to the concepts "Government" and "Technology." If I can at least remember that then I can find conferences related to government and technology. Furthermore, since the concept "Policy" is a subset of government it may be related to that topic as well.
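
A minimal Python sketch of this kind of generalization follows. The hierarchy and the conference records are assumptions made up for the example: only "Policy" as a subset of government comes from the text above, while "Elections" and "Internet" are placeholders.

```python
# Hypothetical concept hierarchy: each concept maps to its narrower concepts.
NARROWER = {
    "Government": ["Policy", "Elections"],
    "Technology": ["Internet"],
}

# Hypothetical conference records tagged with concepts.
CONFERENCE_TOPICS = {
    "Conference A": {"Policy", "Internet"},
    "Conference B": {"Elections"},
    "Conference C": {"Cooking"},
}

def expand(concept, narrower):
    """Return the concept plus everything beneath it in the hierarchy."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in narrower.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def conferences_about(concept):
    """Find conferences tagged with the concept or any narrower concept."""
    wanted = expand(concept, NARROWER)
    return {name for name, topics in CONFERENCE_TOPICS.items() if topics & wanted}

both = conferences_about("Government") & conferences_about("Technology")
print(both)
```

A conference tagged only with "Policy" is still found by a search for "Government", which is exactly what a literal keyword match would miss.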

Likewise, things are connected to things that are "near" them via geographic links. Because the conference was near Big Sur it is in Northern California, along the coast. It is probably in a town that is geographically close. For example, Carmel-by-the-Sea is a town that is near the Big Sur area.

The organizer of the conference introduced me to a male celebrity. There are several celebrities in my social network. If the fact that I met certain people via introductions from other people was stored using semantic links, then this too would be searchable. For example, "find all celebrities I was introduced to by my connections" would be a solvable query. Similarly, "find people who introduced me to celebrities" would also be solvable.

The fact that I gave a talk at the conference could also be semantically represented on a data record describing the conference, as well as on my own profile. Thus there could exist a link such as "speaker at" which links me to various conferences I have spoken at. I could then get a list of all the conferences I have spoken at. I could also look for all the conferences where both Sue Smith and I were speakers.

Or, better yet, there could be a link called "Gave talk about" which links me to an instance describing each talk I have given. From such an instance there could then be "Gave talk at" links to all the events where I have given that talk. So I could look up my "Web 3.0" talk and then see all the conferences where I gave that talk.
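
The two-step lookup described here can be sketched in Python as follows. The link names mirror the ones above, but the talks, the events, and the helpers `follow` and `follow_path` are hypothetical.

```python
# Hypothetical links of the kind described above, stored as
# (subject, link name, object). Talk and event names are made up.
LINKS = [
    ("me", "gave talk about", "Web 3.0 talk"),
    ("me", "gave talk about", "Search talk"),
    ("Web 3.0 talk", "gave talk at", "Conference A"),
    ("Web 3.0 talk", "gave talk at", "Conference B"),
    ("Search talk", "gave talk at", "Conference C"),
]

def follow(links, start, link_name):
    """Return the set of things reachable from `start` via one named link."""
    return {o for (s, p, o) in links if s == start and p == link_name}

def follow_path(links, start, link_names):
    """Follow a chain of named links, one hop per name."""
    frontier = {start}
    for name in link_names:
        frontier = {o for node in frontier for o in follow(links, node, name)}
    return frontier

web3_events = follow(LINKS, "Web 3.0 talk", "gave talk at")
all_my_events = follow_path(LINKS, "me", ["gave talk about", "gave talk at"])
print(sorted(web3_events))
```

Because the talk is its own record, the same links answer both "where did I give this talk?" and "where have I spoken at all?"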

Temporal relations can also be generalized and semantically represented. For example, the conference I am looking for took place in the spring. The search should therefore only consider conferences that took place in or near months that are considered to be in the spring season.
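
A small Python sketch of the seasonal filter, under stated assumptions: "spring" is taken to mean March through May (Northern Hemisphere), "near" is one month of slack on either side, and the conference dates are invented.

```python
from datetime import date

# Assumption: "spring" means March through May (Northern Hemisphere).
SPRING_MONTHS = {3, 4, 5}

def near_spring(day, slack_months=1):
    """True if the date falls in spring or within `slack_months` of it."""
    return any(abs(day.month - m) <= slack_months for m in SPRING_MONTHS)

# Hypothetical conference dates.
CONFERENCES = {
    "Conference A": date(2007, 4, 14),
    "Conference B": date(2007, 10, 2),
    "Conference C": date(2006, 6, 1),
}

spring_candidates = sorted(n for n, d in CONFERENCES.items() if near_spring(d))
print(spring_candidates)
```

The year is deliberately ignored, since I am not sure whether the conference was last year or two years ago.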

By intersecting the results of the above searches we narrow down very precisely to a set of people I might be looking for, or just to a single qualifying person.
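
The intersection step itself is simple once each remembered fact has produced its own result set. Here is a minimal Python sketch; the candidate names are placeholders, and in a real system each set would come from a search along a different kind of semantic link.

```python
# Hypothetical result sets, one per remembered fact.
candidates_by_fact = {
    "organized a government and Internet conference": {"Person A", "Person B", "Person C"},
    "organized a conference near Big Sur": {"Person A", "Person B"},
    "introduced me to a celebrity": {"Person A", "Person D"},
    "organized a conference where Sue Smith and I spoke": {"Person A", "Person C"},
}

def intersect_all(result_sets):
    """Keep only the candidates that appear in every result set."""
    sets = list(result_sets)
    return set.intersection(*sets) if sets else set()

answer = intersect_all(candidates_by_fact.values())
print(answer)
```

No single fact identifies the organizer, but only one candidate survives all four.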

For example, the answer I was seeking was that the organizer was named Robert Jones, and the conference was about Government and Technology Policy in Carmel-by-the-Sea last spring. This result should be easily findable via associative search starting from the above set of things I remember.

But if for some reason the answer is still not there, there is another capability which the brain uses that we need to add to our search engines: Perturbation, or what could be called "prospecting."

The query I entered is composed of a question and a set of facts related to the answer I am seeking. But there is a possibility that I asked the question incorrectly, or that some of the facts I added were incorrect or insufficient. Perturbation can correct for this by introducing variations into the question and the facts in order to explore the space of answers that are "near" them as well.

There are many ways to go about adding perturbation to the system -- for example, we can search more than one hop out from every link, or we can search for other types of relationships that are highly correlated with relationships we are asking for explicitly, or we can include results for things which are strongly connected to things that are found.

From a user-interface standpoint perturbation can be controlled with a simple "sliding lever" in the user interface for "Precision." If the user sets very high Precision as a requirement then there is no perturbation -- the results are exact matches to the query and facts. If there is low Precision as a requirement then there can be more perturbation, thus the results are fuzzy and may include things that are near what I asked for but not exactly what I specified, enabling me to discover things via relevant relationships that I could not even remember to mention as facts.
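
One way the Precision control could drive perturbation is by limiting how many hops the search may travel from a starting point. The Python sketch below assumes that interpretation. The mapping in `max_hops`, the toy network, and the conference names are all invented for illustration.

```python
from collections import deque

# Hypothetical association network (adjacency lists).
GRAPH = {
    "Big Sur": ["Carmel-by-the-Sea"],
    "Carmel-by-the-Sea": ["Big Sur", "Monterey", "Conference A"],
    "Monterey": ["Carmel-by-the-Sea", "Conference B"],
    "Conference A": ["Carmel-by-the-Sea"],
    "Conference B": ["Monterey"],
}

def max_hops(precision, loosest=3):
    """Map a precision slider in [0, 1] to a hop limit (assumed mapping).

    precision 1.0 -> 1 hop (direct links only); 0.0 -> `loosest` hops.
    """
    return 1 + round((1.0 - precision) * (loosest - 1))

def reachable(graph, start, hops):
    """Breadth-first search: everything within `hops` links of `start`."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    seen.pop(start)
    return seen  # maps each found thing to its distance from the start

exact = reachable(GRAPH, "Big Sur", max_hops(1.0))
fuzzy = reachable(GRAPH, "Big Sur", max_hops(0.0))
print(sorted(exact), sorted(fuzzy))
```

Returning the distance with each result lets the interface show nearer things first, so fuzzier matches do not crowd out the exact ones.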

Finally, using a reasoner, the results found by the above search can be analyzed such that those results which are most likely to be what I am looking for, given the facts I have included as constraints, are presented first. Reasoning becomes the ranking algorithm in the system, rather than something like Pagerank. The answers that actually make the most sense in the context of my question are delivered first.

The above illustration describes how searches that are powered by the Semantic Web will work, once this technology is widely adopted. This is how the brain works, and how our search engines should work as well. 

This is not a pipedream -- in fact it is already happening in research settings and in the government. Within 15 years, if not a lot sooner, we will see these capabilities emerge in consumer-grade search interfaces.


April 23, 2008

Great Collective Intelligence Book; Includes a Chapter I Wrote

I highly recommend this new book on Collective Intelligence. It features chapters by a Who's Who of thinkers on Collective Intelligence, including a chapter by me about "Harnessing the Collective Intelligence of the World Wide Web."

Here is the full text of my chapter, minus illustrations. (The rest of the book is great and I suggest you buy it to have on your shelf. It's a big volume and worth the read.)

Harnessing the collective intelligence of the World-Wide Web

Nova Spivack[1]

 

Introduction

We are about to enter the third decade of the Web, sometimes referred to as “Web 3.0.” During this decade, the Web will evolve from a globally distributed fileserver into a globally distributed database. This shift will be enabled by a set of emerging technologies called The Semantic Web, which add a new layer of machine-understandable metadata about the meaning of information to the content of the Web.

The Semantic Web will catalyze a new era in collective intelligence. Individuals, groups, organizations and communities will be able to create, connect, find and share knowledge more intelligently and productively than ever before. Ultimately it will enable the Web itself, and all the people and applications that participate in it, to become more collectively intelligent.

Web 3.0—The Third Decade of the Web


The third decade of the Web, “Web 3.0,” begins officially in 2010, but we are already entering the early stages of this transition today. To understand where the Web is headed it helps to zoom out to a larger historical context.

The final decade of the PC era (1980-1990) was largely concerned with innovation on the front-end of the personal computer: the desktop and user interface layer of the PC. The focus of this period was on making PCs easier to use, with innovations such as Microsoft Windows, the Macintosh user interface, and more consistent user interfaces and integration across applications.

The first decade of the Web era (“Web 1.0” from 1990-2000) was focused on the back-end of the Web: the core technologies and platforms of the Web such as HTML, HTTP, Web servers, search engines, commerce technologies, advertising technologies, and the basic architectures and business models of Web applications. This decade was mainly focused on the technology and infrastructure of the Web, and most of the actual innovation dollars were spent on making things that only software developers could see.

In contrast, the second decade of the Web (“Web 2.0” from 2000-2010) has been largely focused on the front-end of the Web. Much of the innovation has not been on actual technology but rather on design patterns and user-interfaces for improving the end-user experience of the Web. During this decade we have focused on paradigms such as AJAX, which is a set of technologies and design methodologies for making Web sites more visually appealing and interactive.

 Another big focus of Web 2.0 has been user-generated content, and in particular the practice of “tagging” content with subject tags. Tagging has in turn led to the concept of “folksonomies” in which taxonomies that organize data are evolved in a bottom-up fashion by a decentralized community of users.

The coming third decade of the Web (“Web 3.0” from 2010-2020) will shift the emphasis back to the back-end of the Web. This decade will be largely focused on upgrading the technical infrastructure and content of the Web, based on emerging technologies such as the Semantic Web. During this decade the primary push will be enriching the Web so that it can function more like a database.

 Today the Web is composed mainly of unstructured and semistructured data such as text files and Web pages. Keyword search engines are able to provide rudimentary search capabilities over this information, but only for the most simplistic queries. Compare current Web search to the more precise capabilities of queries against a database and the difference is immediately clear. The Web does not provide anything close to the search capabilities or precision of a database today. But that is about to change.

 The Semantic Web provides a way to enrich both unstructured and structured data so that it can be queried with the precision of a database. Essentially, it provides a way to tag any information with metadata that explains what it means—and this metadata can be understood by software applications, such as search engines or knowledge management applications. It’s important to note that The Semantic Web is not a new Web, it’s just a new layer of the Web we already have. The semantic metadata that comprises the knowledge of the Semantic Web won’t live in some new place—it lives right in the existing documents and data on the Web. The knowledge of the Semantic Web is encoded using special new markup languages such as RDF and OWL.

This metadata is invisible to users (it doesn’t appear in Web browsers) but behind the scenes it can be read by any application that is compatible with these markup languages. So when any application, such as a next-generation search engine, sees a Web page or data record that contains RDF or OWL metadata, it can then use that metadata to understand what that page or data record means, what it is about, what it is related to, and how to interpret it. With Semantic Web metadata in place, searches on the Web will be as precise as, or even more precise than, those in any database. But that is just the beginning of what the Semantic Web enables. Beyond merely improving search, the Semantic Web actually transforms the Web into a database: a worldwide database in which data records can be moved around, shared, and linked together in new ways.

On the basis of the technologies of the Semantic Web and the Web 3.0 era, we will then be able to enter the fourth decade of the Web (“Web 4.0” from 2020-2030) in which the emphasis will shift back to the front-end of the Web. The Semantic Web doesn’t just add metadata about the meaning of information to the Web, it also enables metadata to be added about relationships, conceptual linkages, logical connections, and even logical rules. On the basis of this additional metadata, Web users and other applications will be able to harness the power of intelligent agents that will search the Web for things that interest them, make suggestions and recommendations, and even potentially transact on their behalf. This will open the door to a new kind of user-interface to the Web that is smarter and more conversational in nature, in which users will enter into dialogues with agents and interact with them to search the Web and make decisions. A conversational interface to the Web will be more appropriate in the increasingly mobile world, when users will mostly interact with the Web from small portable mobile or embedded devices.

 Users on mobile devices that have little to no screen real-estate will need a more productive way to interact with the Web than through a miniature browser; nobody likes sorting through pages of Google results on a cell phone. Instead, they will want to simply ask a question (perhaps through a voice interface, rather than typing with their thumbs) and have a virtual intelligent assistant dispatch agents to find the best answers and then report back to them with results or to ask further questions or for a decision.

 Smart, interactive conversational interfaces and intelligent agent-based virtual assistants are possible today, but only in narrow domains. In the Web 4.0 era they may in fact be our primary way of interacting with the whole Web and may be built into the user interface of most search engines, personal email providers, and leading Websites.

The Virtualization of Knowledge and Intelligence

In the long-term, the Semantic Web provides a way to move much of the “intelligence” that currently resides in the minds of individuals, groups and organizations, and/or that is hard-coded into various software and Web applications, out onto the Web itself. It provides a way to virtualize knowledge and intelligence in an explicitly machine-readable, universally accessible form. In other words, it provides a way to start making the Web “smarter.”

 Knowledge and expertise that previously only existed in people’s heads, or had to be painstakingly coded into each particular vertical software application, will be represented in a form of universally readable metadata on the Web—just like HTML documents today. In other words, using the Semantic Web you can publish knowledge and even the underlying conceptual frameworks, rules and heuristics that embody domain expertise, on the Web in an abstract, machine-readable form.

 There are many benefits that stem from this. For one thing, it will make it much easier to write smart software applications because much of the necessary “smarts” will not reside in the applications at all, but will rather live out there on the Web.

For example, to write an application that can intelligently assist with travel logistics, a developer will simply be able to point it at existing sets of knowledge and rules that exist for the travel domain on the Web already. The application will be able to draw on those pools of existing domain knowledge without having to be specifically programmed to do so, because it understands the underlying standards of the Semantic Web. Similarly, the same application could just as easily help someone trade on the stock market, by simply pointing to domain knowledge on the Semantic Web about finance and investment.

 As more pools of domain knowledge are added to the Web around various verticals, all applications will potentially benefit. This sets up a kind of network effect in which a global knowledge commons begins to form and self-amplify over time. For example, first the travel domain is added to the Semantic Web. Then someone else adds domain knowledge about geography and links them together. Another group then adds domain knowledge about hotels, and another one adds domain knowledge about weather—and these all connect to each other in various ways.

With all of this interconnected knowledge on the Web in machine-readable form, application developers can then more easily and quickly write applications that understand concepts and rules related to booking travel reservations, and that can cross-reference reservation information with knowledge about geographic places, relevant weather, and hotels in those locations. And in the other direction, someone booking a hotel can then find information about relevant weather and book travel to get to that hotel. This is just one example. There is an endless range of other possibilities for these technologies across all domains.
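
To illustrate the cross-referencing, here is a minimal Python sketch in which three separately published pools of knowledge are combined simply because they refer to the same identifiers. The pools, the identifiers (`flight:12`, `city:Denver`), the predicate names, and the helper functions are all hypothetical.

```python
# Hypothetical pools of domain knowledge, each published by a different group
# as (subject, predicate, object) statements. All names are illustrative.
TRAVEL = [("flight:12", "arrivesIn", "city:Denver")]
HOTELS = [("hotel:5", "locatedIn", "city:Denver"),
          ("hotel:8", "locatedIn", "city:Miami")]
WEATHER = [("city:Denver", "forecast", "snow"),
           ("city:Miami", "forecast", "sunny")]

def objects(statements, subject, predicate):
    """Everything the subject points to via the predicate."""
    return [o for (s, p, o) in statements if s == subject and p == predicate]

def subjects(statements, predicate, obj):
    """Everything that points to the object via the predicate."""
    return [s for (s, p, o) in statements if p == predicate and o == obj]

def trip_summary(flight):
    """Cross-reference a flight with hotels and weather at its destination."""
    pool = TRAVEL + HOTELS + WEATHER  # linking is just sharing identifiers
    summary = []
    for city in objects(pool, flight, "arrivesIn"):
        summary.append({
            "city": city,
            "hotels": subjects(pool, "locatedIn", city),
            "forecast": objects(pool, city, "forecast"),
        })
    return summary

summary = trip_summary("flight:12")
print(summary)
```

The application code knows nothing specific about hotels or weather. It only follows links, so adding another pool of knowledge extends what it can answer without reprogramming it.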

 The key point of all this is that The Semantic Web enables applications to become thinner, yet at the same time smarter, by drawing on the collective intelligence embodied by the Web itself. It will become possible to write applications that understand one or more specialized vertical domains faster, and ultimately applications will become more general—they will be able to dynamically load in specialized domain knowledge for whatever domain is needed, without having to be specifically programmed or limited to just those domains.

 Application developers will be able to draw on the knowledge added to the Web by others, instead of having to reinvent the wheel by programming all that knowledge directly into their applications every time. And in turn, the knowledge that their applications create can, if they want to allow it, be published back onto the Web for other applications to draw on as well.

Semantic Web as The Next Leap in Human Collective Intelligence

Looking at the evolution of the Semantic Web in historical context, we can view it as the next big step in a longer process of the evolution of human collective intelligence.

 Before the invention of written language, knowledge could only be communicated verbally and was handed down through oral traditions. During this period, one had to be in immediate physical proximity of someone who had certain knowledge in order to receive it from them. This meant that the maximum effective range of human collective intelligence was quite short in space and time.

 With the invention of writing, and eventually printing, humanity was able to process knowledge over longer distances in space and time, and with less reliance on particular individuals. People could now engage in dialogues and dialectics with larger groups of people in more places, across larger distances in space, and with more precision over larger ranges of time.

 The printing press took this to a new level by starting the process of mass-distribution of knowledge, but it still relied on an expensive physical manufacturing process and a paper medium that was perishable and costly to store and move around.

With the advent of electronic communications of various forms, humanity achieved many milestones: the transmission of knowledge could take place at the speed of light, and using digital storage media we were freed from the limitations of the paper medium.

 The Internet and the Web transformed the process of distributing knowledge even further—enabling a global knowledge commons to emerge. The Internet and Web enable anyone and everyone to become providers of knowledge, not just consumers—a fundamental shift in the way that knowledge transmission and media function. They are not just about the mass-distribution and mass-consumption of knowledge; they enable the mass-creation of knowledge. In some respects these technologies are analogues of the printing press in that they have democratized the process of creating, sharing and accessing knowledge by fundamentally changing the economics of the entire process—making it affordable and accessible to all.

 But even on the Web, for all its many benefits, knowledge is still not free from the limitations of the human brain. Only humans can really understand the knowledge that is represented in Web sites and databases, for example. While all other processes related to the distribution, storage and access to knowledge can now be done digitally, using software and the Web, the processes of creating, consuming and actually understanding knowledge are still limited only to living humans. That’s where the Semantic Web comes in.

Liberating Knowledge and Intelligence from Human Brains

The Semantic Web virtualizes human knowledge and expertise outside of human brains, and even outside of any particular software application—knowledge becomes essentially just more data on the Web. When we speak of knowledge here we don’t just mean information—the first-order raw data that is currently on the Web—we mean the actual meaning and interpretation of the information that is not on the Web but rather exists only in human brains.

 The Semantic Web provides a way to make the meaning and interpretation of information explicit in a form that is unambiguous and publishable, and shareable, on the Web. This will make all this knowledge understandable by software. It’s almost like the invention of a new language—a sort of meta-language for formally expressing what exactly you mean when you say something. The impact of this could be enormous.

 For the first time in human history, we won’t have to rely only on humans to create, understand and consume knowledge—our machines will be able to help us do this. They will help us work, collaborate, create, explore, monitor, discover, search, innovate, connect, and synthesize. This will open the door to an almost unimaginable amplification of the human mind, and human collective intelligence on this planet. At first the impact of this will largely be focused around assisting humans with simple clerical and research tasks, but the process will inevitably continue to evolve to a point where software will begin to originate new knowledge for us, advise us, and eventually to even start making certain types of decisions on our behalf.

 Although the Semantic Web has barely moved from the lab to the mainstream Internet, it is in fact much farther along than most people realize. Today there are already semantic applications under development that can organize all your information automatically, make recommendations based on your dynamically changing interests, identify new connections between ideas or documents in different places, make logical inferences or discover contradictions, and even make discoveries by doing proofs and explorations based on available data.

Within a few years these capabilities will begin to filter out to the mainstream users of the Internet, and within a decade or two at most, they will become commonplace. There are only a few billion humans today, and each of us can only cope with a small amount of information and relationships before we become overloaded. But in an era of machine understanding of human knowledge we may potentially be able to leverage thousands to millions of software agents to help us. This will vastly increase our ability to cope with masses of information and relationships productively. In an increasingly complex, distributed, and rapidly changing world, we simply will not be able to cope in the future without help. The Semantic Web provides one path to solving these problems, enabling us to remain productive in the future.

Amplifying Human Collective Intelligence

The Semantic Web does not replace humans or take them out of the equation. It simply reduces the load on humans, freeing them from some of the pain of information overload, and providing a new path for software to begin to augment and even amplify human collective intelligence.

 Today there are several barriers to human collective intelligence that arise from basic limitations of the human brain. Human individuals, and groups of humans, simply cannot process or share knowledge effectively beyond a certain level of information or relationship complexity and change. For this reason, collaboration and collective intelligence are often easier to achieve and yield better results in small groups than large groups.

 As group size increases, productive collective intelligence becomes dramatically harder to achieve. Thus, ironically even though larger groups offer the potential for exponential increases in collective intelligence, in practice the opposite is usually the result: the larger teams get, the dumber they get. An entire industry of management consultants and facilitators exists because of these inefficiencies.

 The Semantic Web may be able to help with this age-old problem. By enabling software to understand information and relationships, we may be able to begin to automatically and intelligently facilitate interpersonal and group collaboration and knowledge management, and this may finally enable larger groups to become exponentially smarter instead of dumber.

Twine.com—A New Service for Collective Intelligence

My own company, Radar Networks, has recently introduced a new service based on the Semantic Web, called Twine (www.twine.com) that focuses on amplifying human collective intelligence. Twine helps individuals and groups manage and share knowledge more productively, using the Semantic Web.

 As people use Twine it learns from them and automatically organizes and connects their information with other related information, saving them valuable time and enabling them to discover connected knowledge. Twine provides individuals and groups with a smart virtual environment for their knowledge.

 Twine works with all kinds of knowledge—email, RSS, Web pages, documents, photos, videos, audio, contact records, or anything else. Regardless of where information actually resides, Twine enables users to view it as if it were in one place, and to see how it is connected and organized. Twine also automatically helps to make sense of information and to make it more easily searchable.

 Twine is a Web-based online service that is completely built using the Semantic Web. Although it is only in early beta-testing at the time of this writing, it is already demonstrating that intelligent machine-augmentation of individual and group knowledge management is possible and improves productivity and collaboration.

 As Twine unfolds and spreads to more individuals, groups and teams, and organizations and communities, it has the potential to become a new backbone for collective intelligence and knowledge sharing worldwide. At least that is the vision of the project. Time will tell whether we succeed.

From Global Knowledge Commons to Global Brain

If the Semantic Web develops as predicted, it is possible that within 20 years much, if not all, human knowledge will be represented on the Web in machine-understandable form. We have seen the beginnings of this trend with services such as the Wikipedia. More recently, another initiative called the DBpedia is creating a Semantic Web version of the Wikipedia. But this is just the start of this trend.

 As more and more applications and services start producing Semantic Web metadata and exposing it back to other applications and services on the Web, we will begin to create a new global knowledge commons. At first these different services will function like islands of knowledge, but then they will begin to interconnect.

 A piece of knowledge in one place will link to and from pieces of knowledge in other places. Eventually this will become a giant associative network, not so unlike the brain, but on a global scale. And as people and applications surf through its connections and consume its knowledge, adding new knowledge and connections back to it as they do, it will change and self-organize dynamically. Just as the first generations of the Web have enabled a global medium for “hypertext,” the Semantic Web will enable a global medium for “hyperdata.”
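 To make the idea of "hyperdata" a little more concrete, here is a minimal sketch in Python of how linked pieces of knowledge can be searched associatively by following relationships instead of matching keywords. Everything in it is an assumption made for illustration: the identifiers (such as "blog:post42"), the predicate names, and the helper functions build_graph and related are hypothetical and are not taken from Twine or any real data set.

```python
from collections import defaultdict, deque

# Hypothetical (subject, predicate, object) statements. Each one links a
# piece of knowledge in one place to a piece of knowledge somewhere else.
TRIPLES = [
    ("wiki:Galway", "locatedIn", "wiki:Ireland"),
    ("deri:DERI", "basedIn", "wiki:Galway"),
    ("deri:SIOC", "developedAt", "deri:DERI"),
    ("blog:post42", "mentions", "deri:SIOC"),
]


def build_graph(triples):
    """Index the statements so links can be followed in either direction."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def related(graph, start, max_hops):
    """Breadth-first walk: map each node within max_hops of start to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


graph = build_graph(TRIPLES)
print(related(graph, "blog:post42", 2))
```

 Starting from a single blog post, two hops reach the SIOC project and then DERI, even though the post itself never mentions DERI. That is the sense in which a network of relationships can surface things a literal keyword match would miss.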

 As one projects the future evolution of the Web and the emerging Semantic Web, one cannot help but notice certain similarities to the human mind. Some have even ventured to call this the beginning of an emerging “Global Brain.” It is too early to tell how similar it will truly be to the actual human brain. However, we can already predict with confidence that it will be a system that collectively will be capable of at least rudimentary learning, memory, perception, planning and reasoning.

 The human brain is a massively parallel collective intelligence engine in which billions of neurons interact across trillions of connections to process and generate knowledge.

 Similarly, the collective intelligence of the Web will involve the combined interactions and intelligence of billions of humans and machines across trillions of relationships. These processes will not be guided centrally, and the system will most likely not be centralized around a single construct of a “self” nor will it have anything like a human body.

 While it will be possible to say the system as a whole is intelligent, it will be difficult to locate any particular source of that intelligence; the intelligence will come from everywhere: from the humans, the software and even the data and links that comprise the Web.

 Because the Web is quite different from the human brain, it is likely that its intelligence will be different from what we think of as human intelligence today. But it will nonetheless be intelligent—in a massively distributed, emergent, and chaotic way that we humans may not be able to even comprehend. The “thoughts” the Web will think may be just too vast and complex for us to even recognize, let alone imagine or understand. Yet perhaps in decade-long time-scales at least, we will begin to be able to see the outlines of its thinking.



[1] Nova Spivack is the CEO and founder of Radar Networks, a San Francisco company that is pioneering applications of the Semantic Web for distributed collaboration and knowledge management with a new service called Twine.com. Mr. Spivack is a recognized authority on the Semantic Web and the future of the Web, which is sometimes called “Web 3.0.” A more detailed bio can be found at his company website: http://www.radarnetworks.com/about/management.html#nova.

March 26, 2008

My Visit to DERI -- World's Premier Semantic Web Research Institute

Earlier this month I had the opportunity to visit, and speak at, the Digital Enterprise Research Institute (DERI), located in Galway, Ireland. My hosts were Stefan Decker, the director of the lab, and John Breslin who is heading the SIOC project.

DERI has become the world's premier research institute for the Semantic Web. Everyone working in the field should know about them, and if you can, you should visit the lab to see what's happening there.

DERI is part of the National University of Ireland, Galway. With over 100 researchers focused solely on the Semantic Web, and very significant financial backing, DERI has, to my knowledge, the highest concentration of Semantic Web expertise on the planet today. Needless to say, I was very impressed with what I saw there. Here is a brief synopsis of some of the projects that I was introduced to:

  • Semantic Web Search Engine (SWSE) and YARS, a massively scalable triplestore.  These projects are concerned with crawling and indexing the information on the Semantic Web so that end-users can find it. They have done good work on consolidating data and also on building a highly scalable triplestore architecture.
  • Sindice -- An API and search infrastructure for the Semantic Web. This project is focused on providing a rapid indexing API that apps can use to get their semantic content indexed, and that can also be used by apps to do semantic searches and retrieve semantic content from the rest of the Semantic Web. Sindice provides Web-scale semantic search capabilities to any semantic application or service.
  • SIOC -- Semantically Interlinked Online Communities. This is an ontology for linking and sharing data across online communities in an open manner, that is getting a lot of traction. SIOC is on its way to becoming a standard and may play a big role in enabling portability and interoperability of social Web data.
  • JeromeDL is developing technology for semantically enabled digital libraries. I was impressed with the powerful faceted navigation and search capabilities they demonstrated.
  • notitio.us is a project for personal knowledge management of bookmarks and unstructured data.
  • SCOT, OpenTagging and Int.ere.st.  These projects are focused on making tags more interoperable, and on generating social networks and communities from tags. They provide a richer tag ontology and framework for representing, connecting and sharing tags across applications.
  • Semantic Web Services.  One of the big opportunities for the Semantic Web that is often overlooked by the media is Web services. Semantics can be used to describe Web services so they can find one another and connect, and even to compose and orchestrate transactions and other solutions across networks of Web services, using rules and reasoning capabilities. Think of this as dynamic semantic middleware, with reasoning built-in.
  • eLite. I was introduced to the eLite project, a large e-learning initiative that is applying the Semantic Web.
  • Nepomuk.  Nepomuk is a large effort supported by many big industry players. They are making a social semantic desktop and a set of developer tools and libraries for semantic applications that are being shipped in the Linux KDE distribution. This is a big step for the Semantic Web!
  • Semantic Reality. Last but not least, and perhaps one of the most eye-opening demos I saw at DERI, is the Semantic Reality project. They are using semantics to integrate sensors with the real world. They are creating an infrastructure that can scale to handle trillions of sensors eventually. Among other things I saw, you can ask things like "where are my keys?" and the system will search a network of sensors and show you a live image of your keys on the desk where you left them, and even give you a map showing the exact location. The service can also email you or phone you when things happen in the real world that you care about -- for example, if someone opens the door to your office, or a file cabinet, or your car, etc. Very groundbreaking research that could seed an entire new industry.
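
For readers who have not met the term "triplestore" used in the list above: it is a database of subject-predicate-object statements that can be queried by pattern. The following is only a toy sketch in Python to show the idea. It is not how YARS, SWSE, or the Semantic Reality system are actually built, and the class name TinyTripleStore and the "ex:" identifiers are made up for illustration. A real triplestore would use indexes rather than the linear scan shown here.

```python
class TinyTripleStore:
    """A toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        """Return statements matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Hypothetical statements, loosely echoing the "where are my keys?" demo.
store = TinyTripleStore()
store.add("ex:keys", "ex:locatedOn", "ex:desk")
store.add("ex:desk", "ex:locatedIn", "ex:office")
store.add("ex:door", "ex:locatedIn", "ex:office")

# "Where are my keys?" becomes a pattern with the object left open.
print(store.match(s="ex:keys", p="ex:locatedOn"))
```

The same match call with only the predicate and object filled in answers the reverse question, namely which things are located in the office.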

In summary, my visit to DERI was really eye-opening and impressive. I recommend that major organizations that want to really see the potential of the Semantic Web, and get involved on a research and development level, should consider a relationship with DERI -- they are clearly the leader in the space.

March 14, 2008

First Week of Twine Beta Phase II Report

This week we began letting the second wave of beta users into the Twine invite-only beta. It's been a very busy and exciting time for the Twine team. I'll be providing more detailed stats on an ongoing basis in a few weeks once we have more data to analyze. For now, I will just provide some qualitative observations.

Twine is still in the early beta process, but already we are seeing a rapid increase in adoption and scale. We have only let in a few hundred more users to get the process started, but we will be letting more and more in every week as we go forward.

It has been really exciting to watch Twine grow. I find that I am increasingly glued to my Interest Feed watching the fascinating information that is flowing through from all the new members. There have been many new twines created around a wide and growing range of interests and a large amount of content added. The recommendations are also quite interesting -- I have already discovered a wide range of new people, twines and content that I didn't know about.

As of this writing, I now have 157 social connections in Twine. My social network in Twine has doubled in size in a week and is rapidly approaching the size of my Facebook network. That's pretty impressive considering this happened in a week (it took about half a year for my Facebook network to grow to that size).

We also had our first outside Twine client app, called "Entwine," written spontaneously by a beta user -- it browses through the RDF data from various items in Twine. That was very cool and unexpected! It really got the team jazzed to see this happen.

Twine is now full of active discussions around interests, questions, ideas, suggestions, current events, technologies and products. I have been pleasantly surprised to see so much interaction among users develop so quickly. As we had hypothesized, discussions are turning out to be a very key feature.

We have received a lot of great feedback from beta users within Twine, as well as many suggestions for how to improve Twine, streamline the user experience, and integrate Twine with other applications and services. This is exactly what we had hoped for from our beta. The team is hard at work analyzing this and prioritizing our next development sprints in light of what we are learning from our users (we do minor releases every week and major ones every 3 weeks).

Most of the press reviews and user stories point to Twine being very exciting, useful and full of potential, which has been great to hear after so much work --- they also universally agree that we still have room to improve the user experience and we need to work on making Twine easier to learn and use. That's not unexpected -- we opened the beta well before the app is finished in order to understand user priorities better. We are really focusing on usability and bug fixes for the next several sprints.  All this feedback has been incredibly valuable to the team. Keep it coming!

Another interesting observation. The quality of the users in Twine is distinctly impressive. It's a very smart community of leading-edge thinkers, builders, and technology adopters. Kind of like having your own TED Conference, 24/7 around the world. We will be inviting in a wider range of users in later phases, once the app is further along. In the meantime it is really great to see so many of my colleagues in Twine, and to be making so many new contacts and friends here. For this initial phase this is exactly the audience we need -- people who will really roll up their sleeves and help us make Twine into a great application.

Twine is also rapidly aggregating most of the leading minds in the worldwide Semantic Web development and research community into a social and collaborative interest network. It is great to have this global community of people interested in building and using the Semantic Web come together in Twine, an application that is built using Semantic Web technologies on the Radar Networks Semantic Web Applications platform. I look forward to beginning to share Twine with this worldwide community, and to collaborate with others to extend it and integrate it with other semantic apps and data sets. This is definitely our goal.

It's been a great week. I haven't slept much. I'm having too much fun in Twine!

March 08, 2008

Do You Want Early Access to the Twine Beta?

Special offer to readers of my blog...

There are now well over 30,000 users in the queue to get into the Twine beta. We're going to start letting people in from the waiting list in waves and it should take about a month or two to let everyone in.

But what good is a waiting list if there's no way to cut to the front, right? Fortunately, there is a way to skip ahead to the front of the line...

Write a blog post about Twine on your blog and why you want early access, and send me the link at nova (at) radarnetworks (dot) com, along with your first name, last name, and email address. If I like your post, I'll get you an early access VIP pass to the front of the line.

See you in Twine!

March 03, 2008

How about Web 3G?

I'm here at the BlogTalk conference in Cork, Ireland with a range of bloggers and technologists discussing the emerging social Web. Besides myself, and Ian Davis and Paul Miller from Talis, there are a bunch of other Semantic Web folks here, including Dan Brickley and a group from DERI Galway.

Over dinner a few of us were discussing the terms "Semantic Web" versus "Web 3.0" and we all felt a better term was needed. After some thinking, Ian Davis suggested "Web 3G." I like this term better than Web 3.0 because it loses the "version number" aspect that so many objected to. It has a familiar ring to it as well, reminding me of the 3G wireless phone initiative. It also suggests Tim Berners-Lee's "Giant Global Graph" or GGG -- a synonym for the Semantic Web. Ian stayed up late and put together a nice blog post about the term, echoing many of my own sentiments about how this term should apply to a decade (the third decade of the Web), rather than to a particular technology.

February 25, 2008

My Commentary: Radar Networks Raises $13M for Twine

I am pleased to announce that my company, Radar Networks, has raised a $13M Series B investment round to grow our product, Twine. The investment comes from Velocity Interactive Group, DFJ, and Vulcan. Ross Levinsohn -- the man who acquired and ran MySpace for Fox Interactive -- will be joining our board. I'm very excited to be working with Ross and to have his help guiding Twine as it grows.

We are planning to use these funds to begin rolling Twine out to broader consumer markets as part of our multi-year plan to build Twine into the leading service for organizing, sharing and discovering information around interests. One of the key themes of Web 3.0 is to help people make sense of the overwhelming amount of information and change in the online world, and at Twine, we think interests are going to play a key organizing role in that process.

Your interests comprise the portion of your information and relationships that are actually important enough that you want to keep track of them and share them with others. The question that Twine addresses is how to help individuals and groups more efficiently locate, manage and communicate around their interests in the onslaught of online information they have to cope with. The solution to information overload is not to organize all the information in the world (an impossible task), it is to help individuals and groups organize THEIR information (a much more feasible goal).

In March we are going to expand the Twine beta to begin letting more people in. Currently we have around 30,000 people on the wait-list and more coming in steadily. In March we will start letting all of these people in, gradually in waves of a few thousand at a time, and letting them invite their friends in. So to get into Twine you need to sign up on the list on the Twine site, or have a friend who is already in the service invite you in. I look forward to seeing you in Twine!

The last few months of closed beta have been very helpful in getting a lot of useful feedback and testing that has helped us improve the product in many ways. This next wave will be an exciting phase for Twine as we begin to really grow the service with more users. I am sure there will be a lot of great feedback and improvements that result from this.

However, even though we will be letting more people in soon, we are still very much in beta and will be for quite some time to come -- There will still be things that aren't finished, aren't perfect, or aren't there yet -- so your patience will be appreciated as we continue to work on Twine over the coming year. We are letting people in to help us guide the service in the right direction, and to learn from our users. Today Twine is about 10% of what we have planned for it. First we have to get the basics right -- then, in the coming year, we will really start to surface more of the power of the underlying semantic platform. We're psyched to get all this built -- what we have planned is truly exciting!

February 12, 2008

Video of My Semantic Web Talk

This is a video of me giving commentary on my "Understanding the Semantic Web" talk and how it relates to Twine, to a group of French business school students who made a visit to our office last month.


Here is the link to the video, if the embedded version below does not play.

Nova Spivack - Semantic Web Talk from Nicolas Cynober on Vimeo.

January 19, 2008

Fun With CoolWhip: The Twine Crunchies Video

The Crunchies are done. At Radar Networks we are really honored to have our product, Twine.com, nominated as a finalist for Best Technology Innovation of 2007. It was very cool to see our Twine logo up there on stage next to Facebook, Digg, LinkedIn and so many other incredible companies -- especially considering we were the only company that was still in closed Beta in the awards (and yes, we are coming out of closed beta in March, so get ready!).

Meanwhile, one of the things that made the Crunchies fun was that every company was asked to submit a video. Not all companies did, and not all of them were that creative. Some however were really funny, including ours. Here is a link to the "director's cut" of the Twine Crunchies video for 2007. Enjoy!!!

ps. For those who don't live in the USA... CoolWhip is a synthetic dessert topping we have here in the States. Imagine whipped cream, made out of some kind of industrial byproduct. It actually tastes pretty good, whatever it is. And it has almost no calories -- possibly because there is nothing in it that is actually digestible by humans. It's really a wonderful technological innovation. Thus our choice.

January 14, 2008

A Nice Video Intro to The Semantic Web for Non-Geeks

Question: What do you do if you're not a computer scientist but you are interested in understanding what all this Semantic Web stuff is about?

Answer: Watch this video!


Nova's Trip to Edge of Space

  • Stepsedgestratosphere
    In 1999 I flew to the edge of space with the Russian air force, with Space Adventures. I made it to an altitude of just under 100,000 feet and flew at Mach 3 in a Mig-25 piloted by one of Russia's best test-pilots. These pics were taken by Space Adventures from similar flights to mine. I didn't take digital stills -- I got the whole flight on digital video, which was featured on the Discovery Channel.

Nova & Friends, Training For Space...

  • Img047
    In 1999 I was invited to Russia as a guest of the Russian Space Agency to participate in zero-gravity training on an Ilyushin-76 parabolic flight training aircraft. It was really fun!!!! Among other people on that adventure were Peter Diamandis (founder of the X-Prize and Zero-G Corporation), Bijal Trivedi (a good friend of mine, science journalist), and "Lord British" (creator of the Ultima games). Here are some pictures from that trip...
