May 13, 2008

Associative Search and the Semantic Web: The Next Step Beyond Natural Language Search

Our present day search engines are a poor match for the way that our brains actually think and search for answers. Our brains search associatively along networks of relationships. We search for things that are related to things we know, and things that are related to those things. Our brains not only search along these networks, they sense when networks intersect, and that is how we find things.

In fact, our memory often works by "homing in" on what we are looking for, rather than by finding exact matches. Keyword searching provides a very weak form of "homing in": by choosing our keywords carefully we can limit the set of things that match. The problem is that we can only find things that contain those literal keywords. Our brains, on the other hand, use a much more sophisticated form of "homing in" on answers. Instead of literal matches, they look for things that are associatively connected to things we remember, in order to find what we are ultimately looking for.

For example, consider the case where you cannot remember someone's name. How do you remember it? Usually we start by trying to remember various facts about that person. Our brains then start networking from those facts to other facts, and finally to other memories that they intersect. Ultimately, through this process of "free association" or "associative memory," we home in on things that eventually trigger a memory of the person's name.

Keyword search is a very weak approximation of associative search because there really is no concept of a relationship at all. By entering keywords into a search engine like Google we are simulating an associative search, but without the real power of actual relationships between things to help us. Google does not know how various concepts are related and it doesn't take that into account when helping us find things. Instead, Google just looks for documents that contain exact matches to the terms we are looking for and weights them statistically. It makes some use of relationships between Web pages to rank the results, but it does not actually search along relationships to find new results.

Google does not work the way our brains think. This difference creates an inefficiency for searchers: We have to do the work of translating our associative way of thinking into "keywordese" that is likely to return results we want. Often this requires a bit of trial and error and reiteration of our searches before we get result sets that match our needs.

Natural language search engines are slightly closer because they at least attempt to understand the meaning of a query and the meaning of result documents in order to make a better match between the question and potential answers. But this is still not true associative search.

A natural language search can understand the meaning of a query like "books about Harry Potter," and it knows this is not the same as "books by Harry Potter." But ultimately what is happening is that a linguistic expression is being converted into a more sophisticated keyword search. The language in the query is being mapped to documents that contain text that answers a question, or to data objects that match the thing being asked for. This is certainly better than keyword search, but it is still a form of literal matching. It is not really making use of associative search along relationships in the data (other than linguistic relationships between words in the query) or any sort of sophisticated reasoning.

Associative search doesn't merely understand the meaning of the query; it understands and can reason about relationships in the data. This is an important distinction. An associative search returns documents that represent things that are related, via various forms of associations (semantic links), to the things in the query. It looks through a network of associations for the things that are most connected to the items in the query. By specifying more starting points, the set of things that are connected to all of those starting points is narrowed.

Associative search is a very different approach to search from keyword search (which merely looks for things with the keywords in them) and natural language search (which merely looks for things that contain content that matches the meaning of the question). It also happens to be more similar to how our brains actually think.

Associative search is the basis for reasoning, but in its simplest form it does not require reasoning. In its simplest form it just applies statistics to networks of relationships to narrow down to the things that are most highly related to the items in a query. Adding reasoning to the mix makes it vastly more powerful, however. Reasoning adds the ability to generalize or get more specific, and to weight various paths through the network of relationships in more sophisticated ways.
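The reasoning-free form described above can be sketched as a simple "spreading activation" pass. Activation spreads outward from each remembered item and decays with every hop. Only nodes reached from all of the starting points are kept, and they are ranked by their summed activation. This is a minimal sketch under stated assumptions, not the author's algorithm. The decay factor, hop limit, function names, and toy graph are all hypothetical.

```python
from collections import deque

def activation_from(graph, seed, max_hops=2, decay=0.5):
    """Spread activation outward from one seed node.

    Breadth-first: a node first reached after k hops gets decay**k, so
    nodes closer to the seed (in link distance) get more activation.
    """
    scores = {seed: 1.0}
    frontier = deque([(seed, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in scores:  # first visit is the shortest path
                scores[neighbor] = decay ** (hops + 1)
                frontier.append((neighbor, hops + 1))
    return scores

def associative_search(graph, seeds, max_hops=2, decay=0.5):
    """Rank nodes reachable from ALL seeds by their summed activation.

    Each extra seed intersects the candidate set, so more starting
    points narrow the result.
    """
    per_seed = [activation_from(graph, s, max_hops, decay) for s in seeds]
    common = set(per_seed[0]).intersection(*per_seed[1:]) - set(seeds)
    totals = {node: sum(p[node] for p in per_seed) for node in common}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy network (undirected links listed in both directions).
EXAMPLE_GRAPH = {
    "me": ["Conference A", "Conference B"],
    "Sue Smith": ["Conference A", "Conference C"],
    "Conference A": ["me", "Sue Smith", "Robert Jones"],
    "Conference B": ["me"],
    "Conference C": ["Sue Smith"],
    "Robert Jones": ["Conference A"],
}

print(associative_search(EXAMPLE_GRAPH, ["me", "Sue Smith"]))
# [('Conference A', 1.0), ('Robert Jones', 0.5)]
```

Summing activations is only one possible statistic. A reasoner could instead weight particular kinds of paths differently, as the paragraph above suggests.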

Our current search tools -- whether keyword-based or natural-language-based -- do not support true associative search. But we do see associative search starting to appear in a very different breed of application: social networks. A search in LinkedIn, for example, is an associative search.

As we begin to merge our social networks with our search engines we may start to see associative search engines appearing on the Web. In fact, I would venture that this is how Facebook could give Google some serious competition. But they have to hurry if they are going to do this -- Google has clearly realized the power of "social search" and is rapidly moving to leverage it in their own search results.

Ultimately associative search is more than just social search however. To be really effective, associative search engines need to understand and leverage the full spectrum of relationships between things, not just social relationships. In order to accomplish this, associative search engines need the Semantic Web. They need to see and understand more types of relationships between more types of things.

With that in mind, here is an example of how Semantic Web enabled associative search will work in the future.

PROBLEM: I am trying to remember the name of the organizer of a conference I once attended.

WHAT I ALREADY KNOW:

  • I know this person and have corresponded with them in the past.
  • The conference was related to government and the Internet.
  • It took place in a town near Big Sur, but I can't remember the name of the town.
  • The organizer of the conference once introduced me to a male celebrity, but I can't remember the celebrity's name.
  • I gave a talk at the conference about Web 3.0.
  • My friend, Sue Smith, also spoke at the conference.
  • The conference I attended took place in the Spring, but I am not sure if it was last year or two years ago.

In the above example, I cannot remember the specific keywords that will help me generate a query to find the answer. Instead, I remember a number of relationships and generalizations about the answer. Present day search engines cannot see these relationships, and they have no ability to understand a generalization and look at things it contains.

The ability to intersect the sets formed by relationships and generalizations is a fundamental feature of human memory and search. But our present day tools don't have these capabilities. Thus we have to spend time translating our questions into keywordese, rather than just asking our questions in the actual language of human thought.

There are two ways to approach solving this.

  1. The first way is to create artificial intelligence which, given a question in natural language English, can understand it and reason about the question as well as understand and reason about the information in the set of documents being searched, in order to intelligently arrive at candidate answers. This is computationally intensive, and very hard to program. This is why AI hasn't quite happened yet on this scale.
  2. A perhaps easier approach is to use the Semantic Web. In the Semantic Web approach, metadata is embedded into content that describes the meaning of the content, its various important properties, and its relationships to other concepts. On the basis of this metadata, the problem becomes much simpler to solve. Instead of doing high-level AI it becomes essentially a statistical search.

Now let's look at how using the Semantic Web could help us solve the above problem via an associative search:

Items are connected to more general or specific concepts by virtue of semantic linkages between concepts. For example, the conference I am looking for is related to the concepts "Government" and "Technology." If I can at least remember that then I can find conferences related to government and technology. Furthermore, since the concept "Policy" is a subset of government it may be related to that topic as well.
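Generalization along semantic links can be sketched with a small concept hierarchy. A search for a broad concept such as "Government" should also find items tagged only with a narrower concept such as "Policy". This is a minimal sketch, assuming each concept has at most one broader concept and the hierarchy has no cycles. The concept links, conference names, and function names are hypothetical. Real Semantic Web vocabularies express such links with constructs like `rdfs:subClassOf` or SKOS `broader`.

```python
# Hypothetical "broader concept" links: narrower concept -> broader concept.
# Assumes an acyclic hierarchy (no loop guard below).
BROADER = {
    "Technology Policy": "Policy",
    "Policy": "Government",
    "Internet": "Technology",
}

# Hypothetical conference records tagged with their most specific topics.
CONFERENCE_TOPICS = {
    "Conference A": {"Technology Policy", "Internet"},
    "Conference B": {"Internet"},
    "Conference C": {"Government"},
}

def generalizations(concept):
    """The concept itself plus every broader concept above it."""
    chain = [concept]
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain

def conferences_about(*concepts):
    """Conferences whose topics fall under ALL of the given concepts,
    either directly or through a chain of narrower concepts."""
    hits = []
    for conf, topics in CONFERENCE_TOPICS.items():
        covered = {g for topic in topics for g in generalizations(topic)}
        if all(c in covered for c in concepts):
            hits.append(conf)
    return sorted(hits)

print(conferences_about("Government", "Technology"))  # ['Conference A']
```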

Likewise, things are connected to things that are "near" them via geographic links. Because the conference was near Big Sur, it was in Northern California, along the coast. It was probably in a town that is geographically close; for example, Carmel-by-the-Sea is a town near the Big Sur area.
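One simple way to compute geographic "nearness" is the great-circle (haversine) distance between coordinates, with a distance cutoff. This is a sketch under stated assumptions. The coordinates below are approximate and for illustration only (Big Sur is a region, so a single rough point stands in for it). The 50 km radius and all names are hypothetical choices.

```python
import math

# Approximate (latitude, longitude) in degrees; illustrative only.
PLACES = {
    "Big Sur": (36.27, -121.81),
    "Carmel-by-the-Sea": (36.555, -121.923),
    "Sacramento": (38.58, -121.49),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def places_near(anchor, radius_km=50.0):
    """Places (other than the anchor) within radius_km of the anchor."""
    origin = PLACES[anchor]
    return sorted(name for name, coords in PLACES.items()
                  if name != anchor and haversine_km(origin, coords) <= radius_km)

print(places_near("Big Sur"))  # ['Carmel-by-the-Sea']
```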

The organizer of the conference introduced me to a male celebrity. There are several celebrities in my social network. If the fact that I met certain people via introductions from other people was stored using semantic links, then this too would be searchable. For example, "find all celebrities I was introduced to by my connections" would be a solvable query. Similarly, "find people who introduced me to celebrities" would also be solvable.
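If introductions were stored as semantic links, both queries above become simple lookups over (subject, predicate, object) triples. This is a minimal in-memory sketch, not a real triple store. The predicate names, people, and the placeholder "Celebrity X" are all hypothetical.

```python
# Hypothetical semantic links stored as (subject, predicate, object) triples.
TRIPLES = {
    ("Robert Jones", "introduced_me_to", "Celebrity X"),
    ("Sue Smith", "introduced_me_to", "Dana Lee"),
    ("Celebrity X", "is_a", "Celebrity"),
    ("Celebrity X", "gender", "male"),
    ("Dana Lee", "is_a", "Engineer"),
}

def values(subject, predicate):
    """All objects linked from subject by predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def celebrities_introduced_to_me():
    """'Find all celebrities I was introduced to by my connections.'"""
    return sorted({o for s, p, o in TRIPLES
                   if p == "introduced_me_to" and "Celebrity" in values(o, "is_a")})

def people_who_introduced_me_to_celebrities(gender=None):
    """'Find people who introduced me to celebrities', optionally
    restricted to celebrities of a given gender."""
    return sorted({s for s, p, o in TRIPLES
                   if p == "introduced_me_to"
                   and "Celebrity" in values(o, "is_a")
                   and (gender is None or gender in values(o, "gender"))})

print(celebrities_introduced_to_me())                   # ['Celebrity X']
print(people_who_introduced_me_to_celebrities("male"))  # ['Robert Jones']
```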

The fact that I gave a talk at the conference could also be semantically represented on a data record describing the conference, as well as on my own profile. Thus there could exist a link such as "speaker at" which links me to various conferences I have spoken at. I could then get a list of all the conferences I have spoken at. I could also look for all the conferences where both Sue Smith and I were speakers.
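Finding conferences where both Sue Smith and I spoke is a set intersection over the hypothetical "speaker at" links. This is a minimal sketch. The link data, conference names, and function names are assumptions for illustration.

```python
# Hypothetical "speaker at" links between people and conferences.
SPEAKER_AT = {
    ("me", "Conference A"),
    ("me", "Conference B"),
    ("Sue Smith", "Conference A"),
    ("Sue Smith", "Conference C"),
}

def conferences_spoken_at(person):
    """Follow the 'speaker at' link from one person."""
    return {conf for who, conf in SPEAKER_AT if who == person}

def shared_speaking_events(*people):
    """Conferences at which every listed person was a speaker."""
    return sorted(set.intersection(*(conferences_spoken_at(p) for p in people)))

print(sorted(conferences_spoken_at("me")))        # ['Conference A', 'Conference B']
print(shared_speaking_events("me", "Sue Smith"))  # ['Conference A']
```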

Or, better yet, there could be a link called "Gave talk about" which links me to an instance describing each talk I have given. From such an instance there could then be "Gave talk at" links to all the events where I have given that talk. So I could look up my "Web 3.0" talk and then see all the conferences where I gave that talk.
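The richer model above is a two-hop traversal. The first hop goes from me to my talks on a given topic, and the second goes from each talk to the events where it was given. This is a minimal sketch. The talk identifiers, topic table, and events are hypothetical.

```python
# Hypothetical two-hop structure:
#   person --gave talk about--> talk --gave talk at--> event
GAVE_TALK_ABOUT = {"me": {"talk-1", "talk-2"}}
TALK_TOPIC = {"talk-1": "Web 3.0", "talk-2": "Semantic Web"}
GAVE_TALK_AT = {
    "talk-1": {"Conference A", "Conference D"},
    "talk-2": {"Conference B"},
}

def events_where_i_gave(person, topic):
    """Hop 1: the person's talks on the topic; hop 2: the events for those talks."""
    events = set()
    for talk in GAVE_TALK_ABOUT.get(person, set()):
        if TALK_TOPIC.get(talk) == topic:
            events |= GAVE_TALK_AT.get(talk, set())
    return sorted(events)

print(events_where_i_gave("me", "Web 3.0"))  # ['Conference A', 'Conference D']
```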

Temporal relations can also be generalized and semantically represented. For example, the conference I am looking for took place in the spring. Therefore only look for conferences that took place in or near months that are considered to be in the spring season.
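A temporal generalization like "spring" can be represented as a set of months. "In or near" spring can then be handled by widening that set by a configurable number of months. This is a sketch under stated assumptions. It uses meteorological spring in the Northern Hemisphere (March to May). The event dates are hypothetical, and "last year or two years ago" is read relative to this 2008 post.

```python
from datetime import date

# Assumption: meteorological spring in the Northern Hemisphere (March-May).
SPRING_MONTHS = {3, 4, 5}

def spring_months(slack_months=0):
    """Spring months, widened by slack_months on each side ('in or near' spring)."""
    return {(m - 1 + k) % 12 + 1
            for m in SPRING_MONTHS
            for k in range(-slack_months, slack_months + 1)}

def spring_events(events, years, slack_months=0):
    """Names of events held in (or near) spring in any of the given years."""
    months = spring_months(slack_months)
    return sorted(name for name, day in events.items()
                  if day.year in years and day.month in months)

# Hypothetical event dates.
EVENTS = {
    "Conference A": date(2007, 4, 20),
    "Conference B": date(2007, 10, 5),
    "Conference C": date(2006, 6, 2),
}
print(spring_events(EVENTS, {2006, 2007}))                  # ['Conference A']
print(spring_events(EVENTS, {2006, 2007}, slack_months=1))  # ['Conference A', 'Conference C']
```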

By intersecting the results of the above searches we narrow down very precisely to a set of people I might be looking for, or just to a single qualifying person.
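The final narrowing step can be sketched as an intersection of the candidate sets that each remembered fact produced. This is a minimal sketch with hypothetical candidates. Skipping clues that returned nothing is an assumed design choice: it means missing data, such as an unrecorded date, does not wipe out the whole answer. A stricter engine could treat every clue as a hard constraint instead.

```python
def intersect_clues(clue_results):
    """Intersect the candidate sets produced by each remembered fact,
    skipping clues that produced no candidates at all."""
    non_empty = [set(c) for c in clue_results.values() if c]
    return set.intersection(*non_empty) if non_empty else set()

# Hypothetical candidate organizers returned by each separate search.
CLUES = {
    "government/technology topic": {"Robert Jones", "Ann Wu", "Tom Ng"},
    "town near Big Sur": {"Robert Jones", "Ann Wu"},
    "introduced me to a celebrity": {"Robert Jones", "Tom Ng"},
    "Sue Smith also spoke": {"Robert Jones", "Ann Wu"},
    "held in spring": set(),  # e.g. dates not recorded
}

print(intersect_clues(CLUES))  # {'Robert Jones'}
```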

In this example, the answer I was seeking was that the organizer was named Robert Jones, and that the conference, on Government and Technology Policy, took place in Carmel-by-the-Sea last spring. This result should be easily findable via associative search starting from the above set of things I remember.

But if for some reason the answer is still not there, there is another capability which the brain uses that we need to add to our search engines: Perturbation, or what could be called "prospecting."

The query I entered consists of a question and a set of facts related to the answer I am seeking. But it is possible that I asked the question incorrectly, or that some of the facts I added were incorrect or insufficient. Perturbation can correct for this by introducing variations into the question and the facts in order to explore the space of answers that are "near" them as well.

There are many ways to go about adding perturbation to the system -- for example, we can search more than one hop out from every link, or we can search for other types of relationships that are highly correlated with relationships we are asking for explicitly, or we can include results for things which are strongly connected to things that are found.
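The second approach mentioned above, searching along relationship types that are highly correlated with the one requested, can be sketched as a relaxable predicate filter. This is a minimal sketch. The correlation scores are hypothetical; in practice they might be estimated from how often two kinds of links co-occur. The function and predicate names are also assumptions.

```python
# Hypothetical correlation scores between relationship types (0 to 1).
CORRELATED = {
    "speaker_at": {"panelist_at": 0.8, "attended": 0.4},
}

# Hypothetical triples about one person.
TRIPLES = [
    ("me", "speaker_at", "Conference A"),
    ("me", "panelist_at", "Conference E"),
    ("me", "attended", "Conference F"),
]

def perturbed_matches(subject, predicate, min_correlation=1.0):
    """Objects linked to subject by predicate, plus objects linked by any
    relationship type whose correlation with predicate is >= min_correlation.
    The default of 1.0 means no perturbation (exact predicate only)."""
    allowed = {predicate}
    for other, corr in CORRELATED.get(predicate, {}).items():
        if corr >= min_correlation:
            allowed.add(other)
    return sorted(o for s, p, o in TRIPLES if s == subject and p in allowed)

print(perturbed_matches("me", "speaker_at"))       # ['Conference A']
print(perturbed_matches("me", "speaker_at", 0.5))  # ['Conference A', 'Conference E']
```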

From a user-interface standpoint perturbation can be controlled with a simple "sliding lever" in the user interface for "Precision." If the user sets very high Precision as a requirement then there is no perturbation -- the results are exact matches to the query and facts. If there is low Precision as a requirement then there can be more perturbation, thus the results are fuzzy and may include things that are near what I asked for but not exactly what I specified, enabling me to discover things via relevant relationships that I could not even remember to mention as facts.
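One way to wire the "Precision" slider to results is to give each result a match score, with exact matches scoring 1.0. The slider value then acts as a minimum score. This is a sketch of one possible design, not a description of any existing interface. The scores and names are hypothetical.

```python
def apply_precision(scored_results, precision):
    """Filter (item, match_score) pairs by a 0..1 'Precision' slider.

    Exact matches score 1.0, so precision 1.0 keeps only exact matches
    (no perturbation); lower settings admit fuzzier, 'nearby' results.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    kept = [(item, s) for item, s in scored_results if s >= precision]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical scored results from a perturbed search.
RESULTS = [("Conference A", 1.0), ("Conference E", 0.8), ("Conference F", 0.4)]
print(apply_precision(RESULTS, 1.0))  # [('Conference A', 1.0)]
print(apply_precision(RESULTS, 0.5))  # [('Conference A', 1.0), ('Conference E', 0.8)]
```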

Finally, using a reasoner, the results found by the above search can be analyzed so that the results most likely to be what I am looking for, given the facts I have included as constraints, are presented first. Reasoning becomes the ranking algorithm in the system, rather than something like PageRank. The answers that actually make the most sense in the context of my question are delivered first.
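A very reduced stand-in for reasoner-based ranking orders candidates by the total weight of the user's facts they satisfy. This is a sketch under stated assumptions and far simpler than real logical inference. The candidate records, the constraint weights, and the set of towns considered "near Big Sur" are hypothetical.

```python
def rank_by_constraints(candidates, constraints):
    """Order candidates by the total weight of the user's facts they satisfy."""
    def score(record):
        return sum(weight for _, check, weight in constraints if check(record))
    return sorted(candidates, key=lambda r: (-score(r), r["name"]))

# Hypothetical candidate organizers, each with facts about their conference.
CANDIDATES = [
    {"name": "Ann Wu", "topics": {"Government"}, "town": "Monterey",
     "season": "fall", "speakers": {"me"}},
    {"name": "Robert Jones", "topics": {"Government", "Technology"},
     "town": "Carmel-by-the-Sea", "season": "spring", "speakers": {"me", "Sue Smith"}},
]

# (description, test, hypothetical weight) for each remembered fact.
CONSTRAINTS = [
    ("about government", lambda r: "Government" in r["topics"], 1.0),
    ("near Big Sur", lambda r: r["town"] in {"Carmel-by-the-Sea", "Monterey"}, 1.0),
    ("in spring", lambda r: r["season"] == "spring", 0.5),
    ("Sue Smith spoke", lambda r: "Sue Smith" in r["speakers"], 2.0),
]

print([r["name"] for r in rank_by_constraints(CANDIDATES, CONSTRAINTS)])
# ['Robert Jones', 'Ann Wu']
```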

The above illustration describes how searches that are powered by the Semantic Web will work, once this technology is widely adopted. This is how the brain works, and how our search engines should work as well. 

This is not a pipe dream -- in fact it is already happening in research settings and in the government. Within 15 years, if not a lot sooner, we will see these capabilities emerge in consumer-grade search interfaces.


April 25, 2008

Video of my Talk at Digital Now

This is a video of my talk at the Digital Now conference in Orlando yesterday. There's a long intro by Don Dea, and then I speak (starting at 05:14) about the Semantic Web and Twine.

April 23, 2008

Great Collective Intelligence Book; Includes a Chapter I Wrote

I highly recommend this new book on Collective Intelligence. It features chapters by a Who's Who of thinkers on Collective Intelligence, including a chapter by me about "Harnessing the Collective Intelligence of the World Wide Web."

Here is the full-text of my chapter, minus illustrations (the rest of the book is great and I suggest you buy it to have on your shelf. It's a big volume and worth the read):

 

Harnessing the collective intelligence of the World-Wide Web

Nova Spivack[1]

Introduction

We are about to enter the third decade of the Web, sometimes referred to as “Web 3.0.” During this decade, the Web will evolve from a globally distributed fileserver into a globally distributed database. This shift will be enabled by a set of emerging technologies called The Semantic Web, which add a new layer of machine-understandable metadata about the meaning of information to the content of the Web.

The Semantic Web will catalyze a new era in collective intelligence. Individuals, groups, organizations and communities will be able to create, connect, find and share knowledge more intelligently and productively than ever before. Ultimately it will enable the Web itself, and all the people and applications that participate in it, to become more collectively intelligent.

Web 3.0—The Third Decade of the Web


The third decade of the Web, “Web 3.0,” begins officially in 2010, but we are already entering the early stages of this transition today. To understand where the Web is headed it helps to zoom out to a larger historical context.

The final decade of the PC era (1980-1990) was largely concerned with innovation on the front-end of the personal computer: the desktop and user interface layer of the PC. The focus of this period was on making PCs easier to use, with innovations such as Microsoft Windows, the Macintosh user-interface, and more consistent user-interfaces and integration across applications.

 The first decade of the Web-era (“Web 1.0” from 1990 - 2000), was focused on the back-end of the Web: the core technologies and platforms of the Web such as HTML, HTTP, Web servers, search engines, commerce technologies, advertising technologies, and the basic architectures and business model of Web applications. This decade was mainly focused on the technology and infrastructure of the Web and most of the actual innovation dollars were spent on making things that only software developers could see.

In contrast, the second decade of the Web (“Web 2.0” from 2000-2010) has been largely focused on the front-end of the Web. Much of the innovation has not been in actual technology but rather in design patterns and user-interfaces for improving the end-user experience of the Web. During this decade we have focused on paradigms such as AJAX, which is a set of technologies and design methodologies for making Web sites more visually appealing and interactive.

 Another big focus of Web 2.0 has been user-generated content, and in particular the practice of “tagging” content with subject tags. Tagging has in turn led to the concept of “folksonomies” in which taxonomies that organize data are evolved in a bottom-up fashion by a decentralized community of users.

The coming third decade of the Web (“Web 3.0” from 2010-2020) will shift the emphasis back to the back-end of the Web. This decade will be largely focused on upgrading the technical infrastructure and content of the Web, based on emerging technologies such as the Semantic Web. During this decade the primary push will be enriching the Web so that it can function more like a database.

 Today the Web is composed mainly of unstructured and semistructured data such as text files and Web pages. Keyword search engines are able to provide rudimentary search capabilities over this information, but only for the most simplistic queries. Compare current Web search to the more precise capabilities of queries against a database and the difference is immediately clear. The Web does not provide anything close to the search capabilities or precision of a database today. But that is about to change.

 The Semantic Web provides a way to enrich both unstructured and structured data so that it can be queried with the precision of a database. Essentially, it provides a way to tag any information with metadata that explains what it means—and this metadata can be understood by software applications, such as search engines or knowledge management applications. It’s important to note that The Semantic Web is not a new Web, it’s just a new layer of the Web we already have. The semantic metadata that comprises the knowledge of the Semantic Web won’t live in some new place—it lives right in the existing documents and data on the Web. The knowledge of the Semantic Web is encoded using special new markup languages such as RDF and OWL.

This metadata is invisible to users (it doesn’t appear in Web browsers), but behind the scenes it can be read by any application that is compatible with these markup languages. So when an application, such as a next-generation search engine, sees a Web page or data record that contains RDF or OWL metadata, it can use that metadata to understand what that page or data record means, what it is about, what it is related to, and how to interpret it. With Semantic Web metadata in place, searches on the Web will be as precise as, or even more precise than, those in any database. But that is just the beginning of what the Semantic Web enables. Beyond merely improving search, the Semantic Web actually transforms the Web into a database: a worldwide database in which data records can be moved around, shared, and linked together in new ways.
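The database-like precision comes from querying metadata as (subject, predicate, object) triples with patterns that share variables. This is similar in spirit to a SPARQL basic graph pattern. The sketch below is a toy in-memory matcher, assuming prefixed names are treated as plain strings. It does not parse RDF and is not how any particular RDF store is implemented. The `ex:` identifiers and function names are hypothetical.

```python
def match_pattern(triples, pattern, binding):
    """Yield extended bindings for one (s, p, o) pattern; '?x' is a variable."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:  # clashes with earlier binding
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def query(triples, patterns):
    """Join several patterns in order, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(triples, pattern, b)]
    return bindings

# Hypothetical metadata attached to two pages.
TRIPLES = [
    ("ex:page1", "rdf:type", "ex:Conference"),
    ("ex:page1", "ex:locatedIn", "ex:Carmel"),
    ("ex:page2", "rdf:type", "ex:Conference"),
    ("ex:page2", "ex:locatedIn", "ex:Boston"),
    ("ex:Carmel", "ex:nearTo", "ex:BigSur"),
]

print(query(TRIPLES, [("?c", "rdf:type", "ex:Conference"),
                      ("?c", "ex:locatedIn", "?town"),
                      ("?town", "ex:nearTo", "ex:BigSur")]))
# [{'?c': 'ex:page1', '?town': 'ex:Carmel'}]
```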

On the basis of the technologies of the Semantic Web and the Web 3.0 era, we will then be able to enter the fourth decade of the Web (“Web 4.0,” 2020-2030), in which the emphasis will shift back to the front-end of the Web. The Semantic Web doesn’t just add metadata about the meaning of information to the Web; it also enables metadata to be added about relationships, conceptual linkages, logical connections, and even logical rules. On the basis of this additional metadata, Web users and other applications will be able to harness the power of intelligent agents that will search the Web for things that interest them, make suggestions and recommendations, and even potentially transact on their behalf. This will open the door to a new kind of user-interface to the Web that is smarter and more conversational in nature, in which users will enter into dialogues with agents and interact with them to search the Web and make decisions. A conversational interface to the Web will be more appropriate in an increasingly mobile world, where users will mostly interact with the Web from small portable or embedded devices.

 Users on mobile devices that have little to no screen real-estate will need a more productive way to interact with the Web than through a miniature browser; nobody likes sorting through pages of Google results on a cell phone. Instead, they will want to simply ask a question (perhaps through a voice interface, rather than typing with their thumbs) and have a virtual intelligent assistant dispatch agents to find the best answers and then report back to them with results or to ask further questions or for a decision.

 Smart, interactive conversational interfaces and intelligent agent-based virtual assistants are possible today, but only in narrow domains. In the Web 4.0 era they may in fact be our primary way of interacting with the whole Web and may be built into the user interface of most search engines, personal email providers, and leading Websites.

The Virtualization of Knowledge and Intelligence

In the long-term, the Semantic Web provides a way to move much of the “intelligence” that currently resides in the minds of individuals, groups and organizations, and/or that is hard-coded into various software and Web applications, out onto the Web itself. It provides a way to virtualize knowledge and intelligence in an explicitly machine-readable, universally accessible form. In other words, it provides a way to start making the Web “smarter.”

 Knowledge and expertise that previously only existed in people’s heads, or had to be painstakingly coded into each particular vertical software application, will be represented in a form of universally readable metadata on the Web—just like HTML documents today. In other words, using the Semantic Web you can publish knowledge and even the underlying conceptual frameworks, rules and heuristics that embody domain expertise, on the Web in an abstract, machine-readable form.

 There are many benefits that stem from this. For one thing, it will make it much easier to write smart software applications because much of the necessary “smarts” will not reside in the applications at all, but will rather live out there on the Web.

For example, to write an application that can intelligently assist with travel logistics, a developer will simply be able to point it at existing sets of knowledge and rules for the travel domain that are already on the Web. The application will be able to draw on those pools of existing domain knowledge without having to be specifically programmed to do so, because it understands the underlying standards of the Semantic Web. Similarly, the same application could just as easily help someone trade on the stock market by pointing it at domain knowledge about finance and investment on the Semantic Web.

 As more pools of domain knowledge are added to the Web around various verticals, all applications will potentially benefit. This sets up a kind of network effect in which a global knowledge commons begins to form and self-amplify over time. For example, first the travel domain is added to the Semantic Web. Then someone else adds domain knowledge about geography and links them together. Another group then adds domain knowledge about hotels, and another one adds domain knowledge about weather—and these all connect to each other in various ways.

With all of this interconnected knowledge on the Web in machine-readable form, application developers can more easily and quickly write applications that understand concepts and rules related to booking travel reservations, and that can cross-reference reservation information with knowledge about geographic places, relevant weather, and hotels in those locations. And in the other direction, someone booking a hotel can find information about relevant weather and book travel to get to that hotel. This is just one example. There is an almost unlimited range of other possibilities for these technologies across all domains.
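The cross-referencing described above works because independently published pools of knowledge share identifiers for the same things. Linking the pools is then just a union of their statements, and a cross-domain question becomes a join. This is a minimal sketch. The three pools, the `ex:` identifiers, and the predicates are all hypothetical.

```python
# Three hypothetical pools of domain knowledge, published independently
# but using shared identifiers for places (which is what links them).
GEOGRAPHY = {("ex:Carmel", "ex:nearTo", "ex:BigSur")}
HOTELS = {("ex:HotelOne", "ex:locatedIn", "ex:Carmel"),
          ("ex:HotelTwo", "ex:locatedIn", "ex:Boston")}
WEATHER = {("ex:Carmel", "ex:aprilForecast", "mild"),
           ("ex:Boston", "ex:aprilForecast", "rainy")}

def merged(*pools):
    """Combine pools of triples; shared identifiers connect them automatically."""
    graph = set()
    for pool in pools:
        graph |= pool
    return graph

def hotels_with_weather(graph, near_place):
    """Hotels in towns near near_place, paired with that town's forecast."""
    towns = {s for s, p, o in graph if p == "ex:nearTo" and o == near_place}
    result = []
    for hotel, p, town in sorted(graph):
        if p == "ex:locatedIn" and town in towns:
            forecasts = [o for s, p2, o in graph
                         if s == town and p2 == "ex:aprilForecast"]
            result.append((hotel, town, forecasts[0] if forecasts else None))
    return result

print(hotels_with_weather(merged(GEOGRAPHY, HOTELS, WEATHER), "ex:BigSur"))
# [('ex:HotelOne', 'ex:Carmel', 'mild')]
```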

 The key point of all this is that The Semantic Web enables applications to become thinner, yet at the same time smarter, by drawing on the collective intelligence embodied by the Web itself. It will become possible to write applications that understand one or more specialized vertical domains faster, and ultimately applications will become more general—they will be able to dynamically load in specialized domain knowledge for whatever domain is needed, without having to be specifically programmed or limited to just those domains.

 Application developers will be able to draw on the knowledge added to the Web by others, instead of having to reinvent the wheel by programming all that knowledge directly into their applications every time. And in turn, the knowledge that their applications create can, if they want to allow it, be published back onto the Web for other applications to draw on as well.

Semantic Web as The Next Leap in Human Collective Intelligence

Looking at the evolution of the Semantic Web in historical context, we can view it as the next big step in a longer process of the evolution of human collective intelligence.

 Before the invention of written language, knowledge could only be communicated verbally and was handed down through oral traditions. During this period, one had to be in immediate physical proximity of someone who had certain knowledge in order to receive it from them. This meant that the maximum effective range of human collective intelligence was quite short in space and time.

 With the invention of writing, and eventually printing, humanity was able to process knowledge over longer distances in space and time, and with less reliance on particular individuals. People could now engage in dialogues and dialectics with larger groups of people in more places, across larger distances in space, and with more precision over larger ranges of time.

 The printing press took this to a new level by starting the process of mass-distribution of knowledge, but it still relied on an expensive physical manufacturing process and a paper medium that was perishable and costly to store and move around.

With the advent of electronic communications of various forms, humanity achieved many milestones: the transmission of knowledge could take place at the speed of light, and using digital storage media we were freed from the limitations of the paper medium.

 The Internet and the Web transformed the process of distributing knowledge even further—enabling a global knowledge commons to emerge. The Internet and Web enable anyone and everyone to become providers of knowledge, not just consumers—a fundamental shift in the way that knowledge transmission and media function. They are not just about the mass-distribution and mass-consumption of knowledge; they enable the mass-creation of knowledge. In some respects these technologies are analogues of the printing press in that they have democratized the process of creating, sharing and accessing knowledge by fundamentally changing the economics of the entire process—making it affordable and accessible to all.

 But even on the Web, for all its many benefits, knowledge is still not free from the limitations of the human brain. Only humans can really understand the knowledge that is represented in Web sites and databases, for example. While all other processes related to the distribution, storage and access to knowledge can now be done digitally, using software and the Web, the processes of creating, consuming and actually understanding knowledge are still limited only to living humans. That’s where the Semantic Web comes in.

Liberating Knowledge and Intelligence from Human Brains

The Semantic Web virtualizes human knowledge and expertise outside of human brains, and even outside of any particular software application—knowledge becomes essentially just more data on the Web. When we speak of knowledge here we don’t just mean information—the first-order raw data that is currently on the Web—we mean the actual meaning and interpretation of the information that is not on the Web but rather exists only in human brains.

 The Semantic Web provides a way to make the meaning and interpretation of information explicit in a form that is unambiguous and publishable, and shareable, on the Web. This will make all this knowledge understandable by software. It’s almost like the invention of a new language—a sort of meta-language for formally expressing what exactly you mean when you say something. The impact of this could be enormous.

 For the first time in human history, we won’t have to rely only on humans to create, understand and consume knowledge—our machines will be able to help us do this. They will help us work, collaborate, create, explore, monitor, discover, search, innovate, connect, and synthesize. This will open the door to an almost unimaginable amplification of the human mind, and human collective intelligence on this planet. At first the impact of this will largely be focused around assisting humans with simple clerical and research tasks, but the process will inevitably continue to evolve to a point where software will begin to originate new knowledge for us, advise us, and eventually to even start making certain types of decisions on our behalf.

 Although the Semantic Web has barely moved from the lab to the mainstream Internet, it is in fact much farther along than most people realize. Today there are already semantic applications under development that can organize all your information automatically, make recommendations based on your dynamically changing interests, identify new connections between ideas or documents in different places, make logical inferences or discover contradictions, and even make discoveries by doing proofs and explorations based on available data.

Within a few years these capabilities will begin to filter out to mainstream users of the Internet, and within a decade or two at most, they will become commonplace. There are only a few billion humans today, and each of us can only cope with a small amount of information and relationships before we become overloaded. But in an era of machine understanding of human knowledge we may potentially be able to leverage thousands to millions of software agents to help us. This will vastly increase our ability to cope with masses of information and relationships productively. In an increasingly complex, distributed, and rapidly changing world, we simply will not be able to cope in the future without help. The Semantic Web provides one path to solving these problems, enabling us to remain productive in the future.

Amplifying Human Collective Intelligence

The Semantic Web does not replace humans or take them out of the equation. It simply reduces the load on humans, freeing them from some of the pain of information overload, and providing a new path for software to begin to augment and even amplify human collective intelligence.

 Today there are several barriers to human collective intelligence that arise from basic limitations of the human brain. Human individuals, and groups of humans, simply cannot process or share knowledge effectively beyond a certain level of information or relationship complexity and change. For this reason, collaboration and collective intelligence are often easier to achieve and yield better results in small groups than large groups.

As group size increases, productive collective intelligence becomes dramatically harder to achieve. Thus, ironically, even though larger groups offer the potential for exponential increases in collective intelligence, in practice the opposite is usually the result: the larger teams get, the dumber they get. An entire industry of management consultants and facilitators exists because of these inefficiencies.

 The Semantic Web may be able to help with this age-old problem. By enabling software to understand information and relationships, we may be able to begin to automatically and intelligently facilitate interpersonal and group collaboration and knowledge management, and this may finally enable larger groups to become exponentially smarter instead of dumber.

Twine.com—A New Service for Collective Intelligence

My own company, Radar Networks, has recently introduced a new service based on the Semantic Web, called Twine (www.twine.com), that focuses on amplifying human collective intelligence. Twine helps individuals and groups manage and share knowledge more productively, using the Semantic Web.

As people use Twine, it learns from them and automatically organizes and connects their information with other related information, saving them valuable time and enabling them to discover connected knowledge. Twine provides individuals and groups with a smart virtual environment for their knowledge.

 Twine works with all kinds of knowledge—email, RSS, Web pages, documents, photos, videos, audio, contact records, or anything else. Regardless of where information actually resides, Twine enables users to view it as if it were in one place, and to see how it is connected and organized. Twine also automatically helps to make sense of information and to make it more easily searchable.

 Twine is a Web-based online service that is completely built using the Semantic Web. Although it is only in early beta-testing at the time of this writing, it is already demonstrating that intelligent machine-augmentation of individual and group knowledge management is possible and improves productivity and collaboration.

As Twine unfolds and spreads to more individuals, groups and teams, and organizations and communities, it has the potential to become a new backbone for collective intelligence and knowledge sharing worldwide. At least that is the vision of the project. Time will tell whether we succeed.

From Global Knowledge Commons to Global Brain

If the Semantic Web develops as predicted, it is possible that within 20 years much, if not all, human knowledge will be represented on the Web in machine-understandable form. We have seen the beginnings of this trend with services such as the Wikipedia. More recently, another initiative called the DBpedia is creating a Semantic Web version of the Wikipedia. But this is just the start of this trend.

 As more and more applications and services start producing Semantic Web metadata and exposing it back to other applications and services on the Web, we will begin to create a new global knowledge commons. At first these different services will function like islands of knowledge, but then they will begin to interconnect.

 A piece of knowledge in one place will link to and from pieces of knowledge in other places. Eventually this will become a giant associative network, not so unlike the brain, but on a global scale. And as people and applications surf through its connections and consume its knowledge, adding new knowledge and connections back to it as they do, it will change and self-organize dynamically. Just as the first generations of the Web have enabled a global medium for “hypertext,” the Semantic Web will enable a global medium for “hyperdata.”
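The idea that separate "islands" of knowledge become one associative network once they link to each other can be made concrete with a small sketch. It assumes knowledge is stored as subject-predicate-object triples and that two islands connect simply by using the same identifier for the same thing; the ex: names, the triples, and the helper functions are all hypothetical. A breadth-first search finds no route from ex:alice to ex:specDoc within one island, but does once both islands are merged.

```python
from collections import deque

# Two hypothetical "islands" of knowledge as (subject, predicate, object) triples.
# They share one identifier, "ex:semanticWeb", which is what lets them connect.
island_a = {
    ("ex:alice", "knows", "ex:bob"),
    ("ex:bob", "interestedIn", "ex:semanticWeb"),
}
island_b = {
    ("ex:semanticWeb", "relatedTo", "ex:rdf"),
    ("ex:rdf", "describedIn", "ex:specDoc"),
}

def build_graph(*triple_sets):
    """Merge triple sets into one undirected adjacency map.
    Identical identifiers from different sets become the same node."""
    graph = {}
    for triples in triple_sets:
        for s, _p, o in triples:
            graph.setdefault(s, set()).add(o)
            graph.setdefault(o, set()).add(s)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search: shortest list of nodes from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

print(find_path(build_graph(island_a), "ex:alice", "ex:specDoc"))
print(find_path(build_graph(island_a, island_b), "ex:alice", "ex:specDoc"))
```

The first call prints None; the second finds the route through the shared ex:semanticWeb node.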

As one projects the future evolution of the Web and the emerging Semantic Web, one cannot help but notice certain similarities to the human mind. Some have even ventured to call this the beginning of an emerging “Global Brain.” It is too early to tell how similar it will truly be to the actual human brain. However we can already predict with confidence that it will be a system that collectively will be capable of at least rudimentary learning, memory, perception, planning and reasoning.

 The human brain is a massively parallel collective intelligence engine in which billions of neurons interact across trillions of connections to process and generate knowledge.

 Similarly, the collective intelligence of the Web will involve the combined interactions and intelligence of billions of humans and machines across trillions of relationships. These processes will not be guided centrally, and the system will most likely not be centralized around a single construct of a “self” nor will it have anything like a human body.

 While it will be possible to say the system as a whole is intelligent, it will be difficult to locate any particular source of that intelligence; the intelligence will come from everywhere: from the humans, the software and even the data and links that comprise the Web.

 Because the Web is quite different from the human brain, it is likely that its intelligence will be different from what we think of as human intelligence today. But it will nonetheless be intelligent—in a massively distributed, emergent, and chaotic way that we humans may not be able to even comprehend. The “thoughts” the Web will think may be just too vast and complex for us to even recognize, let alone imagine or understand. Yet perhaps in decade-long time-scales at least, we will begin to be able to see the outlines of its thinking.



[1] Nova Spivack is the CEO and founder of Radar Networks, a San Francisco company that is pioneering applications of the Semantic Web for distributed collaboration and knowledge management with a new service called Twine.com. Mr. Spivack is a recognized authority on the Semantic Web and future of the Web, which is sometimes called “Web 3.0.” A more detailed bio can be found at his company website: http://www.radarnetworks.com/about/management.html#nova.

April 19, 2008

The Wikipedia, Knowledge Preservation and DNA

I had an interesting thought today about the long-term preservation and transmission of human knowledge.

The Wikipedia may be on its way to becoming one of the best places in which to preserve knowledge for future generations. But this is just the beginning. What if we could encode the Wikipedia into the Junk DNA portion of our own genome? It appears that something like this may actually be possible -- at least according to some recent studies of the non-coding regions of the human genome.

If we could actually encode knowledge, like the Wikipedia for example, into our genome, the next logical step would be to find a way to access it directly.

At first we might only be able to access and read the knowledge stored in our DNA through a computationally intensive genetic analysis of an individual's DNA. In order to correct any errors in the data from mutation, we would also need to cross-reference this individual data with similar analyses from the DNA of other people who also carry this data in their DNA. But this is just the beginning. There are, however, ways to store data with enough redundancy to protect it against degradation. Assuming we could do this, we might be able to eliminate the need for cross-referencing as a form of error correction -- the data itself would be self-correcting, so to speak. If we could accomplish this, then the next step would be to find a way for an individual to access the knowledge stored in their DNA in real-time, directly. That's a long way off, but there may be a way to do this using some future nano-scale genomic-brain interface. This opens up some fascinating areas of speculation to say the least.
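The redundancy idea in the previous paragraph can be sketched in a few lines. This is a minimal illustration, assuming an arbitrary mapping of two bits per base (A=00, C=01, G=10, T=11) and the simplest possible error-correcting code: a repetition code that writes every base three times and decodes by majority vote. The function names are hypothetical; practical DNA-storage schemes use much stronger codes (Reed-Solomon, for example) and also avoid long runs of the same base, which this toy code happily produces.

```python
from collections import Counter

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T

def bytes_to_bases(data: bytes) -> str:
    """Pack each byte into four bases, two bits per base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; len(seq) must be a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

def repeat_encode(seq: str, copies: int = 3) -> str:
    """Repetition code: write every base `copies` times in a row."""
    return "".join(base * copies for base in seq)

def repeat_decode(seq: str, copies: int = 3) -> str:
    """Majority vote inside each group of `copies` bases.
    Corrects up to (copies - 1) // 2 substitutions per group."""
    return "".join(
        Counter(seq[i:i + copies]).most_common(1)[0][0]
        for i in range(0, len(seq), copies)
    )

stored = repeat_encode(bytes_to_bases(b"Hi"))
damaged = "T" + stored[1:]  # simulate one point mutation in the first base
print(bases_to_bytes(repeat_decode(damaged)))  # b'Hi'
```

The repetition code triples the storage cost to survive one substitution per group, which is why real schemes prefer codes that buy more protection per extra base.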

 

Why The Wikipedia?

The Wikipedia has certain qualities that make it better than other forms of knowledge preservation and transmission:

  • The Wikipedia exists primarily in electronic form. It is not subject to age or decay like a physical encyclopedia or document. This means it can persist forever, and will not be lost to time, if it continues to be maintained electronically in the future.
  • The Wikipedia is replicated in multiple locations around the world. The fact that it is so easy to replicate, and is so widely replicated means that it is less at risk of being lost due to a local disaster at any given storage location. It also means it is more likely to continue, somewhere, as a living document that goes on to reflect majority consensus reality into the distant future. It is highly improbable that it will ever suffer the same fate as certain ancient documents which only existed in one place and were subsequently lost in floods, fires, or wars, etc. At this point only a planet-wide extinction level event could erase the Wikipedia and/or prevent future generations from finding it.
  • The Wikipedia is highly viral: its content is increasingly cited and it is far ahead of any competing system in terms of coverage and brand-recognition. Because so many other pieces of content on the Web and in other media refer to the Wikipedia as the world's global authority for knowledge, it is considered increasingly authoritative and is increasingly visible and increasingly cited. The Law of Increasing Returns indicates that this will continue to self-amplify, making the Wikipedia the best candidate for an authoritative global repository of knowledge.

What this means is that if you have any knowledge that you want to preserve for future generations, a good place to put it is in the Wikipedia. Putting it there almost guarantees that it will propagate around the world and throughout the human-explored universe (in the future, if we become a spacefaring civilization), and into the distant future of human civilizations.

The Potential For Storing Knowledge in DNA

Is it possible to store knowledge -- such as the Wikipedia -- in human DNA? It would certainly be useful if we could do this. By storing knowledge in the DNA of living humans, or of common bacteria for that matter, it could then potentially be passed down and spread through generations into the far future. However the mutability of DNA over time might gradually introduce errors that would degrade the information within particular lines of DNA over long periods of time.

Perhaps this could however be mitigated by comparing DNA samples from a large cross-section of individuals within the population of descendants of original holders of DNA-knowledge-archives in the future -- this would effectively enable statistical error cancellation. The farther in the future from the date at which the knowledge is "written" to the DNA of some number of humans, the more people's DNA would be needed to eliminate the errors statistically. This would however in principle counteract mutations and enable the reliable recovery of messages in DNA even very far in the future.
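The statistical error cancellation described above amounts to a per-position majority vote across many copies. Here is a minimal sketch, assuming only independent point substitutions at a fixed rate, with every copy aligned base-for-base (no insertions, deletions, or shared inherited mutations, all of which real lineages would have). The mutation rate, copy count, and helper names are illustrative assumptions.

```python
import random
from collections import Counter

def mutate(seq, rate, rng):
    """Copy seq, replacing each base with a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def consensus(copies):
    """Per-position majority vote across equal-length, aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))

rng = random.Random(42)  # fixed seed so the run is reproducible
original = "".join(rng.choice("ACGT") for _ in range(200))
copies = [mutate(original, 0.05, rng) for _ in range(25)]  # 25 "descendants", 5% damage each

print(sum(c != original for c in copies), "of 25 copies contain errors")
print("consensus recovers original:", consensus(copies) == original)
```

Because inherited mutations are shared by whole family lines rather than independent, real recovery would need copies from many separate lineages, which fits the point above that more people's DNA would be needed the further into the future we go.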

The fact that it is in principle possible to encode knowledge into human (or other) DNA raises the question of whether there is already knowledge stored there. It's certainly worth a look! Maybe there is already a message there for us? One can only wonder if an ancient "Wikipedia" of sorts is already written there.

Interestingly enough, when certain statistical tests are run against human DNA,  it does seem to have properties that are indicative of written language, but only in the "junk" regions of the genome. Maybe it's not "junk" after all. Below is an article that discusses a recent discovery related to this:

Language in junk DNA

You've probably heard of a molecule called DNA, otherwise known as "The Blueprint Of Life". Molecular biologists have been examining and mapping the DNA for a few decades now. But as they've looked more closely at the DNA, they've been getting increasingly bothered by one inconvenient little fact - the fact that 97% of the DNA is junk, and it has no known use or function! But an unusual collaboration between molecular biologists, cryptanalysts (people who break secret codes), linguists (people who study languages) and physicists has found strange hints of a hidden language in this so-called "junk DNA".

Only about 3% of the DNA actually codes for amino acids, which in turn make proteins, and eventually, little babies. The remaining 97% of the DNA is, according to conventional wisdom, not genes, but junk.

The molecular biologists call this junk DNA, introns. Introns are like enormous commercial breaks or advertisements that interrupt the real program - except in the DNA, they take up 97% of the broadcast time. Introns are so important, that Richard Roberts and Phillip Sharp, who did much of the early work on introns back in 1977, won a Nobel Prize for their work in 1993. But even today, we still don't know what introns are really for.

Simon Shepherd, who lectures in cryptography and computer security at the University of Bradford in the United Kingdom, took an approach based on his line of work. He looked on the junk DNA as just another secret code to be broken. He analysed it, and he now reckons that one probable function of introns is that they are some sort of error correction code - to fix up the occasional mistakes that happen as the DNA replicates itself. But even if he's right, introns could have lots of other uses.

The next big breakthrough came from a really unusual collaboration between medical doctors, physicists and linguists. They found even more evidence that there was a sort-of language buried in the introns.

According to the linguists, all human languages obey Zipf's Law. It's a really weird law, but it's not that hard to understand. Start off by getting a big fat book. Then, count the number of times each word appears in that book. You might find that the number one most popular word is "the" (which appears 2,000 times), followed by the second most popular word "a" (which appears 1,800 times), and so on. Right down at the bottom of the list, you have the least popular word, which might be "elephant", and which appears just once.

Set up two columns of numbers. One column is the order of popularity of the words, running from "1" for "the", and "2" for "a", right down to "1,000" for "elephant". The other column counts how many times each word appeared, starting off with 2,000 appearances of "the", then 1,800 appearances of "a", down to one appearance of "elephant".

If you then plot on the right kind of graph paper, the order of popularity of the words, against the number of times each word appears you get a straight line! Even more amazingly, this straight line appears for every human language - whether it's English or Egyptian, Eskimo or Chinese! Now the DNA is just one continuous ladder of squillions of rungs, and is not neatly broken up into individual words (like a book).

So the scientists looked at a very long bit of DNA, and made artificial words by breaking up the DNA into "words" each 3 rungs long. And then they tried it again for "words" 4 rungs long, 5 rungs long, and so on up to 8 rungs long. They then analysed all these words, and to their surprise, they got the same sort of Zipf Law/straight-line-graph for the human DNA (which is mostly introns), as they did for the human languages!

There seems to be some sort of language buried in the so-called junk DNA! Certainly, the next few years will be a very good time to make a career change into the field of genetics.

So now, around the edge of the new millennium, we have a reasonable understanding of the 3% of the DNA that makes amino acids, proteins and babies. And the remaining 97% - well, we're pretty sure that there is some language buried there, even if we don't yet know what it says. It might say "It's all a joke", or it might say "Don't worry, be happy", or it might say "Have a nice day, lots of love, from your friendly local DNA".   (source)
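The counting procedure the article describes can be sketched as follows. It assumes the DNA is chopped into consecutive, non-overlapping words (the article does not say whether the study used overlapping words), and it fits a straight line on log-log axes, which is what "the right kind of graph paper" amounts to. The idealised fifty-word "book" is made up for a sanity check, and the function names are hypothetical.

```python
import math
from collections import Counter

def dna_words(seq, k):
    """Chop seq into consecutive, non-overlapping words of k bases.
    A trailing fragment shorter than k is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

def rank_frequency(words):
    """Return (rank, count) pairs; rank 1 is the most frequent word."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def loglog_slope(pairs):
    """Least-squares slope of log(count) against log(rank); needs two or more pairs.
    Zipf's law predicts a straight line, classically with slope near -1 for text."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def dna_tables(seq, ks=range(3, 9)):
    """Rank-frequency table for each word length k = 3..8, as in the article."""
    return {k: rank_frequency(dna_words(seq, k)) for k in ks}

# Sanity check on an idealised Zipf "book": word k appears about 1000/k times.
book = [f"w{k}" for k in range(1, 51) for _ in range(round(1000 / k))]
book_pairs = rank_frequency(book)
print(book_pairs[:3], round(loglog_slope(book_pairs), 2))
```

One caution on interpretation: Zipf-like rank-frequency curves also show up in randomly generated text, so a straight line by itself is suggestive rather than proof of a language.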

Now to complete this thought: what if the information-carrying capacity of the so-called Junk DNA of the human genome is sufficient to hold the content of the Wikipedia? Then all we would need is some way of writing to it -- perhaps through gene therapy, via infection by a virus that carries a copy of the Wikipedia.

This would enable volunteers to accept copies of the Wikipedia into their DNA and become vectors for the Wikipedia. They and their descendants would become walking encyclopedias and would preserve human knowledge for future generations. If only some people had this done then they and their lineages would be a sort of priesthood with particular importance for the future of humanity. It sounds like the basis for a really great science-fiction thriller!

By copying the Wikipedia into our own DNA we might be able to ensure that wherever human beings end up in the universe, the Wikipedia will go with them. Even if in some distant world humans destroy their civilization in a nuclear holocaust or are almost wiped out by an asteroid and have to rebuild from the stone-age again, they will eventually rediscover genomics and soon after that they will find the Wikipedia in their genome.

This is a kind of "backup strategy" for our civilization and all the knowledge we consider to be most important. Of course it is not clear yet whether the Junk DNA could carry enough information to encode the entire Wikipedia, nor is it clear that the Junk DNA is actually "junk" -- perhaps there is already something there that should not be overwritten? Or perhaps it serves some other purpose in human development and evolution that we shouldn't mess around with. It remains to be seen.
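Whether the Junk DNA has room for the Wikipedia starts with a back-of-the-envelope calculation: each base is one of four letters, so it can hold at most log2(4) = 2 bits. The sketch below assumes a haploid human genome of roughly 3.1 billion base pairs (an approximate figure), takes the 97% non-coding share from the article quoted above, and ignores every biological constraint on rewriting that DNA. The function name and parameters are hypothetical.

```python
import math

def raw_capacity_bytes(genome_bp=3.1e9, noncoding_fraction=0.97, copies=1):
    """Upper-bound capacity in bytes, assuming every non-coding base pair
    could be rewritten freely and stores log2(4) = 2 bits.
    `copies` divides the result to account for repetition-style redundancy."""
    bits_per_bp = math.log2(4)  # four possible bases -> 2 bits each
    usable_bits = genome_bp * noncoding_fraction * bits_per_bp
    return usable_bits / 8 / copies

for copies in (1, 3):
    mb = raw_capacity_bytes(copies=copies) / 1e6
    print(f"{copies} copies: about {mb:.0f} MB")
```

Under these assumptions the raw ceiling comes out at roughly 750 MB, or about a third of that with triple redundancy; that figure can then be compared against whatever size estimate of the Wikipedia one has in mind.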

 

April 16, 2008

Great Article about Benefits of Twine from a Beta User

If you are interested in hearing about how some users are using the Twine invite-only beta test, here is a great article about why one user migrated to Twine from del.icio.us.

April 15, 2008

Cool Twine Fan Video by a High-School Student

I was pleasantly surprised to see a very nice fan video for Twine created by a high-school student who is in our beta test. It gives the flavor of Twine and is really nice.

April 12, 2008

A Few Predictions for the Near Future

This is a five minute video in which I was asked to make some predictions for the next decade about the Semantic Web, search and artificial intelligence. It was done at the NextWeb conference and was a fun interview.


Learning from the Future with Nova Spivack from Maarten on Vimeo.

April 01, 2008

Good Article on History of Talks Between Tibet and China

This article sheds some light on the history of attempts to find a resolution between the Dalai Lama and the Chinese government. I found it to be quite educational. There have in fact been numerous attempts to find a solution, but the process has been frozen in a deadlock for 50 years. The Chinese government has been the principal roadblock to this process -- they do not want to engage in high-level talks with the Dalai Lama's government. It would be easy to resolve this if serious, genuine high-level talks were to happen -- talks between the Dalai Lama and the Premier of China, for example. Until that happens, this situation will only get worse. It has to be resolved at the highest levels. The Dalai Lama has said he would be happy to engage in such talks. Why is the Chinese government not willing to participate?

March 29, 2008

Proof of Chinese Agent Provocateurs Dressing As Tibetan Monks?

You have to see this image.

March 28, 2008

Twine and Linked Data on the Semantic Web

Tim Berners-Lee just posted his thoughts about the importance of Linked Data on the Semantic Web. Linked data support is built into Twine. All the data in Twine is accessible as open-standard RDF and OWL today and will be accessible to other applications via several APIs including SPARQL. You can learn more about Twine's support for Linked Data and see some examples here.

Tim says:

In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.

 

So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?

Twine provides RDF and supports SPARQL (while we are in beta we have not opened our SPARQL API yet, but we will...). At the same time, Twine also protects privacy by only providing its data according to permissions. Apps can only get Twine data they have permission to see, such as their own data, their owner's or users' data, data that has been shared with them, or public data in Twine.
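To make the permission rule concrete, here is a minimal sketch of triple-pattern matching where each triple carries an owner, an optional share list, and a public flag. This is not Twine's API or data model, neither of which is described here; the Triple class, the visible_to rule, the query function, and the example data are all hypothetical, and a production system would presumably enforce such rules inside its query endpoint rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    owner: str
    shared_with: frozenset = frozenset()
    public: bool = False

def visible_to(triple, user):
    """Hypothetical rule: public data, your own data, or data shared with you."""
    return triple.public or triple.owner == user or user in triple.shared_with

def query(store, user, subject=None, predicate=None, obj=None):
    """Match a triple pattern (None = wildcard), returning only visible triples."""
    return [
        (t.subject, t.predicate, t.obj)
        for t in store
        if visible_to(t, user)
        and (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

store = [
    Triple("ex:twineA", "ex:hasItem", "ex:page1", owner="alice", public=True),
    Triple("ex:twineB", "ex:hasItem", "ex:page2", owner="alice"),
    Triple("ex:twineC", "ex:hasItem", "ex:page3", owner="carol", shared_with=frozenset({"bob"})),
]

print(query(store, "bob", predicate="ex:hasItem"))  # public twineA plus twineC shared with bob
print(query(store, "dave"))                          # only the public triple
```

Filtering before pattern matching keeps the rule in one place, so every query path sees the same permissions.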

Twine is also designed to consume external Linked Data via its APIs. Twine will be able to consume external RDF and OWL ontologies, as a means to enable other applications and users to extend its functionality and add new data to it.

March 26, 2008

My Visit to DERI -- World's Premier Semantic Web Research Institute

Earlier this month I had the opportunity to visit, and speak at, the Digital Enterprise Research Institute (DERI), located in Galway, Ireland. My hosts were Stefan Decker, the director of the lab, and John Breslin who is heading the SIOC project.

DERI has become the world's premier research institute for the Semantic Web. Everyone working in the field should know about them, and if you can, you should visit the lab to see what's happening there.

DERI is part of the National University of Ireland, Galway. With over 100 researchers focused solely on the Semantic Web, and very significant financial backing, DERI has, to my knowledge, the highest concentration of Semantic Web expertise on the planet today. Needless to say, I was very impressed with what I saw there. Here is a brief synopsis of some of the projects that I was introduced to:

  • Semantic Web Search Engine (SWSE) and YARS, a massively scalable triplestore.  These projects are concerned with crawling and indexing the information on the Semantic Web so that end-users can find it. They have done good work on consolidating data and also on building a highly scalable triplestore architecture.
  • Sindice -- An API and search infrastructure for the Semantic Web. This project is focused on providing a rapid indexing API that apps can use to get their semantic content indexed, and that can also be used by apps to do semantic searches and retrieve semantic content from the rest of the Semantic Web. Sindice provides Web-scale semantic search capabilities to any semantic application or service.
  • SIOC -- Semantically Interlinked Online Communities. This is an ontology for linking and sharing data across online communities in an open manner, that is getting a lot of traction. SIOC is on its way to becoming a standard and may play a big role in enabling portability and interoperability of social Web data.
  • JeromeDL is developing technology for semantically enabled digital libraries. I was impressed with the powerful faceted navigation and search capabilities they demonstrated.
  • notitio.us is a project for personal knowledge management of bookmarks and unstructured data.
  • SCOT, OpenTagging and Int.ere.st. These projects are focused on making tags more interoperable, and on generating social networks and communities from tags. They provide a richer tag ontology and framework for representing, connecting and sharing tags across applications.
  • Semantic Web Services.  One of the big opportunities for the Semantic Web that is often overlooked by the media is Web services. Semantics can be used to describe Web services so they can find one another and connect, and even to compose and orchestrate transactions and other solutions across networks of Web services, using rules and reasoning capabilities. Think of this as dynamic semantic middleware, with reasoning built-in.
  • eLite. I was introduced to the eLite project, a large e-learning initiative that is applying the Semantic Web.
  • Nepomuk.  Nepomuk is a large effort supported by many big industry players. They are making a social semantic desktop and a set of developer tools and libraries for semantic applications that are being shipped in the Linux KDE distribution. This is a big step for the Semantic Web!
  • Semantic Reality. Last but not least, and perhaps one of the most eye-opening demos I saw at DERI, is the Semantic Reality project. They are using semantics to integrate sensors with the real world. They are creating an infrastructure that can scale to handle trillions of sensors eventually. Among other things I saw, you can ask things like "where are my keys?" and the system will search a network of sensors and show you a live image of your keys on the desk where you left them, and even give you a map showing the exact location. The service can also email you or phone you when things happen in the real world that you care about -- for example, if someone opens the door to your office, or a file cabinet, or your car, etc. Very groundbreaking research that could seed an entire new industry.
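Several of these projects (SWSE, YARS, Sindice) rest on indexing triples so that queries do not have to scan everything. One common technique is to keep every triple in several orderings, so that any pattern with a bound subject, predicate, or object starts from a direct lookup. The toy class below illustrates only that general idea; it is not how YARS or Sindice are actually implemented, and the class name and example data are made up.

```python
from collections import defaultdict

class TripleIndex:
    """Toy in-memory triplestore that stores each triple in three orderings
    (SPO, POS, OSP) so a bound term can be looked up directly."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None is a wildcard."""
        if s is not None:  # subject bound: start from the SPO index
            for p2, objs in self.spo.get(s, {}).items():
                if p is None or p2 == p:
                    for o2 in objs:
                        if o is None or o2 == o:
                            yield (s, p2, o2)
        elif p is not None:  # predicate bound: start from the POS index
            for o2, subjs in self.pos.get(p, {}).items():
                if o is None or o2 == o:
                    for s2 in subjs:
                        yield (s2, p, o2)
        elif o is not None:  # only object bound: start from the OSP index
            for s2, preds in self.osp.get(o, {}).items():
                for p2 in preds:
                    yield (s2, p2, o)
        else:  # nothing bound: full scan
            for s2, po in self.spo.items():
                for p2, objs in po.items():
                    for o2 in objs:
                        yield (s2, p2, o2)

store = TripleIndex()
store.add("ex:sioc", "ex:developedAt", "ex:DERI")
store.add("ex:sindice", "ex:developedAt", "ex:DERI")
store.add("ex:DERI", "ex:locatedIn", "ex:Galway")
print(sorted(store.match(o="ex:DERI")))
```

The trade-off is storage: each triple is kept three times in exchange for fast lookups from any direction.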

In summary, my visit to DERI was really eye-opening and impressive. I recommend that major organizations that want to really see the potential of the Semantic Web, and get involved on a research and development level, consider a relationship with DERI -- they are clearly the leader in the space.

March 15, 2008

Disturbing Eyewitness Testimony from Lhasa

I got this picture from a friend who received it from a friend who is on the ground in Lhasa right now. Please redistribute, especially to the news media.

Things are much worse for the Tibetans than has been reported. Can we help to avoid another Tiananmen?

March 14, 2008

First Week of Twine Beta Phase II Report

This week we began letting the second wave of beta users into the Twine invite-only beta. It's been a very busy and exciting time for the Twine team. I'll be providing more detailed stats on an ongoing basis in a few weeks once we have more data to analyze. For now, I will just provide some qualitative observations.

Twine is still in the early beta process, but already we are seeing a rapid increase in adoption and scale. We have only let in a few hundred more users to get the process started, but we will be letting more and more in every week as we go forward.

It has been really exciting to watch Twine grow. I find that I am increasingly glued to my Interest Feed watching the fascinating information that is flowing through from all the new members. There have been many new twines created around a wide and growing range of interests, and a large amount of content added. The recommendations are also quite interesting -- I have already discovered a wide range of new people, twines and content that I didn't know about.

As of this writing, I now have 157 social connections in Twine. My social network in Twine has doubled in size in a week and is rapidly approaching the size of my Facebook network. That's pretty impressive considering this happened in a week (it took about half a year for my Facebook network to grow to that size).

We also had our first outside Twine client app, called "Entwine," written spontaneously by a beta user -- it browses through the RDF data from various items in Twine. That was very cool and unexpected! It really got the team jazzed to see this happen.

Twine is now full of active discussions around interests, questions, ideas, suggestions, current events, technologies and products. I have been pleasantly surprised to see so much interaction among users develop so quickly. As we had hypothesized, discussions are turning out to be a very key feature.

We have received a lot of great feedback from beta users within Twine, as well as many suggestions for how to improve Twine, streamline the user experience, and integrate Twine with other applications and services. This is exactly what we had hoped for from our beta. The team is hard at work analyzing this and prioritizing our next development sprints in light of what we are learning from our users (we do minor releases every week and major ones every 3 weeks).

Most of the press reviews and user stories point to Twine being very exciting, useful and full of potential, which has been great to hear after so much work --- they also universally agree that we still have room to improve the user experience and we need to work on making Twine easier to learn and use. That's not unexpected -- we opened the beta well before the app is finished in order to understand user priorities better. We are really focusing on usability and bug fixes for the next several sprints.  All this feedback has been incredibly valuable to the team. Keep it coming!

Another interesting observation. The quality of the users in Twine is distinctly impressive. It's a very smart community of leading-edge thinkers, builders, and technology adopters. Kind of like having your own TED Conference, 24/7 around the world. We will be inviting in a wider range of users in later phases, once the app is further along. In the meantime it is really great to see so many of my colleagues in Twine, and to be making so many new contacts and friends here. For this initial phase this is exactly the audience we need -- people who will really roll up their sleeves and help us make Twine into a great application.

Twine is also rapidly aggregating most of the leading minds in the worldwide Semantic Web development and research community into a social and collaborative interest network. It is great to have this global community of people interested in building and using the Semantic Web come together in Twine, an application that is built using Semantic Web technologies on the Radar Networks Semantic Web Applications platform. I look forward to beginning to share Twine with this worldwide community, and to collaborate with others to extend it and integrate it with other semantic apps and data sets. This is definitely our goal.

It's been a great week. I haven't slept much. I'm having too much fun in Twine!

March 13, 2008

Twine Perspective on Yahoo Semantic Web Search Announcement

The Beginning of the Mainstream Semantic Web?

It is being reported that Yahoo will be indexing a wide array of structured metadata, including Semantic Web metadata. This will make Yahoo's search index potentially better than Google's, although it will also open their index up to sophisticated attempts to "game the system," a problem that will need to be solved. But in any event, this will undoubtedly prod Google to begin indexing and making sense of structured metadata as well (actually, Google is already indexing FOAF, a Semantic Web metadata format).

I believe Yahoo's announcement marks the beginning of the mainstream Semantic Web. It should quickly catalyze an arms race by search engines, advertisers, and content providers to make the best use of semantic metadata on the Web. This will benefit the entire semantic sector and all players in it.

As they say, "a rising tide lifts all boats."

Where Twine Fits Into This Ecosystem

From the perspective of a company working on a large Semantic Web driven portal venture (Twine), and a full platform for semantic applications (and search), this is good news. We'll be happy to open up Twine's content to Yahoo's index (when we go into General Availability in the summer timeframe, or maybe even sooner...). In addition, as more content providers add metadata to their content, it will make Twine's job of helping users collect, organize, share and discover interesting content that much easier.

Where does Twine fit into the emerging Semantic Web ecosystem? Twine provides presence and content on the Semantic Web. It enables individuals and groups to homestead on the Semantic Web and get immediate value, without having to learn RDF.

Currently we are not going after the "be the search engine of the Semantic Web" opportunity -- we are focused on the "help users manage their information and connect with others who share their interests" and the "build thriving communities of interest" opportunities.

Our feeling is that the incumbent search engines are probably best positioned to win the race to become the search engine of the entire Semantic Web, once they decide to compete for it (as Yahoo just did, and as Google will most likely soon decide to do as well...).

Twine is generating high-quality Semantic Web metadata about people, groups, topics of interest, and resources on the Web (Web pages, images, videos, books, products, documents, etc.). The metadata we are creating results from a combination of automated processing and user contributions from our community.

The metadata Twine generates is then provided back to the users and community as open RDF that can be accessed and reused elsewhere. So we are effectively making a semantic graph of RDF about content around the Web, and related people, groups and their interests. Ultimately we become a semantic annotation layer above the Web. I can imagine that this is a dataset that Yahoo and Google and many others are going to want to be able to search.

The content in Twine is rapidly growing into a large semantic graph of information about people, groups and interests on the Web. We and our users are producing a large volume of high-quality original content, along with semantic metadata about existing Web content, that will undoubtedly make the Yahoo index much richer (and will drive traffic back to Twine and to the sites our graph links to).

The Semantic Web Eliminates Traditional Silos By Opening Up and Linking the Data

Twine is a hosted online service, but it is not actually a "silo" in the traditional sense, because all of our data is represented in open-standards-based RDF. We are already providing access to that data on an experimental basis, and we will provide even more through upcoming APIs.

This means that the data Twine is creating and gathering is open, linked data that can be reused in other applications and services. Ultimately this makes Twine part of a growing distributed ecosystem. Semantic Web metadata in RDF and OWL is even better than microformats because it carries its own meaning about how to use it. Software that speaks RDF and OWL can instantly reuse it without any additional programming. To learn more about Twine's open RDF availability, see the Twine Tour: Semantic Web section.

I believe that the open standards of the Semantic Web eliminate silos. In effect, all services that use these standards and make their data open become part of one big distributed worldwide database, rather than old-fashioned silos. That's the benefit of open linked data services powered by RDF, OWL, SPARQL, and GRDDL.
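To make the "one big distributed database" idea a little more concrete, here is a minimal sketch in Python of why shared identifiers let data from separate services combine without a custom integration layer. It is an illustration under stated assumptions, not how Twine works: the two services, the example.org URIs, and the helper functions merge and match are all hypothetical, and triples are plain tuples rather than objects from any particular RDF library. Only the FOAF property URIs (name, knows, interest) are real vocabulary terms.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # real FOAF namespace
EX = "http://example.org/"           # hypothetical namespace, for illustration only
ALICE = EX + "people/alice"
BOB = EX + "people/bob"
TOPIC = EX + "topics/semantic-web"

# Triples (subject, predicate, object) published by two hypothetical, independent services.
service_a = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "interest", TOPIC),
}
service_b = {
    (ALICE, FOAF + "knows", BOB),
    (BOB, FOAF + "interest", TOPIC),
}


def merge(*graphs):
    """Union the triple sets; because both services use the same URIs, the data lines up with no mapping step."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def match(graph, s=None, p=None, o=None):
    """Return matching triples in sorted order; None acts as a wildcard, loosely like a SPARQL variable."""
    return sorted(
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    )


web = merge(service_a, service_b)

# A question neither service can answer alone:
# which of Alice's contacts (known only to service B) share her interests (known only to service A)?
alice_topics = {o for _, _, o in match(web, s=ALICE, p=FOAF + "interest")}
alice_contacts = {o for _, _, o in match(web, s=ALICE, p=FOAF + "knows")}
like_minded = sorted(
    contact
    for contact in alice_contacts
    if any(match(web, s=contact, p=FOAF + "interest", o=topic) for topic in alice_topics)
)
print(like_minded)
```

In a real deployment the triples would be fetched as RDF from each service and the query would more likely be written in SPARQL; the point of the sketch is only that a plain set union is enough once everyone names things with the same URIs.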

How Will End-Users Participate in the Semantic Web?

If Yahoo, and possibly Google, make search better by indexing all sorts of metadata, there is then an even larger opportunity to help non-technical end-users create and use that metadata. This is where services like Twine fit in. End-users need ways to author, organize, share, reuse, and discover Semantic Web content.

We don't believe ordinary Webmasters or end-users are going to write microformats or RDF by hand. Even hard-core Semantic Web researchers don't do that. Ultimately end-users need user-friendly services that do this for them automatically, or at least make it easier to do. Twine helps these users to participate in the Semantic Web, without requiring them to have a degree in computer science. Twine provides an (increasingly) user-friendly hosted place where users can collect, organize, share and discover other interesting content around their interests, using the Semantic Web transparently "under the hood."

Concluding Thoughts

In short, Twine is where ordinary non-technical individuals and groups can join the Semantic Web, establish a presence there, and start using it in useful ways, today. If Yahoo and Google become the search engines of the Semantic Web, that will make Twine even more necessary as the place where end-users can participate in this emerging ecosystem. We believe our community, and the rich semantic graph we are growing, will become increasingly valuable as the major search engines begin to index the Semantic Web.

But this is just the beginning of our story. Twine is designed to become a platform that others can build on and integrate with as well. There is more to our strategy than we have revealed so far. In time we will be telling the rest of our story. We have some fun surprises in store...

Reminder: Twine is a Beta -- A note about what Beta means

I want to remind everyone, TWINE IS A BETA. It is only a beta. Beta means not finished, under development, work in progress, construction site, imperfect, open to feedback, undergoing testing, getting better every day, in need of more work, and many other things that are not synonymous with "finished" or "ready for consumer launch." We know this. We never claimed otherwise. We opened Twine early to let the community explore it and give us feedback to guide our future work.

Some of the recent coverage of our project has seemingly misunderstood the meaning of the term "beta" or forgotten it, or simply expected a beta to be more of a finished application. Perhaps this is because many companies never come out of beta or use beta to mean "1.0, only cooler." In our case, beta really means Beta. We knew there were bugs and unfinished features, but we decided to open up anyway in order to get user feedback to guide our further work.

But even though Twine is a beta, it is already quite useful, and there is a large and thriving community in there sharing knowledge about interests including the Semantic Web, Web 3.0, Web 2.0, venture capital, politics, art, fashion, travel, cultures, religion, books, and many other interests.

In fact, the number of connections I have in Twine is rapidly approaching, and will probably soon surpass, the number of connections I have in Facebook. And in terms of use, we are finding that our users are visiting Twine many times a day and actively adding information, searching, and participating in discussions and debates there.

The hype around the Semantic Web (and even Twine) is, in my opinion, justified, but it will take time for that to become obvious to everyone. In the meantime, I do think it has gotten a bit out of control. There is too much wild speculation and a general feeling that somehow the Semantic Web (or services like Twine) will solve every problem on the Internet. That won't be the case. However, the Semantic Web and the services built with it, like Twine, will improve the content of the Web and enable applications to become smarter with less work.

To some degree the hype around the Semantic Web has set unrealistic expectations, and it's not surprising that there is now some backlash. Some folks who came into Twine may have had impossible expectations, perhaps thinking Twine would be some kind of three-dimensional interface to all information, or a kind of HAL 9000 intelligent assistant. I'm sorry to disappoint them. Twine is much more pragmatic and focused on things like organizing, sharing and discovering information around interests. It is also just a first step in a long development path in which much more will be added in the future. And let's not forget... Twine is in Beta. It's not finished yet.

I think the backlash is good actually -- it will reset expectations to realistic levels. Hopefully then folks can focus on what the Semantic Web (and Twine) do today, rather than what they imagine they might do in 20 years, or what they don't do yet.

Twine is not a panacea, but it is certainly well on its way to becoming a leading semantically-driven online service with some interesting opportunities in the marketplace. There is a lot more in the application than can be discovered in 7 minutes of using it, and I can understand how that might be frustrating to reviewers who have little time and high expectations of a finished consumer app. That is something we are working on, and by the time we move out of beta we intend to be able to say we have solved it.

Meanwhile, Twine is a beta and while there is already a LOT there, we can, must, and will be doing much, much more to address usability and finish features that are still under development and imperfect.