Video of my Talk at Digital Now
This is a video of my talk at the Digital Now conference in Orlando yesterday. There's a long intro by Don Dea, and then I speak (starting at index 05:14) about the Semantic Web and Twine.
I highly recommend this new book on Collective Intelligence. It features chapters by a Who's Who of thinkers on Collective Intelligence, including a chapter by me about "Harnessing the Collective Intelligence of the World Wide Web."
Here is the full-text of my chapter, minus illustrations (the rest of the book is great and I suggest you buy it to have on your shelf. It's a big volume and worth the read):
Harnessing the Collective Intelligence of the World-Wide Web
Nova Spivack[1]
Introduction
We are about to enter the third decade of the Web, sometimes referred to as “Web 3.0.” During this decade, the Web will evolve from a globally distributed fileserver into a globally distributed database. This shift will be enabled by a set of emerging technologies called The Semantic Web, which add a new layer of machine-understandable metadata about the meaning of information to the content of the Web.
The Semantic Web will catalyze a new era in collective intelligence. Individuals, groups, organizations and communities will be able to create, connect, find and share knowledge more intelligently and productively than ever before. Ultimately it will enable the Web itself, and all the people and applications that participate in it, to become more collectively intelligent.
Web 3.0—The Third Decade of the Web
The third decade of the Web, “Web 3.0,” begins officially in 2010, but we are already entering the early stages of this transition today. To understand where the Web is headed, it helps to zoom out to a larger historical context.
The final decade of the PC era (1980-1990) was largely concerned with innovation on the front-end of the personal computer: the desktop and user interface layer of the PC. The focus of this period was on making PCs easier to use, with innovations such as Microsoft Windows, the Macintosh user interface, and more consistent user interfaces and integration across applications.
The first decade of the Web era (“Web 1.0,” from 1990-2000) was focused on the back-end of the Web: the core technologies and platforms of the Web such as HTML, HTTP, Web servers, search engines, commerce technologies, advertising technologies, and the basic architectures and business models of Web applications. This decade was mainly focused on the technology and infrastructure of the Web, and most of the actual innovation dollars were spent on making things that only software developers could see.
In contrast, the second decade of the Web (“Web 2.0,” from 2000-2010) has been largely focused on the front-end of the Web. Much of the innovation has not been in actual technology but rather in design patterns and user interfaces for improving the end-user experience of the Web. During this decade we have focused on paradigms such as AJAX.
Another big focus of Web 2.0 has been user-generated content, and in particular the practice of “tagging” content with subject tags. Tagging has in turn led to the concept of “folksonomies” in which taxonomies that organize data are evolved in a bottom-up fashion by a decentralized community of users.
The coming third decade of the Web (“Web 3.0,” from 2010-2020) will shift the emphasis back to the back-end of the Web. This decade will be largely focused on upgrading the technical infrastructure and content of the Web, based on emerging technologies such as the Semantic Web. During this decade the primary push will be toward enriching the Web so that it can function more like a database.
Today the Web is composed mainly of unstructured and semistructured data such as text files and Web pages. Keyword search engines are able to provide rudimentary search capabilities over this information, but only for the most simplistic queries. Compare current Web search to the more precise capabilities of queries against a database and the difference is immediately clear. The Web does not provide anything close to the search capabilities or precision of a database today. But that is about to change.
The Semantic Web provides a way to enrich both unstructured and structured data so that it can be queried with the precision of a database. Essentially, it provides a way to tag any information with metadata that explains what it means, and this metadata can be understood by software applications such as search engines or knowledge management applications. It’s important to note that the Semantic Web is not a new Web; it’s just a new layer of the Web we already have. The semantic metadata that makes up the knowledge of the Semantic Web won’t live in some new place: it lives right in the existing documents and data on the Web. The knowledge of the Semantic Web is encoded using special new markup languages such as RDF and OWL.
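To make "metadata that explains what information means" a little more concrete, here is a minimal Python sketch. It assumes the metadata has already been written as RDF-style triples (subject, predicate, object) in a simplified, N-Triples-like line format. The example.org URIs and vocabulary terms are hypothetical, and the toy parser ignores most of the real N-Triples grammar (escapes, language tags, datatypes, blank nodes); a real application would use a proper RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical metadata attached to a Web page, one statement per line:
# <subject> <predicate> <object-uri> .   or   <subject> <predicate> "literal" .
PAGE_METADATA = """
<http://example.org/page/orlando-trip> <http://example.org/vocab/about> <http://example.org/place/Orlando> .
<http://example.org/place/Orlando> <http://example.org/vocab/type> <http://example.org/vocab/City> .
<http://example.org/place/Orlando> <http://example.org/vocab/name> "Orlando" .
"""

def _term(token: str) -> str:
    """Strip the <...> around a URI or the quotes around a literal."""
    return token[1:-1] if token[:1] in ("<", '"') else token

def parse_triples(text: str) -> list[Triple]:
    """Toy parser for the simplified format above (no escapes, datatypes or blank nodes)."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith("."):
            continue  # skip blank or malformed lines
        # Subject and predicate never contain spaces here; the object may (literals).
        subject, predicate, obj = line[:-1].strip().split(" ", 2)
        triples.append(Triple(_term(subject), _term(predicate), _term(obj.strip())))
    return triples

triples = parse_triples(PAGE_METADATA)
```

Here `triples` holds three statements; the second says, in machine-readable form, that the thing identified as Orlando is a City, which is exactly the kind of meaning a keyword index cannot see.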
This metadata is invisible to users (it doesn’t appear in Web browsers), but behind the scenes it can be read by any application that is compatible with these markup languages. So when any application, such as a next-generation search engine, sees a Web page or data record that contains RDF or OWL metadata, it can use that metadata to understand what that page or data record means, what it is about, what it is related to, and how to interpret it. With Semantic Web metadata in place, searches on the Web will be as precise as, or even more precise than, those in any database. But that is just the beginning of what the Semantic Web enables. Beyond merely improving search, the Semantic Web actually transforms the Web into a database: a worldwide database in which data records can be moved around, shared, and linked together in new ways.
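The difference between keyword search and database-style precision can be sketched with a toy triple-pattern matcher, loosely modeled on the basic graph patterns of SPARQL, the W3C query language for RDF. This is a Python illustration only: the data, prefixes and property names are hypothetical, and a real engine would add indexing, typed literals and much more.

```python
# Hypothetical triples: (subject, predicate, object). "?"-prefixed terms in a
# query pattern are variables, loosely in the style of SPARQL basic graph patterns.
TRIPLES = [
    ("hotel:Seaside", "type", "Hotel"),
    ("hotel:Seaside", "locatedIn", "city:Orlando"),
    ("hotel:Seaside", "pricePerNight", 120),
    ("hotel:Lakeview", "type", "Hotel"),
    ("hotel:Lakeview", "locatedIn", "city:Tampa"),
    ("hotel:Lakeview", "pricePerNight", 90),
    ("page:blog-42", "mentions", "city:Orlando"),
]

def keyword_search(term, triples):
    """Keyword-style lookup: any subject whose object text contains the term."""
    return sorted({s for s, _, o in triples if term in str(o)})

def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    result = dict(bindings)
    for part, value in zip(pattern, triple):
        if isinstance(part, str) and part.startswith("?"):
            if part in result and result[part] != value:
                return None  # variable already bound to something else
            result[part] = value
        elif part != value:
            return None
    return result

def query(patterns, triples):
    """Return every variable binding that satisfies all patterns at once (a join)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                new_bindings = match(pattern, triple, bindings)
                if new_bindings is not None:
                    extended.append(new_bindings)
        solutions = extended
    return solutions

# "Which hotels are in Orlando and cost less than 150 a night?"
cheap_orlando_hotels = [
    b
    for b in query(
        [("?h", "type", "Hotel"), ("?h", "locatedIn", "city:Orlando"), ("?h", "pricePerNight", "?price")],
        TRIPLES,
    )
    if b["?price"] < 150
]
```

A keyword search for "Orlando" returns both the hotel and a blog page that merely mentions the city, while the structured query returns only the binding `{"?h": "hotel:Seaside", "?price": 120}`.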
On the basis of the technologies of the Semantic Web and the Web 3.0 era, we will then be able to enter the fourth decade of the Web (“Web 4.0,” from 2020-2030), in which the emphasis will shift back to the front-end of the Web. The Semantic Web doesn’t just add metadata about the meaning of information to the Web; it also enables metadata to be added about relationships, conceptual linkages, logical connections, and even logical rules. On the basis of this additional metadata, Web users and other applications will be able to harness the power of intelligent agents that will search the Web for things that interest them, make suggestions and recommendations, and even potentially transact on their behalf. This will open the door to a new kind of user interface to the Web that is smarter and more conversational in nature, in which users will enter into dialogues with agents and interact with them to search the Web and make decisions. A conversational interface to the Web will be more appropriate in an increasingly mobile world, in which users will mostly interact with the Web from small portable or embedded devices.
Users on mobile devices that have little to no screen real estate will need a more productive way to interact with the Web than through a miniature browser; nobody likes sorting through pages of Google results on a cell phone. Instead, they will want to simply ask a question (perhaps through a voice interface, rather than typing with their thumbs) and have a virtual intelligent assistant dispatch agents to find the best answers and then report back with results, further questions, or a request for a decision.
Smart, interactive conversational interfaces and intelligent agent-based virtual assistants are possible today, but only in narrow domains. In the Web 4.0 era they may in fact be our primary way of interacting with the whole Web and may be built into the user interface of most search engines, personal email providers, and leading Websites.
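The idea that metadata can carry logical rules, which agents then use to find things nobody stated explicitly, can be sketched as a tiny forward-chaining reasoner in Python. It assumes just two rules in the spirit of RDFS entailment (subclass transitivity, and an item inheriting the types of its class's superclasses). The class names and the hotel are hypothetical, and real Semantic Web reasoners work with far richer rule and ontology languages such as OWL.

```python
# Hypothetical facts an agent might find on the Web, as (subject, predicate, object).
KNOWLEDGE = {
    ("BoutiqueHotel", "subClassOf", "Hotel"),
    ("Hotel", "subClassOf", "Lodging"),
    ("hotel:Seaside", "type", "BoutiqueHotel"),
}

def infer(triples):
    """Apply two rules repeatedly until no new facts appear (forward chaining):
    (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    (x type A)       and (A subClassOf B)  =>  (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {
            (x, "type", b)
            for x, p, a in facts if p == "type"
            for a2, b in sub if a == a2
        }
        if new <= facts:
            return facts  # fixed point reached: nothing left to derive
        facts |= new

facts = infer(KNOWLEDGE)
# An agent asked to find "any kind of lodging" can now find hotel:Seaside,
# although no one ever stated that fact directly.
lodging = sorted(x for x, p, o in facts if p == "type" and o == "Lodging")
```

The rules live in the data rather than in the agent, which is what lets a different agent reuse them unchanged.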
The Virtualization of Knowledge and Intelligence
In the long term, the Semantic Web provides a way to move much of the “intelligence” that currently resides in the minds of individuals, groups and organizations, and/or that is hard-coded into various software and Web applications, out onto the Web itself. It provides a way to virtualize knowledge and intelligence in an explicitly machine-readable, universally accessible form. In other words, it provides a way to start making the Web “smarter.”
Knowledge and expertise that previously only existed in people’s heads, or had to be painstakingly coded into each particular vertical software application, will be represented as universally readable metadata on the Web, just like HTML documents today. In other words, using the Semantic Web you can publish knowledge, and even the underlying conceptual frameworks, rules and heuristics that embody domain expertise, on the Web in an abstract, machine-readable form.
There are many benefits that stem from this. For one thing, it will make it much easier to write smart software applications because much of the necessary “smarts” will not reside in the applications at all, but will rather live out there on the Web.
For example, to write an application that can intelligently assist with travel logistics, a developer will simply be able to point it at sets of knowledge and rules for the travel domain that already exist on the Web. The application will be able to draw on those pools of existing domain knowledge without having to be specifically programmed to do so, because it understands the underlying standards of the Semantic Web. Similarly, the same application could just as easily help someone trade on the stock market, simply by pointing it to domain knowledge on the Semantic Web about finance and investment.
As more pools of domain knowledge are added to the Web around various verticals, all applications will potentially benefit. This sets up a kind of network effect in which a global knowledge commons begins to form and self-amplify over time. For example, first the travel domain is added to the Semantic Web. Then someone else adds domain knowledge about geography and links them together. Another group then adds domain knowledge about hotels, and another one adds domain knowledge about weather—and these all connect to each other in various ways.
With all of this interconnected knowledge on the Web in machine-readable form, application developers can then more easily and quickly write applications that understand concepts and rules related to booking travel reservations, and that can cross-reference reservation information with knowledge about geographic places, relevant weather, and hotels in those locations. In the other direction, someone booking a hotel can then find information about relevant weather and book travel to get to that hotel. This is just one example. There is a virtually unlimited range of other possibilities for these technologies across all domains.
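The travel example can be sketched in a few lines of Python, assuming each group publishes its knowledge as simple (subject, predicate, object) triples and that the pools connect only because they reuse the same identifiers. All identifiers and facts below, including the forecasts, are hypothetical, and the helper names are made up for this illustration.

```python
# Hypothetical pools of knowledge, each published independently by a different group.
TRAVEL = {("trip:123", "destination", "city:Orlando")}
GEOGRAPHY = {("city:Orlando", "inRegion", "region:Florida")}
HOTELS = {("hotel:Seaside", "locatedIn", "city:Orlando"),
          ("hotel:Lakeview", "locatedIn", "city:Tampa")}
WEATHER = {("city:Orlando", "forecast", "sunny"),
           ("city:Tampa", "forecast", "rain")}

def merge(*pools):
    """Combining pools is just set union; shared identifiers do the linking."""
    graph = set()
    for pool in pools:
        graph |= pool
    return graph

def objects(graph, subject, predicate):
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

def subjects(graph, predicate, obj):
    return sorted(s for s, p, o in graph if p == predicate and o == obj)

def plan_trip(graph, trip):
    """Cross-reference a reservation with hotels and weather at its destination."""
    return {
        city: {"hotels": subjects(graph, "locatedIn", city),
               "forecast": objects(graph, city, "forecast")}
        for city in objects(graph, trip, "destination")
    }

def weather_for_hotel(graph, hotel):
    """The other direction: from a hotel booking to the weather where it is."""
    return [f for city in objects(graph, hotel, "locatedIn")
            for f in objects(graph, city, "forecast")]

web = merge(TRAVEL, GEOGRAPHY, HOTELS, WEATHER)
```

`plan_trip(web, "trip:123")` yields the Orlando hotel and its forecast, even though no single pool contains both; the application code knows nothing about hotels or weather beyond the predicate names it asks for.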
The key point of all this is that the Semantic Web enables applications to become thinner, yet at the same time smarter, by drawing on the collective intelligence embodied by the Web itself. It will become possible to write applications that understand one or more specialized vertical domains more quickly, and ultimately applications will become more general: they will be able to dynamically load in specialized domain knowledge for whatever domain is needed, without having to be specifically programmed for, or limited to, just those domains.
Application developers will be able to draw on the knowledge added to the Web by others, instead of having to reinvent the wheel by programming all that knowledge directly into their applications every time. And in turn, the knowledge that their applications create can, if they want to allow it, be published back onto the Web for other applications to draw on as well.
Semantic Web as The Next Leap in Human Collective Intelligence
Looking at the evolution of the Semantic Web in historical context, we can view it as the next big step in a longer process of the evolution of human collective intelligence.
Before the invention of written language, knowledge could only be communicated verbally and was handed down through oral traditions. During this period, one had to be in immediate physical proximity of someone who had certain knowledge in order to receive it from them. This meant that the maximum effective range of human collective intelligence was quite short in space and time.
With the invention of writing, and eventually printing, humanity was able to process knowledge over longer distances in space and time, and with less reliance on particular individuals. People could now engage in dialogues and dialectics with larger groups of people in more places, across larger distances in space, and with more precision over larger ranges of time.
The printing press took this to a new level by starting the process of mass-distribution of knowledge, but it still relied on an expensive physical manufacturing process and a paper medium that was perishable and costly to store and move around.
With the advent of electronic communications of various forms, humanity achieved many milestones: the transmission of knowledge could take place at the speed of light, and using digital storage media we were freed from the limitations of the paper medium.
The Internet and the Web transformed the process of distributing knowledge even further—enabling a global knowledge commons to emerge. The Internet and Web enable anyone and everyone to become providers of knowledge, not just consumers—a fundamental shift in the way that knowledge transmission and media function. They are not just about the mass-distribution and mass-consumption of knowledge; they enable the mass-creation of knowledge. In some respects these technologies are analogues of the printing press in that they have democratized the process of creating, sharing and accessing knowledge by fundamentally changing the economics of the entire process—making it affordable and accessible to all.
But even on the Web, for all its many benefits, knowledge is still not free from the limitations of the human brain. Only humans can really understand the knowledge that is represented in Web sites and databases, for example. While all other processes related to the distribution, storage and access to knowledge can now be done digitally, using software and the Web, the processes of creating, consuming and actually understanding knowledge are still limited only to living humans. That’s where the Semantic Web comes in.
Liberating Knowledge and Intelligence from Human Brains
The Semantic Web virtualizes human knowledge and expertise outside of human brains, and even outside of any particular software application—knowledge becomes essentially just more data on the Web. When we speak of knowledge here we don’t just mean information—the first-order raw data that is currently on the Web—we mean the actual meaning and interpretation of the information that is not on the Web but rather exists only in human brains.
The Semantic Web provides a way to make the meaning and interpretation of information explicit in a form that is unambiguous, publishable, and shareable on the Web. This will make all this knowledge understandable by software. It’s almost like the invention of a new language: a sort of meta-language for formally expressing exactly what you mean when you say something. The impact of this could be enormous.
For the first time in human history, we won’t have to rely only on humans to create, understand and consume knowledge—our machines will be able to help us do this. They will help us work, collaborate, create, explore, monitor, discover, search, innovate, connect, and synthesize. This will open the door to an almost unimaginable amplification of the human mind, and human collective intelligence on this planet. At first the impact of this will largely be focused around assisting humans with simple clerical and research tasks, but the process will inevitably continue to evolve to a point where software will begin to originate new knowledge for us, advise us, and eventually to even start making certain types of decisions on our behalf.
Although the Semantic Web has barely moved from the lab to the mainstream Internet, it is in fact much farther along than most people realize. Today there are already semantic applications under development that can organize all your information automatically, make recommendations based on your dynamically changing interests, identify new connections between ideas or documents in different places, make logical inferences or discover contradictions, and even make discoveries by doing proofs and explorations based on available data.
Within a few years these capabilities will begin to filter out to mainstream users of the Internet, and within a decade or two at most, they will become commonplace. There are only a few billion humans today, and each of us can only cope with a small amount of information and relationships before we become overloaded. But in an era of machine understanding of human knowledge we may potentially be able to leverage thousands to millions of software agents to help us. This will vastly increase our ability to cope with masses of information and relationships productively. In an increasingly complex, distributed, and rapidly changing world, we simply will not be able to cope in the future without help. The Semantic Web provides one path to solving these problems, enabling us to remain productive in the future.
Amplifying Human Collective Intelligence
The Semantic Web does not replace humans or take them out of the equation. It simply reduces the load on humans, freeing them from some of the pain of information overload, and providing a new path for software to begin to augment and even amplify human collective intelligence.
Today there are several barriers to human collective intelligence that arise from basic limitations of the human brain. Human individuals, and groups of humans, simply cannot process or share knowledge effectively beyond a certain level of information or relationship complexity and change. For this reason, collaboration and collective intelligence are often easier to achieve and yield better results in small groups than large groups.
As group size increases, productive collective intelligence becomes dramatically harder to achieve. Thus, ironically, even though larger groups offer the potential for exponential increases in collective intelligence, in practice the opposite is usually the result: the larger teams get, the dumber they get. An entire industry of management consultants and facilitators exists because of these inefficiencies.
The Semantic Web may be able to help with this age-old problem. By enabling software to understand information and relationships, we may be able to begin to automatically and intelligently facilitate interpersonal and group collaboration and knowledge management, and this may finally enable larger groups to become exponentially smarter instead of dumber.
Twine.com—A New Service for Collective Intelligence
My own company, Radar Networks, has recently introduced a new service based on the Semantic Web, called Twine (www.twine.com), that focuses on amplifying human collective intelligence. Twine helps individuals and groups manage and share knowledge more productively, using the Semantic Web.
As people use Twine it learns from them and automatically organizes and connects their information with other related information, saving them valuable time and enabling them to discover connected knowledge. Twine provides individuals and groups with a smart virtual environment for their knowledge.
Twine works with all kinds of knowledge—email, RSS, Web pages, documents, photos, videos, audio, contact records, or anything else. Regardless of where information actually resides, Twine enables users to view it as if it were in one place, and to see how it is connected and organized. Twine also automatically helps to make sense of information and to make it more easily searchable.
Twine is a Web-based online service that is completely built using the Semantic Web. Although it is only in early beta-testing at the time of this writing, it is already demonstrating that intelligent machine-augmentation of individual and group knowledge management is possible and improves productivity and collaboration.
As Twine unfolds and spreads to more individuals, groups and teams, and organizations and communities, it has the potential to become a new backbone for collective intelligence and knowledge sharing worldwide. At least that is the vision of the project. Time will tell whether we succeed.
From Global Knowledge Commons to Global Brain
If the Semantic Web develops as predicted, it is possible that within 20 years much, if not all, human knowledge will be represented on the Web in machine-understandable form. We have seen the beginnings of this trend with services such as the Wikipedia. More recently, another initiative called the DBpedia is creating a Semantic Web version of the Wikipedia. But this is just the start of this trend.
As more and more applications and services start producing Semantic Web metadata and exposing it back to other applications and services on the Web, we will begin to create a new global knowledge commons. At first these different services will function like islands of knowledge, but then they will begin to interconnect.
A piece of knowledge in one place will link to and from pieces of knowledge in other places. Eventually this will become a giant associative network, not so unlike the brain, but on a global scale. And as people and applications surf through its connections and consume its knowledge, adding new knowledge and connections back to it as they do, it will change and self-organize dynamically. Just as the first generations of the Web have enabled a global medium for “hypertext,” the Semantic Web will enable a global medium for “hyperdata.”
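The “hyperdata” idea can be illustrated with a toy breadth-first search that follows typed links between data items published in different places, much as a person follows hyperlinks between pages. This Python sketch rests on assumptions: the identifiers, prefixes and link types are hypothetical, and links are followed in both directions.

```python
from collections import deque

# Hypothetical typed links between data items published by different sources.
LINKS = [
    ("encyclopedia:Orlando", "locatedIn", "encyclopedia:Florida"),
    ("encyclopedia:Florida", "partOf", "encyclopedia:United_States"),
    ("notes:my-trip", "mentions", "encyclopedia:Orlando"),
    ("weather:station-42", "observes", "encyclopedia:Orlando"),
]

def find_path(links, start, goal):
    """Breadth-first search over typed links, followed in either direction.

    Returns the shortest chain of (subject, predicate, object) hops from
    start to goal, or None if the two items are not connected.
    """
    neighbors = {}
    for s, p, o in links:
        neighbors.setdefault(s, []).append((o, (s, p, o)))
        neighbors.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, hop in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [hop]))
    return None
```

`find_path(LINKS, "notes:my-trip", "encyclopedia:United_States")` returns three hops (my-trip mentions Orlando, Orlando is located in Florida, Florida is part of the United States), each one a typed, machine-readable link rather than an untyped hyperlink.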
As one projects the future evolution of the Web and the emerging Semantic Web, one cannot help but notice certain similarities to the human mind. Some have even ventured to call this the beginning of an emerging “Global Brain.” It is too early to tell how similar it will truly be to the actual human brain. However, we can already predict with confidence that it will be a system that is collectively capable of at least rudimentary learning, memory, perception, planning and reasoning.
The human brain is a massively parallel collective intelligence engine in which billions of neurons interact across trillions of connections to process and generate knowledge.
Similarly, the collective intelligence of the Web will involve the combined interactions and intelligence of billions of humans and machines across trillions of relationships. These processes will not be guided centrally, and the system will most likely not be centralized around a single construct of a “self” nor will it have anything like a human body.
While it will be possible to say the system as a whole is intelligent, it will be difficult to locate any particular source of that intelligence; the intelligence will come from everywhere: from the humans, the software and even the data and links that comprise the Web.
Because the Web is quite different from the human brain, it is likely that its intelligence will be different from what we think of as human intelligence today. But it will nonetheless be intelligent—in a massively distributed, emergent, and chaotic way that we humans may not be able to even comprehend. The “thoughts” the Web will think may be just too vast and complex for us to even recognize, let alone imagine or understand. Yet perhaps in decade-long time-scales at least, we will begin to be able to see the outlines of its thinking.
[1] Nova Spivack is the CEO and founder of Radar Networks, a San Francisco company that is pioneering applications of the Semantic Web for distributed collaboration and knowledge management with a new service called Twine.com. Mr. Spivack is a recognized authority on the Semantic Web and the future of the Web, which is sometimes called “Web 3.0.” A more detailed bio can be found at his company website: http://www.radarnetworks.com/about/management.html#nova.
I had an interesting thought today about the long-term preservation and transmission of human knowledge.
The Wikipedia may be on its way to becoming one of the best places in which to preserve knowledge for future generations. But this is just the beginning. What if we could encode the Wikipedia into the Junk DNA portion of our own genome? It appears that something like this may actually be possible -- at least according to some recent studies of the non-coding regions of the human genome.
If we could actually encode knowledge, like the Wikipedia for example, into our genome, the next logical step would be to find a way to access it directly.
At first we might only be able to access and read the knowledge stored in our DNA through a computationally intensive genetic analysis of an individual's DNA. In order to correct any errors in the data caused by mutation, we would also need to cross-reference this individual data with similar analyses of the DNA of other people who also carry the data. However, there are ways to store data with enough redundancy to protect it against degradation. Assuming we could do this, we might be able to eliminate the need for cross-referencing as a form of error correction -- the data itself would be self-correcting, so to speak. If we could accomplish this, then the next step would be to find a way for an individual to access the knowledge stored in their DNA directly, in real time. That's a long way off, but there may be a way to do this using some future nano-scale genomic-brain interface. This opens up some fascinating areas of speculation, to say the least.
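One classic way to make data “self-correcting” is an error-correcting code. The sketch below swaps in a textbook Hamming(7,4) code, which stores 3 parity bits alongside every 4 data bits and can repair any single flipped bit in each 7-bit block. It is written in Python and works on plain bits; how bits would map onto nucleotides, and how often real mutations strike, are left as assumptions outside the sketch.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # checks codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c: list[int]) -> list[int]:
    """Fix at most one flipped bit in a 7-bit codeword and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def encode_bytes(data: bytes) -> list[int]:
    """Split each byte into two 4-bit nibbles and Hamming-encode each one."""
    bits = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            bits += hamming74_encode([(nibble >> i) & 1 for i in (3, 2, 1, 0)])
    return bits

def decode_bytes(bits: list[int]) -> bytes:
    """Decode 7-bit blocks back into nibbles, then pair the nibbles into bytes."""
    nibbles = []
    for i in range(0, len(bits), 7):
        d1, d2, d3, d4 = hamming74_decode(bits[i:i + 7])
        nibbles.append(d1 << 3 | d2 << 2 | d3 << 1 | d4)
    return bytes(hi << 4 | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

if __name__ == "__main__":
    stored = encode_bytes(b"Wiki")
    stored[10] ^= 1  # simulate a "mutation" in the second block
    stored[30] ^= 1  # and another in the fifth block
    print(decode_bytes(stored))  # b'Wiki'
```

The price of this protection is space: 7 stored bits for every 4 bits of data. Two flips in the same block would defeat it, so stronger codes trade more redundancy for more protection.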
Why The Wikipedia?
The Wikipedia has certain qualities that make it better than other forms of knowledge preservation and transmission:
What this means is that if you have any knowledge that you want to preserve for future generations, a good place to put it is in the Wikipedia. Putting it there almost guarantees that it will propagate around the world and throughout the human-explored universe (in the future, if we become a spacefaring civilization), and into the distant future of human civilizations.
The Potential For Storing Knowledge in DNA
Is it possible to store knowledge -- such as the Wikipedia -- in human DNA? It would certainly be useful if we could do this. By storing knowledge in the DNA of living humans, or of common bacteria for that matter, it could potentially be passed down and spread through generations into the far future. However, the mutability of DNA might gradually introduce errors that would degrade the information within particular lines of DNA over long periods of time.
Perhaps this could however be mitigated by comparing DNA samples from a large cross-section of individuals within the population of descendants of original holders of DNA-knowledge-archives in the future -- this would effectively enable statistical error cancellation. The farther in the future from the date at which the knowledge is "written" to the DNA of some number of humans, the more people's DNA would be needed to eliminate the errors statistically. This would however in principle counteract mutations and enable the reliable recovery of messages in DNA even very far in the future.
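The two ideas above, writing data into DNA and recovering it by statistical comparison across many descendants, can be sketched together in Python. The assumptions are all hypothetical: a simple mapping of two bits per nucleotide (A=00, C=01, G=10, T=11); mutations modeled only as random single-base substitutions (real genomes also see insertions and deletions, which would break the position-by-position alignment used here); and a per-position majority vote as the error-cancellation step. Practical DNA-storage schemes also impose constraints this sketch ignores, such as avoiding long runs of the same base.

```python
import random
from collections import Counter

BASES = "ACGT"  # hypothetical mapping: A=00, C=01, G=10, T=11

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides (two bits per base), high bits first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))

def dna_to_bytes(seq: str) -> bytes:
    """Decode each group of four nucleotides back into one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Point mutations only: each base becomes a different base with probability `rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )

def consensus(copies: list[str]) -> str:
    """Per-position majority vote across aligned, equal-length copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))

if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes_to_dna(b"Hello, future!")
    # Each descendant carries an independently mutated copy of the archive.
    descendants = [mutate(original, rate=0.05, rng=rng) for _ in range(25)]
    print(dna_to_bytes(descendants[0]))          # one copy alone is likely garbled
    print(dna_to_bytes(consensus(descendants)))  # the majority vote usually recovers it
```

Raising `rate`, or lowering the number of copies, makes the vote fail more often, which mirrors the point above: the more mutations have accumulated, the more people's DNA is needed for the correct base to win at every position.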
The fact that it is in principle possible to encode knowledge into human (or other) DNA raises the question of whether there is already knowledge stored there. It's certainly worth a look! Maybe there is already a message there for us? One can only wonder if an ancient "Wikipedia" of sorts is already written there.
Interestingly enough, when certain statistical tests are run against human DNA, it does seem to have properties that are indicative of written language, but only in the "junk" regions of the genome. Maybe it's not "junk" after all. Below is an article that discusses a recent discovery related to this:
Language in junk DNA
You've probably heard of a molecule called DNA, otherwise known as "The Blueprint Of Life". Molecular biologists have been examining and mapping the DNA for a few decades now. But as they've looked more closely at the DNA, they've been getting increasingly bothered by one inconvenient little fact - the fact that 97% of the DNA is junk, and it has no known use or function! But an unusual collaboration between molecular biologists, cryptanalysts (people who break secret codes), linguists (people who study languages) and physicists has found strange hints of a hidden language in this so-called "junk DNA".
Only about 3% of the DNA actually codes for amino acids, which in turn make proteins, and eventually, little babies. The remaining 97% of the DNA is, according to conventional wisdom, not genes, but junk.
The molecular biologists call this junk DNA introns. Introns are like enormous commercial breaks or advertisements that interrupt the real program - except in the DNA, they take up 97% of the broadcast time. Introns are so important that Richard Roberts and Phillip Sharp, who did much of the early work on introns back in 1977, won a Nobel Prize for their work in 1993. But even today, we still don't know what introns are really for.
Simon Shepherd, who lectures in cryptography and computer security at the University of Bradford in the United Kingdom, took an approach that was based on his line of work. He looked on the junk DNA as just another secret code to be broken. He analysed it, and he now reckons that one probable function of introns is that they are some sort of error-correction code, to fix up the occasional mistakes that happen as the DNA replicates itself. But even if he's right, introns could have lots of other uses.
The next big breakthrough came from a really unusual collaboration between medical doctors, physicists and linguists. They found even more evidence that there was a sort-of language buried in the introns.
According to the linguists, all human languages obey Zipf's Law. It's a really weird law, but it's not that hard to understand. Start off by getting a big fat book. Then, count the number of times each word appears in that book. You might find that the number one most popular word is "the" (which appears 2,000 times), followed by the second most popular word "a" (which appears 1,800 times), and so on. Right down at the bottom of the list, you have the least popular word, which might be "elephant", and which appears just once.
Set up two columns of numbers. One column is the order of popularity of the words, running from "1" for "the" and "2" for "a", right down to "1,000" for "elephant". The other column counts how many times each word appeared, starting off with 2,000 appearances of "the", then 1,800 appearances of "a", down to one appearance of "elephant".
If you then plot the order of popularity of the words against the number of times each word appears, on the right kind of graph paper (with logarithmic scales on both axes), you get a straight line! Even more amazingly, this straight line appears for every human language - whether it's English or Egyptian, Eskimo or Chinese! Now the DNA is just one continuous ladder of squillions of rungs, and is not neatly broken up into individual words (like a book).
So the scientists looked at a very long bit of DNA, and made artificial words by breaking up the DNA into "words" each 3 rungs long. And then they tried it again for "words" 4 rungs long, 5 rungs long, and so on up to 8 rungs long. They then analysed all these words, and to their surprise, they got the same sort of Zipf Law/straight-line-graph for the human DNA (which is mostly introns), as they did for the human languages!
There seems to be some sort of language buried in the so-called junk DNA! Certainly, the next few years will be a very good time to make a career change into the field of genetics.
So now, around the edge of the new millennium, we have a reasonable understanding of the 3% of the DNA that makes amino acids, proteins and babies. And the remaining 97% - well, we're pretty sure that there is some language buried there, even if we don't yet know what it says. It might say "It's all a joke", or it might say "Don't worry, be happy", or it might say "Have a nice day, lots of love, from your friendly local DNA". (source)
Now to complete this thought: what if the information-carrying capacity of the so-called Junk DNA of the human genome is sufficient to hold the content of the Wikipedia? Then all we would need is some way of writing to it, perhaps through gene therapy using a virus that carries a copy of the Wikipedia.
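To make "some way of writing to it" slightly more concrete: DNA has four bases, so each base can carry two bits, and any file can be mapped onto a string of A, C, G and T. The mapping below (A=00, C=01, G=10, T=11) is an arbitrary assumption chosen for illustration. Practical DNA-storage schemes also add error correction and avoid long runs of the same base, which this naive sketch ignores, and it says nothing about the biology of actually inserting such a sequence into a genome.

```python
# Hypothetical 2-bit code for this sketch: A=00, C=01, G=10, T=11.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn each byte into 4 bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(dna: str) -> bytes:
    """Inverse of encode: read the bases back four at a time."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


text = "Wikipedia".encode("utf-8")
strand = encode(text)
print(strand[:8])              # CCCTCGGC  ('W' -> CCCT, 'i' -> CGGC)
print(len(strand))             # 36 bases for 9 bytes
assert decode(strand) == text  # the round trip recovers the original bytes
```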
This would enable volunteers to accept copies of the Wikipedia into their DNA and become vectors for the Wikipedia. They and their descendants would become walking encyclopedias and would preserve human knowledge for future generations. If only some people had this done then they and their lineages would be a sort of priesthood with particular importance for the future of humanity. It sounds like the basis for a really great science-fiction thriller!
By copying the Wikipedia into our own DNA we might be able to ensure that wherever human beings end up in the universe, the Wikipedia will go with them. Even if in some distant world humans destroy their civilization in a nuclear holocaust or are almost wiped out by an asteroid and have to rebuild from the stone-age again, they will eventually rediscover genomics and soon after that they will find the Wikipedia in their genome.
This is a kind of "backup strategy" for our civilization and all the knowledge we consider to be most important. Of course it is not clear yet whether the Junk DNA could carry enough information to encode the entire Wikipedia, nor is it clear that the Junk DNA is actually "junk" -- perhaps there is already something there that should not be overwritten? Or perhaps it serves some other purpose in human development and evolution that we shouldn't mess around with. It remains to be seen.
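Whether the Junk DNA "could carry enough information" starts as a back-of-the-envelope question. The sketch below multiplies an assumed genome size of about 3.1 billion base pairs (an approximate figure for one copy of the human genome) by the 97% non-coding fraction quoted earlier and by 2 bits per base. The result is an idealized upper bound on raw capacity; the size of the Wikipedia to compare it against is deliberately not estimated here.

```python
# Assumptions for this estimate (see the lead-in above):
GENOME_BASE_PAIRS = 3.1e9   # approximate size of one copy of the human genome
NONCODING_FRACTION = 0.97   # the "remaining 97%" figure quoted in the text
BITS_PER_BASE = 2           # four possible bases -> log2(4) = 2 bits


def capacity_bytes(base_pairs, fraction, bits_per_base=BITS_PER_BASE):
    """Idealized raw capacity in bytes, ignoring error correction and all biology."""
    return base_pairs * fraction * bits_per_base / 8


capacity = capacity_bytes(GENOME_BASE_PAIRS, NONCODING_FRACTION)
print(f"about {capacity / 1e6:.0f} MB")   # about 752 MB
```

Answering the question in the text would then mean comparing this bound with the size of whatever (compressed) copy of the Wikipedia one intended to store, after subtracting the overhead any real error-correcting encoding would need.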
If you are interested in hearing about how some users are using the Twine invite-only beta test, here is a great article about why one user migrated to Twine from del.icio.us.
I was pleasantly surprised to see a very nice fan video for Twine created by a high-school student who is in our beta test. It gives the flavor of Twine and is really nice.
This is a five minute video in which I was asked to make some predictions for the next decade about the Semantic Web, search and artificial intelligence. It was done at the NextWeb conference and was a fun interview.
Learning from the Future with Nova Spivack from Maarten on Vimeo.
This article sheds some light on the history of attempts to find a resolution between the Dalai Lama and the Chinese government. I found it to be quite educational. There have in fact been numerous attempts to find a solution, but the process has been frozen in a deadlock for 50 years. The Chinese government has been the principal roadblock to this process -- they do not want to engage in high-level talks with the Dalai Lama's government. It would be easy to resolve this if serious, genuine high-level talks were to happen -- talks between the Dalai Lama and the Premier of China, for example. Until that happens, this situation will only get worse. It has to be resolved at the highest levels. The Dalai Lama has said he would be happy to engage in such talks. Why is the Chinese government not willing to participate?
You have to see this image.
Tim Berners-Lee just posted his thoughts about the importance of Linked Data on the Semantic Web. Linked Data support is built into Twine. All the data in Twine is accessible as open-standard RDF and OWL today and will be accessible to other applications via several APIs, including SPARQL. You can learn more about Twine's support for Linked Data and see some examples here.
Tim says:
In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.
So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?
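Tim's question "Is there a SPARQL server?" is easier to appreciate with a picture of what a SPARQL query does: it matches triple patterns containing variables against a graph of (subject, predicate, object) triples, and returns the variable bindings that satisfy all of them. The toy matcher below sketches just that basic-graph-pattern idea in plain Python. It is not a SPARQL engine and not Twine's API; the `ex:` identifiers are made-up examples, while the FOAF namespace is the real one.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None if it can't."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # a variable
            if term in new_binding and new_binding[term] != value:
                return None                           # conflicts with an earlier match
            new_binding[term] = value
        elif term != value:                           # a constant that doesn't match
            return None
    return new_binding


def query(graph, patterns):
    """Return all variable bindings that satisfy every pattern (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


graph = {
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:bob", FOAF + "name", "Bob"),
}

# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
results = query(graph, [
    ("ex:alice", FOAF + "knows", "?friend"),
    ("?friend", FOAF + "name", "?name"),
])
print([r["?name"] for r in results])  # ['Bob']
```

A real SPARQL endpoint adds indexing, query planning, filters, optional patterns and a standard HTTP protocol on top of this core matching step.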
Twine provides RDF and supports SPARQL (although while we are in beta we have not opened our SPARQL API yet, but we will...). At the same time, Twine also protects privacy by only providing its data according to permissions. Apps can only get the Twine data they have permission to see, such as their own data, their owners' or users' data, data that has been shared with them, or public data in Twine.
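The permission rule described above (an app sees its own data, its owner's or users' data, data shared with it, and public data) amounts to filtering the graph before any query runs. Here is a minimal sketch, assuming a hypothetical `Item` record that tracks an owner, a share list and a public flag; it illustrates the idea and is not how Twine actually stores permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical record: an owner, some RDF-style triples, and who else may see them."""
    owner: str
    triples: list
    shared_with: set = field(default_factory=set)
    public: bool = False


def visible_triples(items, user):
    """Return only the triples `user` may see: owned by them, shared with them, or public."""
    visible = []
    for item in items:
        if item.public or item.owner == user or user in item.shared_with:
            visible.extend(item.triples)
    return visible


items = [
    Item("alice", [("ex:note1", "ex:title", "Private note")]),
    Item("alice", [("ex:note2", "ex:title", "Shared note")], shared_with={"carol"}),
    Item("bob", [("ex:post1", "ex:title", "Public post")], public=True),
]
print(visible_triples(items, "carol"))
# [('ex:note2', 'ex:title', 'Shared note'), ('ex:post1', 'ex:title', 'Public post')]
```

Filtering before querying means any query engine run over the result, such as a SPARQL endpoint, cannot leak triples the requesting user was never allowed to see.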
Twine is also designed to consume external Linked Data via its APIs. Twine will be able to consume external RDF and OWL ontologies, as a means to enable other applications and users to extend its functionality and add new data to it.
Earlier this month I had the opportunity to visit, and speak at, the Digital Enterprise Research Institute (DERI), located in Galway, Ireland. My hosts were Stefan Decker, the director of the lab, and John Breslin who is heading the SIOC project.
DERI has become the world's premier research institute for the Semantic Web. Everyone working in the field should know about them, and if you can, you should visit the lab to see what's happening there.
DERI is part of the National University of Ireland, Galway. With over 100 researchers focused solely on the Semantic Web, and very significant financial backing, DERI has, to my knowledge, the highest concentration of Semantic Web expertise on the planet today. Needless to say, I was very impressed with what I saw there. Here is a brief synopsis of some of the projects that I was introduced to:
In summary, my visit to DERI was really eye-opening and impressive. I recommend that major organizations that want to really see the potential of the Semantic Web, and get involved on a research and development level, should consider a relationship with DERI -- they are clearly the leader in the space.
This week we began letting the second wave of beta users into the Twine invite-only beta. It's been a very busy and exciting time for the Twine team. I'll be providing more detailed stats on an ongoing basis in a few weeks once we have more data to analyze. For now, I will just provide some qualitative observations.
Twine is still in the early beta process, but already we are seeing a rapid increase in adoption and scale. We have only let in a few hundred more users to get the process started, but we will be letting more and more in every week as we go forward.
It has been really exciting to watch Twine grow. I find that I am increasingly glued to my Interest Feed, watching the fascinating information that is flowing through from all the new members. Many new twines have been created around a wide and growing range of interests, and a large amount of content has been added. The recommendations are also quite interesting -- I have already discovered a wide range of new people, twines and content that I didn't know about.
As of this writing, I now have 157 social connections in Twine. My social network in Twine has doubled in size in a week and is rapidly approaching the size of my Facebook network. That's pretty impressive considering this happened in a week (it took about half a year for my Facebook network to grow to that size).
We also had our first outside Twine client app, called "Entwine," written spontaneously by a beta user -- it browses through the RDF data from various items in Twine. That was very cool and unexpected! It really got the team jazzed to see this happen.
Twine is now full of active discussions around interests, questions, ideas, suggestions, current events, technologies and products. I have been pleasantly surprised to see so much interaction among users develop so quickly. As we had hypothesized, discussions are turning out to be a very key feature.
We have received a lot of great feedback from beta users within Twine, as well as many suggestions for how to improve Twine, streamline the user experience, and integrate Twine with other applications and services. This is exactly what we had hoped for from our beta. The team is hard at work analyzing this and prioritizing our next development sprints in light of what we are learning from our users (we do minor releases every week and major ones every 3 weeks).
Most of the press reviews and user stories point to Twine being very exciting, useful and full of potential, which has been great to hear after so much work. They also universally agree that we still have room to improve the user experience and that we need to make Twine easier to learn and use. That's not unexpected -- we opened the beta well before the app is finished in order to understand user priorities better. We are really focusing on usability and bug fixes for the next several sprints. All this feedback has been incredibly valuable to the team. Keep it coming!
Another interesting observation. The quality of the users in Twine is distinctly impressive. It's a very smart community of leading-edge thinkers, builders, and technology adopters. Kind of like having your own TED Conference, 24/7 around the world. We will be inviting in a wider range of users in later phases, once the app is further along. In the meantime it is really great to see so many of my colleagues in Twine, and to be making so many new contacts and friends here. For this initial phase this is exactly the audience we need -- people who will really roll up their sleeves and help us make Twine into a great application.
Twine is also rapidly aggregating most of the leading minds in the worldwide Semantic Web development and research community into a social and collaborative interest network. It is great to have this global community of people interested in building and using the Semantic Web come together in Twine, an application that is built using Semantic Web technologies on the Radar Networks Semantic Web Applications platform. I look forward to beginning to share Twine with this worldwide community, and to collaborate with others to extend it and integrate it with other semantic apps and data sets. This is definitely our goal.
It's been a great week. I haven't slept much. I'm having too much fun in Twine!
The Beginning of the Mainstream Semantic Web?
It is being reported that Yahoo will be indexing a wide array of structured metadata, including Semantic Web metadata. This will potentially make Yahoo's search index better than Google's, although it will also open the index up to sophisticated attempts to "game the system," a problem that will need to be solved. In any event, this will undoubtedly prod Google to begin indexing and making sense of structured metadata as well (actually, Google is already indexing FOAF, a Semantic Web metadata format).
I believe Yahoo's announcement marks the beginning of the mainstream Semantic Web. It should quickly catalyze an arms race by search engines, advertisers, and content providers to make the best use of semantic metadata on the Web. This will benefit the entire semantic sector and all players in it.
As they say, "a rising tide lifts all boats."
Where Twine Fits Into This Ecosystem
From the perspective of a company working on a large Semantic Web driven portal venture (Twine), and full platform for semantic applications (and search), this is good news. We'll be happy to open up Twine's content to Yahoo's index (when we go into General Availability in the summer timeframe, or maybe even sooner...). In addition, as more content providers add metadata to their content, it will make Twine's job of helping users collect, organize, share and discover interesting content, that much easier.
Where does Twine fit into the emerging Semantic Web ecosystem? Twine provides presence and content on the Semantic Web. It enables individuals and groups to homestead on the Semantic Web and get immediate value, without having to learn RDF.
Currently we are not going after the "be the search engine of the Semantic Web" opportunity -- we are focused on the "help users manage their information and connect with others who share their interests" and the "build thriving communities of interest" opportunities.
Our feeling is that incumbent search engines are probably best positioned to win the search engine of the entire Semantic Web war, when they decide to (as Yahoo just did, and Google most likely will soon decide to do as well...).
Twine is generating high-quality Semantic Web metadata about people, groups, topics of interest, and resources on the Web (Web pages, images, videos, books, products, documents, etc.). The metadata we are creating results from a combination of automated processing and user-contributions from our community.
The metadata Twine generates is then provided back to the users and community as open RDF that can be accessed and reused elsewhere. So we are effectively making a semantic graph of RDF about content around the Web, and related people, groups and their interests. Ultimately we become a semantic annotation layer above the Web. I can imagine that this is a dataset that Yahoo and Google and many others are going to want to be able to search.
The content in Twine is rapidly growing into a large semantic graph of information around people, groups and interests on the Web. We and our users are producing a large volume of high-quality original content and semantic metadata about existing Web content, that will undoubtedly make the Yahoo index much richer (and will drive traffic back to Twine and the sites we link back to from our graph).
The Semantic Web Eliminates Traditional Silos By Opening Up and Linking the Data
Twine is a hosted online service, but is not actually a "silo" in the traditional sense because all of our data is represented in open-standards-based RDF, and we are already providing access to that data on an experimental basis, and will provide even more via upcoming APIs in the future.
This means that the data Twine is creating and gathering is open, linked data that can be reused in other applications and services. Ultimately this makes Twine a part of a growing distributed ecosystem. Semantic Web metadata in RDF and OWL is even better than microformats because it carries its own meaning about how to use it. Software that speaks RDF and OWL can instantly reuse it without any additional programming. To learn more about Twine's open RDF availability, see the Twine Tour: Semantic Web section.
I believe that the open-standards of the Semantic Web eliminate silos. Effectively all services that participate in using these standards and make their data open are becoming part of one big distributed worldwide database, rather than old fashioned silos. That's the benefit of open linked data services powered by RDF, OWL, SPARQL, and GRDDL.
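The claim that open standards turn separate services into "one big distributed worldwide database" rests on a simple property: when two datasets name things with the same URIs, merging them is just a union of their triples, and links can be followed across the original boundaries. The sketch below illustrates this with two hypothetical services and made-up `example.org` URIs (the FOAF and RDFS property URIs are the real ones). Real RDF merging must also rename blank nodes so they don't collide, which this plain-set version skips.

```python
# Real vocabulary URIs; the person and topic URIs below are made-up examples.
FOAF_INTEREST = "http://xmlns.com/foaf/0.1/interest"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ALICE = "http://example.org/people/alice"
TOPIC = "http://example.org/topics/semantic-web"


def merge(*graphs):
    """Merge graphs that share URIs: with no blank nodes, this is a plain set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Two hypothetical services publishing overlapping data about the same URIs.
service_a = {(ALICE, FOAF_INTEREST, TOPIC)}
service_b = {(TOPIC, RDFS_LABEL, "Semantic Web"), (ALICE, FOAF_INTEREST, TOPIC)}

web = merge(service_a, service_b)
labels = [label
          for topic in objects(web, ALICE, FOAF_INTEREST)
          for label in objects(web, topic, RDFS_LABEL)]
print(len(web), labels)  # 2 ['Semantic Web']
```

Neither service alone can answer "what are the names of Alice's interests?"; the merged graph can, and the duplicate triple both services published collapses into one.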
How Will End-Users Participate in the Semantic Web?
If Yahoo and possibly Google make search better by indexing all sorts of metadata, there is then an even larger opportunity to help non-technical end-users create and use that metadata. This is where services like Twine fit in. End-users need ways to author, organize, share, reuse, and discover Semantic Web content.
We don't believe ordinary Webmasters or end-users are going to write microformats or RDF by hand. Even hard-core Semantic Web researchers don't do that. Ultimately end-users need user-friendly services that do this for them automatically, or at least make it easier to do. Twine helps these users to participate in the Semantic Web, without requiring them to have a degree in computer science. Twine provides an (increasingly) user-friendly hosted place where users can collect, organize, share and discover other interesting content around their interests, using the Semantic Web transparently "under the hood."
Concluding Thoughts
In short, Twine is where ordinary non-technical individuals and groups can join the Semantic Web, get a presence there, and start using it in useful ways, today. If Yahoo and Google become the search engines of the Semantic Web, that will make Twine even more necessary as the place where end-users can participate in this emerging ecosystem. We believe our community, and the rich semantic graph we are growing, will become increasingly valuable as the major search engines begin to index the Semantic Web.
But this is just the beginning of our story. Twine is designed to become a platform that others can build on and integrate with as well. There is more to our strategy than we have currently opened up about. In time we will be telling the rest of our story. We have some fun surprises in store in the future...
I want to remind everyone, TWINE IS A BETA. It is only a beta. Beta means not finished, under development, work in progress, construction site, imperfect, open to feedback, undergoing testing, getting better every day, in need of more work, and many other things that are not synonymous with "finished" or "ready for consumer launch." We know this. We never claimed otherwise. We opened Twine early to let the community play around and give us feedback to guide our future work.
Some of the recent coverage of our project has seemingly misunderstood the meaning of the term "beta" or forgotten it, or simply expected a beta to be more of a finished application. Perhaps this is because many companies never come out of beta or use beta to mean "1.0, only cooler." In our case, beta really means Beta. We knew there were bugs and unfinished features, but we decided to open up anyway in order to get user feedback to guide our further work.
But even though Twine is a beta, it is already quite useful, and there is a large and thriving community in there sharing knowledge about interests including the Semantic Web, Web 3.0, Web 2.0, venture capital, politics, art, fashion, travel, cultures, religion, books, and many other interests.
In fact, the number of connections I have in Twine is rapidly approaching, and will probably soon surpass, the number of connections I have in Facebook. And in terms of use, we are finding that our users are visiting Twine many times a day and actively adding information, searching, and participating in discussions and debates there.
The hype around the Semantic Web (and even Twine) is in my opinion justified, but it will take time for that opinion to be obvious to everyone. In the meantime, I do think it has gotten a bit out of control. There is too much wild speculation and a general feeling that somehow the Semantic Web (or services like Twine) will solve every problem on the Internet. That won't be the case. However the Semantic Web and services like Twine that are built with it will improve the content of the Web and enable applications to become smarter with less work.
To some degree the hype around the Semantic Web has set unrealistic expectations and it's not surprising that there is now some backlash. Some folks who came into Twine may have had impossible expectations -- perhaps thinking Twine would be some kind of a three-dimensional interface to all information, or a kind of HAL 9000 intelligent assistant. I'm sorry to disappoint them. Twine is much more pragmatic and focused on things like organizing, sharing and discovering information around interests. It is also just a first step in a long development path in which much more will be added in the future. And let's not forget... Twine is in Beta. It's not finished yet.
I think the backlash is good actually -- it will reset expectations to realistic levels. Hopefully then folks can focus on what the Semantic Web (and Twine) do today, rather than what they imagine they might do in 20 years, or what they don't do yet.
In the case of Twine, it is not a panacea, but it is certainly well on its way to becoming a leading semantically-driven online service with some interesting opportunities in the marketplace. There is certainly a lot more in the application than can be discovered in 7 minutes of using it, and I can understand how that might be frustrating to reviewers who have little time and high expectations of a finished consumer app. That is something we are working on, and by the time we move out of beta we expect to be able to say we have solved it.
Meanwhile, Twine is a beta and while there is already a LOT there, we can, must, and will be doing much, much more to address usability and finish features that are still under development and imperfect.
UPDATE: I posted some further notes on the fact that Twine is in beta, and what "beta" actually means and why we are in beta here.
Marshall Kirkpatrick wrote a critical review of Twine today that identified several known issues the team is working on. These are points well-taken -- we certainly understand that Twine is still a work in progress and there are many areas where we can improve usability. After all, Twine is still in private invite-only beta and is not a finished application yet. There is much that is still under development and we are learning from our users every day.
However, we have also been getting quite a lot of very positive feedback from our beta testers as well. Twine is already quite useful and works surprisingly well on a wide selection of Web content today, as our growing beta user base can attest to.
So on balance, while Marshall points out several issues we are aware of and are working on, there is much we are proud of in what we have been able to accomplish so far.
But I want to address some of the specific points Marshall made. Marshall pointed out the following issues:
That's true -- it's sometimes hard for Twine to identify the "content part" of a page when the page has complex structure (including tables, Flash, Ajax, frames, multiple DIV areas, etc.). In the meantime, Twine does actually do a good job on things like Typepad Blogs, the Wikipedia, YouTube, Flickr, Amazon books, WordPress, and most sites that have relatively standard page structure and/or metadata. That said, we are working on making Twine smarter so that it can do a better job, even when there is uncertainty about the content and structure of a page. As Marshall points out, this is a hard problem because there is so much non-standard content on the Web, but it's not an insurmountable one. Twine will steadily improve over time on this front.
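Finding the "content part" of a page is often approached with text-density heuristics. The sketch below is one generic version of that idea, not Twine's algorithm: it attributes visible text to the innermost block container (`div`, `article`, `td` and so on), skips `script`, `style`, `nav`, `header` and `footer`, and returns the container holding the most text. Both tag lists are assumptions made for this sketch, and the approach shows exactly the weakness described above: on a page whose body text is split across many small `div`s, it picks only the largest fragment.

```python
from html.parser import HTMLParser

# Assumed tag lists for this sketch; real pages may need more.
CONTAINERS = {"body", "main", "article", "section", "div", "td"}
SKIP = {"script", "style", "nav", "header", "footer"}


class DensityParser(HTMLParser):
    """Attribute visible text to the innermost open container element."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open containers as [tag, list_of_text_chunks]
        self.blocks = []      # closed containers as (text_length, text)
        self.skip_depth = 0   # > 0 while inside script/style/nav/header/footer

    def _close_top(self):
        tag, chunks = self.stack.pop()
        text = " ".join(chunks)
        self.blocks.append((len(text), text))
        return tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self.stack.append([tag, []])

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CONTAINERS and any(open_tag == tag for open_tag, _ in self.stack):
            # Close the matching container, plus any unclosed ones nested inside it.
            while self._close_top() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.skip_depth == 0:
            self.stack[-1][1].append(text)


def main_content(html):
    """Return the text of the container that holds the most visible text."""
    parser = DensityParser()
    parser.feed(html)
    parser.close()
    while parser.stack:   # flush containers the page never closed
        parser._close_top()
    return max(parser.blocks, default=(0, ""))[1]


page = """<html><body>
  <nav><div>Home About Contact</div></nav>
  <div class="sidebar">Ads and links</div>
  <article><h1>Title</h1><p>This is the long body of the post, with plenty of real text in it.</p></article>
  <script>var x = 1;</script>
</body></html>"""
print(main_content(page))  # the article text, not the navigation or sidebar
```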
Actually, if Twine can see the author's name, it will recognize them as a related person. But the author's name is not always visible on the article. It would be easier to manage this if there was better metadata on pages, but until that happens, the natural language approach is the main option, and it is not always perfect.
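When a page does declare its author, for example in the standard HTML `<meta name="author">` tag, picking the name out is straightforward. A minimal sketch using only Python's standard `html.parser` follows; it is illustrative rather than Twine's code, and an empty result is the point where a natural-language fallback would have to take over.

```python
from html.parser import HTMLParser


class AuthorMetaParser(HTMLParser):
    """Collect the values of <meta name="author" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # html.parser lowercases attribute names but not their values.
        if (attributes.get("name") or "").lower() == "author" and attributes.get("content"):
            self.authors.append(attributes["content"].strip())


def find_authors(html):
    """Return declared author names, or [] if the page declares none."""
    parser = AuthorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.authors


head = '<head><meta charset="utf-8"><meta name="author" content="Jane Doe"></head>'
print(find_authors(head))  # ['Jane Doe']
```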
Marshall mentions that he had a hard time getting oriented and finding his way through the application because there is so much there. One of the challenges we have is simply educating users about how to use Twine and what it is capable of. In addition there are many improvements we know we can make to the user-interface and information design to make it easier to figure out.
Marshall also asked for RSS feeds and visualizations.
RSS output is already supported to a limited extent and we will have more support for it next month. We are also planning to add RSS input as well in coming months.
Regarding visualizations, we've done a lot of work on visualizations in the past. Our feeling is that they usually don't add much value, other than being eye-candy. However, we will be opening up our APIs eventually to allow others to make all the visualizations they want. If someone makes a really useful one, perhaps we'll include it back into Twine.
Finally, I would also like to correct one thing that Marshall mentioned: We are not in fact going into general release next month -- we are just starting to let more people in from our waiting list to continue to help beta test Twine. There will still be a members-only policy in effect for several more months. The full public opening (when Twine will be opened to non-member guests, and search engines, etc.) will be in the summer timeframe. Even then, Twine will still be in beta. There is a good year of additional work to do on Twine before it will be fully "baked," to use Marshall's term. Between now and that time we will be working to improve (and finish) the app, in partnership with our beta community.
In closing, as I have said many times, Twine is still an early Beta and we have to keep expectations in line with reality. Twine is already far beyond what any other semantic app I know of is capable of, but that still isn't good enough. We have to push further and focus more on usability. We are opening it up early in order to get feedback and more help testing and guiding the direction of the app from users.
Hopefully as we work on Twine further, and we move out of Beta, Twine will eventually meet Marshall's high expectations. Meanwhile, his comments are helpful in that they do give us feedback about what aspects of Twine we need to focus on more as we head towards a more consumer-friendly application.
Special offer to readers of my blog...
There are now well over 30,000 users in the queue to get into the Twine beta. We're going to start letting people in from the waiting list in waves and it should take about a month or two to let everyone in.
But what good is a waiting list if there's no way to cut to the front, right? Fortunately, there is a way to skip ahead to the front of the line...
Write a blog post about Twine on your blog explaining why you want early access, and send the link to nova (at) radarnetworks (dot) com, along with your first name, last name, and email address. If I like your post, I'll get you an early-access VIP pass to the front of the line.
See you in Twine!
Carla Thompson, an analyst for Guidewire Group, has written what I think is a very insightful article about her experience participating in the early-access wave of the Twine beta.
We are now starting to let the press in, and next week we will begin to let in waves of people from our wait list of over 30,000 users. We will be letting people into the beta in waves every week going forward.
As Carla notes, Twine is a work in progress and we are mainly focused on learning from our users now. We have lots more to do, but we're very excited about the direction Twine is headed in, and it's really great to see Twine getting so much active use.
I'm here at the BlogTalk conference in Cork, Ireland, with a range of bloggers and technologists discussing the emerging social Web. Besides myself and Ian Davis and Paul Miller from Talis, there are a bunch of other Semantic Web folks here, including Dan Brickley and a group from DERI Galway.
Over dinner a few of us were discussing the terms "Semantic Web" versus "Web 3.0" and we all felt a better term was needed. After some thinking, Ian Davis suggested "Web 3G." I like this term better than Web 3.0 because it loses the "version number" aspect that so many objected to. It has a familiar ring to it as well, reminding me of the 3G wireless phone initiative. It also suggests Tim Berners-Lee's "Giant Global Graph" or GGG -- a synonym for the Semantic Web. Ian stayed up late and put together a nice blog post about the term, echoing many of my own sentiments about how this term should apply to a decade (the third decade of the Web), rather than to a particular technology.