This is written in response to a post by Anne Zelenka.
I've been talking about the coming "semantic graph" for quite some time now, and it seems the meme has suddenly caught on thanks to a recent article by Tim Berners-Lee in which he speaks of an emerging "Giant Global Graph" or "GGG." But if the GGG emerges, it may or may not be semantic. For example, social networks are NOT semantic today, even though they contain various kinds of links between people and other things.
So what makes a graph "semantic?" How is the semantic graph different from social networks like Facebook for example?
Many people think that the difference between a social graph and a semantic graph is that a semantic graph contains more types of nodes and links. That's potentially true, but not always the case. In fact, you can make a semantic social graph or a non-semantic social graph. The concept of whether a graph is semantic is orthogonal to whether it is social.
A graph is "semantic" if the meaning of the graph is defined and exposed in an open and machine-understandable fashion. In other words, a graph is semantic if the semantics of the graph are part of the graph or at least linked from the graph. One way to accomplish this is to represent a social graph using RDF and OWL, the languages of the Semantic Web.
Today most social networks are non-semantic, but it is relatively easy to transform them into semantic graphs. A simple way to make any non-semantic social graph into a semantic social graph is to use the FOAF ontology to define the entities and links in the graph.
FOAF stands for "friend of a friend" and is a simple ontology of people and social relationships. If a social network links its data to the FOAF ontology, and exposes these linkages to other applications on the Web, then other applications can understand the meaning of the data in the network in an unambiguous manner. In other words, it is now a semantic social graph because its semantics are visible to other applications.
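As a rough sketch of what linking a network's data to FOAF could look like (not how any particular social network does it), the Python below maps a plain friends list onto FOAF terms and prints the result as N-Triples-style lines. The FOAF namespace and the terms foaf:Person, foaf:name, and foaf:knows are real; the example.org URIs, the `friends` data, and the helper names are assumptions made up for illustration.

```python
# Hypothetical sketch: turning a non-semantic friends list into FOAF triples.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/people/"  # assumed base URI for this example


class Literal(str):
    """Marks a plain string value, as opposed to a URI."""


# A non-semantic graph: user ids pointing at other user ids, with no
# statement anywhere of what the link between them means.
friends = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": []}


def person_uri(user_id):
    return BASE + user_id


def to_foaf_triples(friend_map):
    """Describe every user and every friendship using FOAF terms."""
    triples = []
    for user_id, friend_ids in friend_map.items():
        subject = person_uri(user_id)
        triples.append((subject, RDF_TYPE, FOAF + "Person"))
        triples.append((subject, FOAF + "name", Literal(user_id.capitalize())))
        for friend_id in friend_ids:
            triples.append((subject, FOAF + "knows", person_uri(friend_id)))
    return triples


def to_ntriples(triples):
    """Serialize triples as simple N-Triples-style lines."""
    lines = []
    for s, p, o in triples:
        obj = '"%s"' % o if isinstance(o, Literal) else "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


foaf_triples = to_foaf_triples(friends)
print(to_ntriples(foaf_triples))
```

Because every link now points at a published FOAF term, another application can look up what "knows" means instead of guessing from the field name.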
As illustrated by the FOAF example above, one way to make a graph semantic is to use the W3C open standards for the Semantic Web (RDF and OWL) to represent, and define the meaning of, the nodes and links in the graph. By using the Semantic Web, the graph becomes machine-understandable and thus more easily navigated, searched, imported, and integrated by other applications.
For example, let's say that social network Application A comes along and wants to use the dataset of social network Application B. App A sees the graph of nodes and links in B, and it sees something called a "has team" link connecting various nodes in the graph together. What does that mean? What kinds of things can or cannot be connected with this link? What can be inferred if things are connected this way?
The meaning of "has team" is ambiguous to App A because it's not defined anywhere that the software can see. The only way App A can use App B's data correctly is if the programmer of App A speaks to the programmer of App B (or reads something they wrote, such as documentation) to learn what they meant by the "has team" link.
Only by knowing what the programmer of App B intended can App A treat App B's data appropriately, without misinterpretations that might lead to mistakes or inconsistencies. This matters. For example, if a user searches for "Yankees Players," should people who are linked by the "has team" link to sports teams called "Yankees" be returned? That depends on whether "has team" means "a connection from a person to a sports team they support," "a connection from a person to a sports team they play on," or "a connection from a person to a workgroup they participate in." In short, App A has no idea what to do with data that is linked by App B's "has team" link unless it is explicitly programmed to make use of it.
The OWL language (Web Ontology Language) provides a way for the programmers of App A and App B to define what the links in their graphs mean in an unambiguous and machine-understandable way. So App A just has to look up this definition and it can instantly start to use App B's data correctly, without any new programming or difficult integration.
How is this accomplished? The programmer of App B simply uses OWL to define an ontology of social relationships for their service. For example, they define the "has team" link to be a link that connects a person to a sports team they play on. They also define what they mean by a "sports team" (for example, "a group of two or more people that play a sport," where a sport is one of "baseball, basketball, football, soccer, hockey, tennis"), and they link these terms to another ontology of sports somewhere else on the Web. The ontology file that defines App B's data is added to the Website of App B, and linked from its data, so that other applications can see it.
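To make this a little more concrete, here is a minimal sketch of what App B's ontology might contain, written as plain subject-predicate-object triples in Python rather than in a real OWL file. The RDF, RDFS, and OWL namespace URIs and the terms used from them are real; the appb.example.org namespace, the term names (`hasTeam`, `SportsTeam`, `Sport`), and the `describe` helper are assumptions for illustration. The link to an external sports ontology, and a strict "one of" restriction on sports, are left out to keep the sketch short.

```python
# Hypothetical sketch of App B's ontology as a set of triples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
APPB = "http://appb.example.org/ontology#"  # assumed namespace for App B

ontology = {
    # The kinds of things App B talks about.
    (APPB + "Person", RDF + "type", OWL + "Class"),
    (APPB + "Sport", RDF + "type", OWL + "Class"),
    (APPB + "SportsTeam", RDF + "type", OWL + "Class"),
    (APPB + "SportsTeam", RDFS + "comment",
     "A group of two or more people that play a sport"),
    # The "has team" link: from a person to a sports team they play on.
    (APPB + "hasTeam", RDF + "type", OWL + "ObjectProperty"),
    (APPB + "hasTeam", RDFS + "domain", APPB + "Person"),
    (APPB + "hasTeam", RDFS + "range", APPB + "SportsTeam"),
    (APPB + "hasTeam", RDFS + "comment",
     "Connects a person to a sports team they play on"),
}

# The sports named in the text, each declared as an instance of Sport.
for sport in ["baseball", "basketball", "football", "soccer", "hockey", "tennis"]:
    ontology.add((APPB + sport, RDF + "type", APPB + "Sport"))


def describe(term, graph):
    """Collect what the graph states about one term (one value per predicate)."""
    return {p: o for s, p, o in graph if s == term}


has_team = describe(APPB + "hasTeam", ontology)
print(has_team[RDFS + "domain"])
print(has_team[RDFS + "range"])
```

The domain and range statements are what remove the ambiguity: they say, in a form software can read, which kinds of things "has team" connects.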
Now when another application such as App A comes along and looks at App B's data it can reference App B's ontology to see for itself what was intended by the "has team" link -- it can see exactly what that link implies and what can be inferred by it. It understands how to use App B's data set, and how to correctly make new links using that data set which are consistent with the meaning of the links it contains.
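A small sketch of what "seeing what can be inferred" might look like in practice: the Python below applies the standard RDFS domain and range rules to a "has team" link, so that App A can conclude that the subject is a person and the object is a sports team without being told so directly. The rdf:type, rdfs:domain, and rdfs:range URIs are real; the appb.example.org terms, the sample data, and the function names are hypothetical.

```python
# Hypothetical sketch: App A using App B's ontology to interpret App B's data.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
APPB = "http://appb.example.org/ontology#"   # assumed ontology namespace
DATA = "http://appb.example.org/data/"       # assumed data namespace
HAS_TEAM = APPB + "hasTeam"

# The part of App B's ontology that App A looks up.
ontology = {
    (HAS_TEAM, RDFS_DOMAIN, APPB + "Person"),
    (HAS_TEAM, RDFS_RANGE, APPB + "SportsTeam"),
}

# Raw data pulled from App B.
data = {
    (DATA + "player1", HAS_TEAM, DATA + "yankees"),
    (DATA + "player2", HAS_TEAM, DATA + "yankees"),
    (DATA + "player3", HAS_TEAM, DATA + "redsox"),
}


def infer_types(data, ontology):
    """Apply the RDFS domain/range rules: a property's domain types the
    subject of each statement, and its range types the object."""
    domains = {s: o for s, p, o in ontology if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in ontology if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


def players_of(team, data):
    """Answer a 'Yankees Players'-style query, now that the ontology says
    'has team' means playing on the team."""
    return sorted(s for s, p, o in data if p == HAS_TEAM and o == team)


inferred = infer_types(data, ontology)
yankees_players = players_of(DATA + "yankees", data)
print(yankees_players)
```

Nothing in `infer_types` mentions teams or people; it only follows what the ontology says, which is why the same code would work on links it has never seen before.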
This is the real point of the Semantic Web open standards -- RDF enables data to be represented in a database-independent manner, and OWL enables the semantics of that data to be defined in an open, machine-understandable way so that other applications can use that data without having to first be programmed to do so. As long as they speak RDF/OWL, applications can use any data they find and look up the meaning of any data they need, so they can use the data appropriately.
For example, suppose another application, App C, is OWL-aware but has never seen App B's data set before and was not programmed specifically to use it. App C pulls some data out from App B's API. It can immediately begin to use this data correctly and consistently with how App B uses it, because all that is necessary for understanding how to use B's data is encoded in the OWL ontology that App B's data refers to.
The point here is that using Semantic Web open standards such as RDF and OWL to encode what data means is a giant leap beyond just putting raw data onto the Web in an open format. It doesn't just put the data itself on the Web; it also puts the definition of what the data means, and how to use it, on the Web in an open format. A semantic graph is far more reusable than a non-semantic graph -- it's a graph that carries its own meaning.
The semantic graph is not merely a graph with links to more kinds of things than the social graph. It's a graph of interconnected things that is machine-understandable -- its meaning or "semantics" is explicitly represented on the Web, just like its data. This is the real way to make social networks open. Merely opening up their APIs is just the first step.
Only when the semantics of data is defined and shared in an open way can any graph truly be said to be semantic. Once data around the Web is defined in a machine-understandable way, a whole new world of easy, instant mashups becomes possible. Applications can start to freely and instantly mix and match each other's data, including new data they were not programmed in advance to understand. This opens up the door to the Web truly becoming a giant database and eventually an integrated operating system in which all applications are able to more easily interoperate and share data.
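As a toy illustration of the kind of mashup described here, the sketch below merges two small graphs from different applications. Because both describe people with the same URIs and use published terms for their links, combining them is just a set union, and a question that neither application could answer alone becomes answerable. foaf:knows is a real FOAF term; the other URIs, the sample data, and the function name are assumptions for illustration.

```python
# Hypothetical sketch: mixing data from two applications that share URIs.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
HAS_TEAM = "http://appb.example.org/ontology#hasTeam"  # assumed term
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
CAROL = "http://example.org/people/carol"
YANKEES = "http://appb.example.org/data/yankees"

# App A knows who knows whom; App B knows who plays on which team.
app_a_graph = {(ALICE, FOAF_KNOWS, BOB)}
app_b_graph = {
    (ALICE, HAS_TEAM, YANKEES),
    (BOB, HAS_TEAM, YANKEES),
    (CAROL, HAS_TEAM, YANKEES),
}

# Merging graphs of triples is a plain set union.
merged = app_a_graph | app_b_graph


def teammates_known_by(person, graph):
    """People the given person knows who also play on one of their teams."""
    known = {o for s, p, o in graph if s == person and p == FOAF_KNOWS}
    teams = {o for s, p, o in graph if s == person and p == HAS_TEAM}
    return sorted(
        s for s, p, o in graph if p == HAS_TEAM and o in teams and s in known
    )


print(teammates_known_by(ALICE, merged))
```

Run against either application's graph alone, the same query returns nothing; the answer only exists in the combined graph.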
The Giant Global Graph may or may not be a semantic graph. That depends on whether it is implemented with, or at least connected to, W3C standards for the Semantic Web.
I believe that because the Semantic Web makes data integration easier, it will ultimately be widely adopted. Simply put, applications that wish to access or integrate data in the Age of the Web can more easily do so using RDF and OWL. That alone is reason enough to use these standards.
Of course there are many other benefits as well, such as the ability to do more sophisticated reasoning across the data, but that is less important. Simply making data more accessible, connectable, and reusable across applications would be a huge benefit.
Nova, you say that Semantic graphs expose their data and the meaning of links through RDF and OWL and then you say other apps can make use of this data. Can you please give a few examples as to how this semantic graph can be of use for other apps?
Posted by: Naveed | March 10, 2008 at 12:08 AM
Thank you, Nova. This is a great description of how these technologies relate to the web we already know. This is one of the biggest hurdles for us to overcome: explaining what we're so excited about, both to fellow programmers as well as to non-programmers.
Posted by: zeb hodge | December 04, 2007 at 02:36 PM
Hi Nova, very good explanation of the semantic graph. It's always encouraging for me to read your blog as I too am working on an ambitious semantic web project :)
Posted by: Naveed | December 02, 2007 at 07:37 AM
G'Day from the Antipodes, Nova.
As usual, your thoughts provoke more thoughts. Is that negentropic?
The part of this post I'm not comfortable with is the presumption that--just because RDF provides us with a well-formed container into which we humans can pour information about the information--suddenly the machines will be able to use each other's information ["App C can immediately begin to use this data correctly and consistently with how App B uses it"] to deliver an improvement in the knowledge state experienced by some other humans...
Surely the mere fact that RDF offers a structure does virtually nothing to improve the chance that most humans will start to think like S.R. Ranganathan and populate their ontologies etc with logical, useful, language.
IOW: the G3 gambit makes it much easier and faster for machines to locate yet another confusing piece of human-generated information that requires some level of disambiguation.
I think my nagging problem is that RDF appears to be used to store information fundamentally mismatched with RDF's level of abstraction.
The semantic web cannot be driven by more human coloratura...there's no end to the amount of clarification that is required for usefully machine-mediated human communication, no matter how neatly the storage containers fit together.
To me it seems more productive to use RDF as a storage grid for a type of metadata that is more native (somehow) to the logics of the machines themselves. There needs to be more of a state change in the qualities of the information held at the first (or surely the second) order of abstraction above the original.
I understand the need to improve people's habits of describing what they're talking about so that people two nodes away can still get the message. But the whole project suffers mightily when you ask those same messy humans to do the job.
Cheers!
JB
Posted by: John Brisbin | November 27, 2007 at 04:43 AM
Nova,
I agree with your analysis. However, I have another interpretation of Tim's GGG claim.
I understand GGG to be equivalent to WWW but from a different angle of view. When we talk about WWW, we take the publisher's point of view; when we talk about GGG, we are trying to take the viewer's point of view. Both views look upon the same Web, but each sees a different structure in it. Moreover, I believe that the purpose of Twine is exactly an attempt to convert the web information storage from the original publisher-oriented point of view to the more friendly viewer-oriented point of view.
You may look at the entire analysis of my understanding of GGG at Thinking Space.
-- Yihong
Posted by: Yihong Ding | November 24, 2007 at 01:50 PM
Hi Nova, I left a response on GigaOM but also wanted to stop by here and say thanks for the comment and the post.
I'm not arguing that you can't represent a unified social graph semantically -- I'm pointing out how a unified social graph doesn't really adequately represent the complexity of human relationships. And I'm also wondering if moving towards a semweb approach for the social graph removes too much of the human, since semantic web technologies are all about machine processing.
Seems to me that many people calling for a unified social graph are those that treat their friends/fans like undifferentiated nodes. Most people, however, don't have so many people they interact with online (or so many services) that they need a unified, machine-processable approach. And a unified machine-processable approach has drawbacks (possibility for spam and privacy abuses, a loss of having multiple ways to say "you are my friend," a loss of multiple identities online).
Hope you had a nice Thanksgiving. :)
Posted by: Anne Z. | November 24, 2007 at 07:13 AM