The Ontology Problem is a fundamental challenge of the emerging Semantic Web. It comprises three key sub-problems -- the Upper Ontology Problem, the Domain Ontology Problem, and the Ontology Integration Problem -- described in detail below:
1. The Upper Ontology Problem
When representing the world with ontologies, we need certain basic "building block" concepts before we can ontologically define higher-level concepts. For example, before we can really define what we mean by a "geographic region" we first need basic definitions of building-block concepts such as "planet," "geographic location," "set," "boundary," "container" and "content of," and perhaps "elevation," "longitude," "latitude" and so on. Until we have defined these building blocks we cannot build a semantic definition of what a geographic region really is. It turns out that, at least when describing our consensus reality, a relatively small set of building-block concepts is needed by most ontologies. We can call such a set of building-block concepts an "Upper Ontology."
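To make this concrete, here is a minimal sketch in Python, using the rdflib library, of how a domain-level definition of "GeographicRegion" ends up leaning on upper-ontology building blocks. All of the namespaces, class names and property names below are hypothetical, invented purely for illustration:

# Sketch only: hypothetical URIs and names, showing how a domain class
# is defined in terms of upper-ontology "building block" classes.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

UPPER = Namespace("http://example.org/upper#")  # the building blocks
GEO = Namespace("http://example.org/geo#")      # the domain ontology being built

g = Graph()
g.bind("upper", UPPER)
g.bind("geo", GEO)

# Building-block concepts the domain ontology presumes but does not define itself.
for name in ("Planet", "GeographicLocation", "Boundary", "Set"):
    g.add((UPPER[name], RDF.type, OWL.Class))

# The domain-level class is only meaningful relative to those building blocks.
g.add((GEO.GeographicRegion, RDF.type, OWL.Class))
g.add((GEO.hasBoundary, RDF.type, OWL.ObjectProperty))
g.add((GEO.hasBoundary, RDFS.domain, GEO.GeographicRegion))
g.add((GEO.hasBoundary, RDFS.range, UPPER.Boundary))
g.add((GEO.locatedOnPlanet, RDF.type, OWL.ObjectProperty))
g.add((GEO.locatedOnPlanet, RDFS.domain, GEO.GeographicRegion))
g.add((GEO.locatedOnPlanet, RDFS.range, UPPER.Planet))
g.add((GEO.centeredAt, RDF.type, OWL.ObjectProperty))
g.add((GEO.centeredAt, RDFS.domain, GEO.GeographicRegion))
g.add((GEO.centeredAt, RDFS.range, UPPER.GeographicLocation))

print(g.serialize(format="turtle"))

Strip away the upper namespace and the domain class loses most of its meaning -- which is exactly the dependency described above.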
Upper Ontologies are harder to design than domain ontologies in a certain respect -- they must be at once more granular and more macroscopic, and the concepts they define are generally more abstract and often epistemological in nature. While someone may be a domain expert in their own field and able to design a fairly decent ontology about their domain, designing a truly suitable Upper Ontology is a different specialization altogether. The distinction is similar to the difference between a programmer who designs compilers and development environments and a programmer who writes software using those compilers and IDEs: the two generally have different categories of skills and knowledge. Similarly, Upper Ontologies, and the skills needed to design them, are quite different from mid-level or lower-level domain ontologies and the skill sets they require.
The Upper Ontology Problem is simply that there is no generally accepted, comprehensive, standardized Upper Ontology in use today. When developing a domain ontology, developers must therefore do one of the following:
(a) Develop their own Upper Ontology first (a big task that they shouldn't have to undertake, and probably don't have time to complete),
(b) Use one of the various existing Upper Ontologies, such as SUO/SUMO, OpenCyc, or other proposed Upper Ontologies (a difficult choice for a non-specialist developer, who may not know how to assess the relative value of these different ontologies, and may not know the languages in which they are expressed well enough to understand them without extensive study; worse, by choosing one such Upper Ontology, all of their own ontology's next-level concepts automatically become "upper-ontology-dependent" and not necessarily compatible with the other Upper Ontologies they did not choose),
(c) Or, finally, decide to just not use an Upper Ontology (the choice made in ontologies such as FOAF; a choice which makes things simpler for the moment, but which also results in an "ontological light-cone" or "ontology horizon" of sorts, beyond which the concepts in the ontology become ambiguous and essentially undefined). None of these choices is easy to make, and none is optimal.
Domain-ontology developers should not have to worry about also developing their own Upper Ontologies. Instead, either there should be one truly good standard Upper Ontology, or there should at least exist a meta-ontology that maps all the concepts in the most common Upper Ontologies to one another so that it doesn't matter which one is used. But this hasn't happened yet.
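As a rough illustration of what such a bridging meta-ontology might contain, here is a minimal sketch in Python with rdflib. The namespaces and class names stand in loosely for two upper ontologies and are entirely hypothetical, not the real SUMO or OpenCyc identifiers:

# Sketch only: a tiny set of bridge assertions between two hypothetical
# upper ontologies, so that domain ontologies built on either can interoperate.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

UO1 = Namespace("http://example.org/upper-ontology-1#")
UO2 = Namespace("http://example.org/upper-ontology-2#")

bridge = Graph()
bridge.bind("uo1", UO1)
bridge.bind("uo2", UO2)

# Where the two upper ontologies genuinely agree, assert equivalence...
bridge.add((UO1.PhysicalObject, OWL.equivalentClass, UO2.SpatialThing))
# ...and where one concept is strictly broader, assert subsumption instead.
bridge.add((UO1.Organism, RDFS.subClassOf, UO2.LivingThing))

print(bridge.serialize(format="turtle"))

The hard part, of course, is deciding which of those assertions are actually true -- which is the whole problem.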
Note: A terrific Upper Ontology that I highly recommend (disclaimer: I helped develop major parts of it) is the University of Texas CLib ontology, which, by the way, is open-source (it says GPL but will actually be LGPL soon). You can view all the current builds here. In particular, I would suggest looking at the OWL version, which is a scaled-down subset of the full ontology (which is in KM, a more expressive axiomatic language). I have contributed a large number of classes and relations to this version of CLib, so feel free to ask me questions if you would like to discuss this further. Please note that this ontology is still evolving, so if you build on it, you might want to let us know and keep up with changes by checking the builds frequently.
2. The Domain Ontology Problem
A few useful general-purpose mid-level and lower-level ("domain level") ontologies exist. For example, FOAF is an ontology about people and their relationships, DOAP is a proposed ontology about software projects, the Dublin Core is an ontology of the most basic properties of library resources, and so on. There are even highly detailed ontologies that describe various medical, commerce and military domains. However, it is safe to say that the vast majority of vertical subject domains have yet to be modeled ontologically, let alone released in an open manner.
There are simply so many knowledge niches in the world -- even huge ontologies containing tens of thousands of class definitions, such as OpenCyc, are still relatively limited in their conceptual breadth, depth and resolution. In order for all types of information and knowledge to be expressible and accessible in the Semantic Web, ontologies for all these specialized domains need to be developed and made publicly available in some manner. Furthermore, they need to somehow connect together via a solution to the above Upper Ontology Problem so that they can be normalized and mapped to one another easily. Until that happens the Semantic Web will still be incredibly useful, but only for representing and accessing general knowledge or working with domain-specific concepts that are defined by the small set of currently existing domain ontologies.
The solution: More ontologies need to be created about new vertical domains, and mapped to common open Upper Ontologies. Easier said than done! Before domain ontologies will be created, someone has to come up with a compelling benefit for doing so -- for example, applications or services that make use of these domain ontologies to solve problems that real people actually have and need solutions to.
3. The Ontology Integration Problem
As alluded to above, it is one thing to develop an ontology but quite another to make it compatible with other existing ontologies. This is the Ontology Integration Problem. It turns out to be far more subtle than most people currently writing about the Semantic Web have acknowledged. Integrating ontologies is not as simple as mapping classes in one ontology to corresponding classes in another, because it is not merely the names and properties of classes that determine their meanings and mappings, but also their inheritance paths within their respective ontologies. For example, consider these two ontology class outlines:
- Ontology A
  - Thing
    - Legal Person
      - Human
      - Corporation
      - Living Thing
        - Person
      - Organization
        - Corporation
      - Professional Occupation
        - Lawyer
- Ontology B
  - Thing
    - Living Thing
      - Person
        - Legal Person
          - Lawyer
    - Non-Living Thing
      - Organization
        - Legal Organization
          - Corporation
If we mapped these ontologies to one another simply by mapping "Person" in Ontology A to the class "Person" in Ontology B, we would wrongly be implying that Ontology A's concept "Person" is equivalent to Ontology B's concept "Person." However, there is a big difference in actual meaning between what these two ontologies mean by "Person," and it comes from the semantics implied by the differing inheritance paths of the two "Person" classes. Ontology A uses "Person" to mean a human with legal status in some legal system. Ontology B says that a Person is simply some type of "Living Thing," but not necessarily a legal entity. In other words, in Ontology A "All Persons are Legal Entities," while in Ontology B "Some Persons may be Legal Entities" and others may not be. Similarly, consider how to map between the concepts of "Legal Person" in the two ontologies -- an even hairier problem.
The difficulty in integrating these two ontologies lies in figuring out how to express the similarity and difference in meaning between these two concepts of "Person." One answer is to create a new third ontology that attempts to unify the concepts in both Ontology A and Ontology B, which can then be used to map between them -- a kind of "semantic middleware" approach; its weakness is that it applies only to the mapping between these two ontologies and cannot easily be extended to additional ontologies or to different subject domains.
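Whatever approach is taken, the bridging assertions themselves have to say something weaker and more precise than simple equivalence. Here is a minimal sketch in Python with rdflib, using hypothetical namespaces A and B for the two toy ontologies outlined above:

# Sketch only: mapping the two "Person" classes without overstating their sameness.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

A = Namespace("http://example.org/ontology-a#")
B = Namespace("http://example.org/ontology-b#")

mapping = Graph()
mapping.bind("a", A)
mapping.bind("b", B)

# Too strong: asserting owl:equivalentClass between A:Person and B:Person
# would claim the two meanings coincide exactly, which they do not.

# Closer to the intent: every A:Person (a human with legal standing) is a
# B:LegalPerson, and therefore also a B:Person -- but not the other way around.
mapping.add((A.Person, RDFS.subClassOf, B.LegalPerson))

# Mapping "Legal Person" itself is left open here: in A it also covers
# corporations and organizations, while in B it covers only persons.
print(mapping.serialize(format="turtle"))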
Another approach might be to instead develop a general semantics for expressing inter-ontology mapping concepts -- and then use this meta-ontology to create instances that express mappings between classes and properties in various ontologies. This approach is of particular long-term value; however, it is not simple to accomplish -- gradations in semantic intent are notoriously subtle and complex to codify, and to my knowledge nobody has developed an ontology that attempts to formalize them in an open, ontology-independent manner (although prior work on OIL was in that direction).
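For a rough flavor of what such graded mapping relations might look like, here is a minimal sketch in Python with rdflib that borrows the SKOS mapping properties as a stand-in for the richer mapping meta-ontology imagined above; the A and B namespaces are the same hypothetical ones as before:

# Sketch only: graded mapping links instead of hard equivalence, borrowing
# the SKOS mapping properties as a stand-in for a richer mapping meta-ontology.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

A = Namespace("http://example.org/ontology-a#")
B = Namespace("http://example.org/ontology-b#")

links = Graph()
links.bind("a", A)
links.bind("b", B)
links.bind("skos", SKOS)

# B's "Person" is a broader notion than A's "Person"...
links.add((A.Person, SKOS.broadMatch, B.Person))
# ...while B's "LegalPerson" is close to A's "Person", but not asserted identical.
links.add((A.Person, SKOS.closeMatch, B.LegalPerson))
links.add((A.Corporation, SKOS.closeMatch, B.Corporation))

print(links.serialize(format="turtle"))

Even this only labels the relationships coarsely; it does not capture why the meanings differ, which is what a true mapping semantics would have to do.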
Having worked on some very large ontology integration problems (for example, integrating two partially overlapping ontologies, each with several thousand classes and properties defined, and expressed in different ontology markup languages with different expressive power), I can tell you that the difficulty of such integration increases exponentially with the number of concepts being integrated.
Because mapping between different ontologies is quite difficult -- even more difficult than designing new ontologies from scratch -- most ontology developers take the latter approach. Thus we have few mappings between existing ontologies, and an increasing number of small, non-integrated ontologies about different domains. While it is easy to state that these various ontologies can be integrated such that they eventually all connect together, actually doing such integration is difficult in practice. If the Semantic Web is really going to one day "link together islands of meaning" in different places, we must solve the Ontology Integration Problem.
The alternatives are unacceptable. If we don't solve it, we will end up with either (a) lots of totally incompatible ontologies and knowledge based on them (just more "data silos," which is precisely what the Semantic Web was supposed to eliminate!), or (b) an incomplete set of partially incorrect mappings between ontologies (because nobody has time to map each ontology to every other ontology, and furthermore, even if they tried, without adequate mapping semantics such mappings will contain partial truths or even glaring errors and contradictions).
If the Ontology Integration Problem is not solved, it will not be possible to answer a semantic search query across the open Web for a question such as "find all software products that work with Linux, are open-source, and are endorsed by people or companies I trust." Why not? Because while there could be tons of relevant raw RDF and OWL instance data out on the Web, drawn from various ontologies, unless it all uses the same ontology, or all the ontologies that the various instances refer to are integrated, the query agent will have no way of making sense of or normalizing the results. Of course, the query agent could simply run the query on all data from all ontologies it knows about and then present the results in a single list, sorted by ontology -- but as we've seen above, different ontologies might mean different things by classes with the same names, and thus the results returned may not really be relevant or well-ordered.
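To make the point concrete, here is a minimal sketch in Python with rdflib of the kind of query such an agent would need to run. All of the namespaces, terms and sample data are hypothetical; the query only finds anything because the data happens to use the same (or bridged) vocabulary the query uses:

# Sketch only: a semantic search over hypothetical instance data.
from rdflib import Graph

data = """
@prefix sw:    <http://example.org/software#> .
@prefix trust: <http://example.org/trust#> .
@prefix ex:    <http://example.org/things#> .

ex:FooEditor a sw:SoftwareProduct ;
    sw:worksWith sw:Linux ;
    sw:hasLicense sw:OpenSourceLicense ;
    trust:endorsedBy ex:SomeoneITrust .
"""

g = Graph()
g.parse(data=data, format="turtle")

results = g.query("""
    PREFIX sw:    <http://example.org/software#>
    PREFIX trust: <http://example.org/trust#>
    SELECT ?product WHERE {
        ?product a sw:SoftwareProduct ;
                 sw:worksWith sw:Linux ;
                 sw:hasLicense sw:OpenSourceLicense ;
                 trust:endorsedBy ?endorser .
    }
""")

for row in results:
    print(row.product)

If the endorsement data used one trust ontology and the product data another, with no bridge between them, this query would silently return nothing.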
Another solution that has been proposed is to automate this process, perhaps using learning and logic agents to analyze ontological structures and/or the data sets corresponding to various ontologies, in order to automatically learn or derive rules and mappings that integrate them. I personally doubt that the automated ontology mapping approach will yield useful fruit anytime soon -- there is still no substitute for human domain expertise in mapping between ontologies. It simply requires too subtle an epistemological and semantic intelligence for an automated program to do well.
I believe the solution will ultimately stem from a solution to the Upper Ontology Problem -- if we can solve that problem, then much of the Ontology Integration Problem will go away, as most ontologies will automatically be inter-mapped at least at the Upper Ontology level. If we had a standard Upper Ontology, and if this standard were also to include meta-level concepts for mapping between ontologies and expressing differences in meaning between sets of classes in different ontologies, then integrating ontologies would certainly be easier.
Note: See also this related article on how to design richer semantics using Roles as a design pattern in ontologies.
Good article! I'd like to mention the Suggested Upper Merged Ontology http://www.ontologyportal.org. Unlike the other available upper ontologies, SUMO has a wealth of domain ontologies that should make it clearer for a given user how he or she might get started using some relevant concepts. Another unique feature is a set of mappings to all of WordNet, which allows new users to enter almost any English word and find out which SUMO terms are relevant.
Posted by: Adam Pease | December 20, 2004 at 01:39 PM
Great breakdown of ontology issues...
But if one of the goals of the Semantic Web is to demolish the concept of "data silos" as well as the rigorous "structured data" concept -- doesn't using such a complex ontology put us back to square one anyway? I feel like we're moving away from the idea of having the DATA carry organizational information.
Our brains don't work this way -- we don't have an incredibly complex "card catalog" of semantics in our heads.
If you did implement a "perfect" Upper Ontology, as well as a standard functional Ontology -- how do all of the "slots" get filled with data about data?
And won't that generate "meta-data" that is as expansive as the current Web itself?
Posted by: Josh Kirschenbaum | November 17, 2004 at 08:06 AM
thank you for the clarification. interesting. i knew there was a reason I subscribed to your feed. =)
Posted by: pat | November 16, 2004 at 11:09 AM
Yes there is a difference between a taxonomy and an ontology. An ontology defines not only the classes (the things) but also their properties, such as their attributes and the types of relationships they can have to one another, as well as possible restrictions on how the classes and properties can be instantiated. Consider this example:
A TAXONOMY:
1. Thing
1.a. Living Thing
1.a.i. Human
1.a.ii. Animal
AN ONTOLOGY:
(Class Thing
hasName (string)
)
(Class LivingThing
is-a Thing
hasName (string)
hasBirthDate (datetime)
hasLocation (GeographicLocation)
hasWeight (long)
)
(Class Human
is-a LivingThing
is-a Thing
hasName (string)
hasBirthDate (datetime)
hasLocation (GeographicLocation)
hasWeight (long)
hasFirstName (string)
hasLastName (string)
hasCityOfResidence (City)
hasNationality (Nation)
hasGender ("male" or "female")
hasHeight (long)
hasEthnicity (Ethnicity)
hasFriend (Role: Friend)
hasEmploymentHistory (EmploymentHistory)
.....
)
Posted by: Nova | November 16, 2004 at 08:56 AM
Brilliant summary of ontology problems. One question. Is there a significant difference between taxonomy and ontology?
(In the past when thinking about the problems you outline I have always used the word taxonomy in my mind, but that's probably because I was unaware that ontology was the technical word)
Posted by: Pat | November 16, 2004 at 06:33 AM