Ontologies are formal, explicit specifications of how to represent the objects, concepts, and other entities in a particular system, as well as the relationships between them.
Natural-language processing (NLP) is an area of artificial intelligence research that attempts to reproduce the human interpretation of language. NLP methodologies and techniques assume that the patterns in grammar and the conceptual relationships between words in language can be articulated scientifically. The ultimate goal of NLP is to determine a system of symbols, relations, and conceptual information that can be used by computer logic to implement artificial language interpretation.
Natural-language processing has its roots in semiotics, the study of signs. Semiotics was developed by Charles Sanders Peirce (a logician and philosopher) and Ferdinand de Saussure (a linguist). Semiotics is divided into three branches: syntax, semantics, and pragmatics.
A complete natural-language processor extracts meaning from language on at least seven levels. However, we'll focus on the four main levels:
- Morphological: A morpheme is the smallest part of a word that carries a discrete meaning. Morphological analysis works with words at this level; typically, a natural-language processor can recognize multiple forms of a word (its singular and plural, for example).
- Syntactic: At this level, natural-language processors focus on the structure of a sentence and the grammatical relationships within it.
- Semantic: Natural-language processors resolve each word to a literal (dictionary) meaning, using context to choose among candidate senses.
- Pragmatic: Natural-language processors derive meaning that depends on external, commonsense knowledge of the world.
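The morphological level above is the easiest to illustrate. The following sketch strips a few common English suffixes to recover a base form; the rules are toy examples invented for illustration (a production system would use something far richer, such as the Porter stemmer):

```python
def base_form(word):
    """Toy morphological analyzer: map a handful of inflected
    forms back to a base form by stripping common suffixes.
    Illustrative only; real morphology needs many more rules."""
    for suffix in ("ies", "es", "s", "ing", "ed"):
        # Guard against stripping a "suffix" that is most of the word.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            if suffix == "ies":
                return word[: -len(suffix)] + "y"
            return word[: -len(suffix)]
    return word

print(base_form("parties"))  # party
print(base_form("walking"))  # walk
```

Even this tiny example shows why morphology alone is insufficient: the same rules that map "parties" to "party" would mangle irregular forms such as "children".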
One of the major limitations of modern NLP is that most linguists approach it at the pragmatic level, gathering huge amounts of information into large knowledge bases that describe the world in its entirety. These academic knowledge repositories are defined in ontologies that take on a life of their own and never end up in practical, widespread use. There are various knowledge bases, some commercial and some academic. The largest and most ambitious is the Cyc Project; the Cyc Knowledge Server is an enormous inference engine and knowledge base. Even natural-language modules that perform specific, limited linguistic services aren't financially feasible for the average developer.
In general, NLP faces the following challenges:
- Physical limitations: The greatest challenge to NLP is representing a sentence or group of concepts with absolute precision. The realities of computer software and hardware limitations make this challenge nearly insurmountable. The amount of data needed to perform NLP at a human level requires memory space and processing capacity beyond even the most powerful computer processors.
- No unifying ontology: NLP suffers from the lack of a unifying ontology that addresses semantic as well as syntactic representation. The various competing ontologies serve only to slow the advancement of knowledge management.
- No unifying semantic repository: NLP lacks an accessible and complete knowledge base that describes the world in the detail necessary for practical use. The most successful commercial knowledge bases are limited to licensed use and have little chance of wide adoption. Even the projects with the best academic intentions develop at an unacceptably slow pace.
- Current information retrieval systems: Most current information retrieval systems suffer from semantic overload. Web crawlers, limited by their methods of indexing, more often than not return incorrect matches as a result of ambiguous interpretation.
Ontologies and solutions
The W3C’s Resource Description Framework (RDF) was developed to enable the automated processing of Web resources by providing a means of defining metadata about those resources. RDF addresses the physical limitation of memory space by allowing a natural-language processor to access resources in a distributed environment. A networked computer processor can access RDF models on various other processors in a standard way.
RDF provides a unifying ontological syntax for defining knowledge bases. RDF is expressed in XML, a markup language designed to cleanly separate data formatting from data semantics. As a result of the extensible nature of XML (the only constraints are that documents be well formed and valid), many categories of information can be expressed very clearly using XML and RDF.
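To make the XML encoding concrete, the following sketch builds a minimal RDF description using only Python's standard-library `xml.etree.ElementTree`. The RDF and Dublin Core namespace URIs are the real, published ones; the resource URI and title are invented for illustration:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialized XML uses rdf: and dc:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

# One rdf:Description about a (hypothetical) resource,
# carrying a single Dublin Core title property.
root = ET.Element(f"{{{RDF_NS}}}RDF")
desc = ET.SubElement(
    root,
    f"{{{RDF_NS}}}Description",
    {f"{{{RDF_NS}}}about": "http://example.org/article"},
)
title = ET.SubElement(desc, f"{{{DC_NS}}}title")
title.text = "Ontologies and NLP"

print(ET.tostring(root, encoding="unicode"))
```

The serialized output is plain XML, so any XML-aware tool can exchange or index the metadata even if it knows nothing about RDF semantics.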
RDF is by no means the perfect ontological syntax. Consider the five semantic principles: existence, coreference, relation, conjunction, and negation. RDF doesn't inherently support conjunction or negation. At its core, RDF allows users to define simple statements about network resources.
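Those simple statements are triples of subject, predicate, and object. The sketch below models a tiny triple store as a Python set (the URIs and property names are illustrative, not drawn from any real vocabulary) and shows the expressive gap just described: a triple's presence asserts a relation, but plain RDF has no direct way to assert its negation.

```python
# A minimal triple store: each RDF statement is
# a (subject, predicate, object) tuple. Illustrative data only.
triples = {
    ("http://example.org/doc", "dc:creator", "W3C"),
    ("http://example.org/doc", "dc:format", "text/xml"),
}

def holds(subject, predicate, obj):
    """True if the statement is asserted. Note the asymmetry:
    absence of a triple is not a negation ("the creator is NOT X"
    cannot be stated), and there is no built-in conjunction over
    statements beyond listing triples side by side."""
    return (subject, predicate, obj) in triples

print(holds("http://example.org/doc", "dc:creator", "W3C"))  # True
```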