Knowledge extraction

The RDB2RDF W3C group [1] is currently standardizing a language for the extraction of Resource Description Framework (RDF) data from relational databases.

The following criteria can be used to categorize approaches in this topic (some of them only account for extraction from relational databases):[2]

When building a relational database (RDB) representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD).

Each row in the table describes an entity instance, uniquely identified by a primary key.

Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (details can be found in object-relational impedance mismatch) and has to be reverse engineered.

In a relational table the primary key is an ideal candidate for becoming the subject of the extracted triples.
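As a minimal sketch of this idea, the following Python snippet turns table rows into triples, using the primary key value to build the subject. The function name `rows_to_triples`, the `http://example.org/` base URI and the sample rows are assumptions made for illustration; they do not follow the RDB2RDF language or any particular tool.

```python
def rows_to_triples(table, primary_key, rows, base="http://example.org/"):
    """Map each row (a dict of column -> value) to (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        # The primary key value identifies the entity instance, so it forms the subject.
        subject = f"{base}{table}/{row[primary_key]}"
        for column, value in row.items():
            if column == primary_key or value is None:
                continue  # skip the key column itself and NULL values
            # Each remaining column becomes a predicate, its cell value the object.
            predicate = f"{base}{table}#{column}"
            triples.append((subject, predicate, value))
    return triples


# Hypothetical sample data standing in for a "person" table.
rows = [
    {"id": 1, "name": "Alice", "city": "Berlin"},
    {"id": 2, "name": "Bob", "city": None},
]
triples = rows_to_triples("person", "id", rows)
```

In this sketch every non-key column yields one triple per row, and NULL values produce no triple at all.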

An XML element, however, can be transformed, depending on the context, into the subject, the predicate or the object of a triple.

XSLT can be used as a standard transformation language to manually convert XML to RDF.
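The context-dependent role of XML elements can be illustrated with a short sketch. It uses Python's standard `xml.etree.ElementTree` module rather than XSLT, purely to keep the example self-contained. The sample document, the base URI and the mapping rule (an element with an `id` becomes a subject, its child elements become predicates, and their text becomes objects) are assumptions, not a standard mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
XML = """<library>
  <book id="b1"><title>Dune</title><author>Frank Herbert</author></book>
  <book id="b2"><title>Emma</title><author>Jane Austen</author></book>
</library>"""


def xml_to_triples(xml_text, base="http://example.org/"):
    """Hand-written mapping from one specific XML layout to triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for book in root.findall("book"):
        # In this context the <book> element acts as the subject.
        subject = base + book.get("id")
        for child in book:
            # A child element's tag acts as the predicate, its text as the object.
            triples.append((subject, base + child.tag, child.text))
    return triples


xml_triples = xml_to_triples(XML)
```

A different document layout would need a different mapping, which is why such conversions are typically written by hand for each schema.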

The largest portion of information contained in business documents (about 80%[10]) is encoded in natural language and therefore unstructured.

As a preprocessing step to knowledge extraction, it can be necessary to perform linguistic annotation with one or more NLP tools.

During template element construction, the IE system identifies descriptive properties of the entities recognized by NER and CO.

These relations can be of several kinds, such as works-for or located-in, with the restriction that both domain and range correspond to entities.
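The domain and range restriction can be sketched as a simple filter over candidate relations. The entity table, the candidate list and the function name `valid_relations` are hypothetical; a real IE system would obtain both from earlier processing steps.

```python
# Hypothetical output of entity recognition: surface form -> entity type.
entities = {"Alice": "Person", "Acme": "Organization", "Berlin": "Location"}

# Hypothetical candidate relations as (domain, relation, range) tuples.
candidates = [
    ("Alice", "works-for", "Acme"),
    ("Acme", "located-in", "Berlin"),
    ("Alice", "works-for", "quickly"),  # range is not a recognized entity
]


def valid_relations(candidates, entities):
    """Keep only relations whose domain and range are both recognized entities."""
    return [(s, r, o) for s, r, o in candidates if s in entities and o in entities]


relations = valid_relations(candidates, entities)
```

Here the third candidate is discarded because its range does not correspond to an entity.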

As building ontologies manually is extremely labor-intensive and time consuming, there is great motivation to automate the process.

In this process, which is generally semi-automatic, knowledge is extracted in the sense that a link is established between lexical terms and, for example, concepts from ontologies.
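A very simple way to picture such a link is a lexicon lookup that attaches a concept identifier to each matching term. The lexicon, the concept URIs and the function name `annotate` are assumptions for illustration. Real systems also have to handle ambiguity and multi-word terms, which this sketch ignores.

```python
import re

# Hypothetical lexicon mapping lowercase terms to ontology concept URIs.
LEXICON = {
    "berlin": "http://example.org/onto#Berlin",
    "germany": "http://example.org/onto#Germany",
}


def annotate(text, lexicon):
    """Return (term, character offset, concept URI) for every token found in the lexicon."""
    links = []
    for match in re.finditer(r"\w+", text):
        concept = lexicon.get(match.group().lower())
        if concept:
            links.append((match.group(), match.start(), concept))
    return links


links = annotate("Berlin is the capital of Germany.", LEXICON)
```

Each resulting tuple records which term was linked, where it occurs in the text, and which concept it refers to.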

The following criteria can be used to categorize tools that extract knowledge from natural language text.

Knowledge discovery developed out of the data mining domain, and is closely related to it both in terms of methodology and terminology.

Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary.