Concept mining

[1][2] Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents.

Recently, techniques based on the semantic similarity between the possible concepts and the context have appeared and gained interest in the scientific community.
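As a rough illustration of this idea, the sketch below picks the candidate concept whose related words overlap most with the surrounding context. It is only a toy under stated assumptions: plain word overlap (Jaccard similarity) stands in for a real semantic similarity measure, and the `CANDIDATES` table, the concept labels and the function names are hypothetical, not taken from any particular system.

```python
def jaccard(a, b):
    """Word-overlap similarity between two collections of words (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical candidate concepts for the ambiguous word "bank",
# each described by a small set of related words.
CANDIDATES = {
    "bank/finance": {"money", "loan", "account", "deposit", "interest"},
    "bank/river": {"river", "water", "shore", "mud", "fishing"},
}


def disambiguate(context_words, candidates):
    """Score every candidate concept against the context and return the best one."""
    scores = {
        concept: jaccard(context_words, related)
        for concept, related in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


context = "she opened an account to deposit money".split()
best, scores = disambiguate(context, CANDIDATES)
print(best, scores)
```

A practical system would replace the overlap score with a similarity measure derived from a thesaurus or from learned word representations; the selection step would stay the same.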

These structures can be used to generate simple tree-membership statistics, which in turn can be used to locate any document in a Euclidean concept space.
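One way to read "tree membership statistics" is as the fraction of a document's recognized words that fall under each top-level tree. The sketch below assumes a tiny hand-made word-to-tree table (`WORD_TO_TREE`) and a fixed tree order (`TREES`); both are hypothetical and exist only to show how a document becomes a point in concept space.

```python
from collections import Counter

# Hypothetical mapping from words to the top-level tree they belong to.
WORD_TO_TREE = {
    "loan": "finance", "deposit": "finance", "interest": "finance",
    "river": "geography", "shore": "geography",
    "goal": "sport", "match": "sport",
}

# Fixed axis order of the concept space (one dimension per tree).
TREES = ["finance", "geography", "sport"]


def concept_vector(text):
    """Return the share of recognized words that belongs to each tree."""
    counts = Counter(
        WORD_TO_TREE[word] for word in text.lower().split() if word in WORD_TO_TREE
    )
    total = sum(counts.values())
    if total == 0:
        # No recognized words: place the document at the origin.
        return [0.0] * len(TREES)
    return [counts[tree] / total for tree in TREES]


vec = concept_vector("The loan interest rose after the deposit near the river")
print(vec)  # [0.75, 0.25, 0.0]
```

Each document is reduced to one coordinate per tree, so ordinary Euclidean distance between two such vectors can serve as a measure of how far apart the documents are conceptually.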

Standard numeric clustering techniques may be used in "concept space" as described above to locate and index documents by the inferred topic.
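As a minimal sketch of such clustering, the code below runs a basic k-means loop over a few document vectors. The vectors in `docs` are made-up concept-space coordinates, the starting centroids are chosen by hand to keep the result deterministic, and k-means itself is only one example of a standard technique, not one the text prescribes.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up documents as (finance, geography, sport) concept vectors.
docs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.8, 0.2],
]

labels, centers = kmeans(docs, centroids=[docs[0], docs[2]])
print(labels)  # [0, 0, 1, 1]
```

The resulting cluster labels can then act as an index: documents sharing a label are treated as being about the same inferred topic.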

These concept-space techniques are numerically far more efficient than their text mining cousins, and tend to behave more intuitively, in that they map better to the similarity measures a human would generate.