A concept search can overcome these challenges by employing word sense disambiguation (WSD)[2] and other techniques to derive the actual meanings of words and their underlying concepts, rather than simply matching character strings as keyword search technologies do.
Over the years, additional auxiliary structures of general interest, such as the large synonym sets of WordNet, have been constructed.
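As a purely illustrative sketch of both ideas, the example below uses NLTK's WordNet interface for synonym sets and its implementation of the Lesk algorithm, one common WSD technique; the example sentence and the choice of NLTK are assumptions for demonstration, not a description of any particular concept search engine.

```python
# Requires: pip install nltk, then nltk.download("wordnet")
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

# WordNet groups words into synsets (synonym sets), an auxiliary structure
# a concept search can consult instead of matching raw character strings.
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# The Lesk algorithm chooses the synset whose gloss best overlaps the context,
# so the same surface string "bank" resolves to different concepts in different sentences.
context = "I deposited my paycheck at the bank".split()
sense = lesk(context, "bank")
if sense is not None:
    print(sense.name(), "-", sense.definition())
```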
Handcrafted controlled vocabularies contribute to the efficiency and comprehensiveness of information retrieval and related text analysis operations, but they work best when topics are narrowly defined and the terminology is standardized.
Controlled vocabularies require extensive human input and oversight to keep up with the rapid evolution of language.
Controlled vocabularies are also prone to capturing a particular worldview at a specific point in time, which makes them difficult to modify if concepts in a certain topic area change.
This approach is simple, but it captures only a small portion of the semantic information contained in a collection of text.
At the most basic level, numerous experiments have shown that only about a quarter of the information contained in text is local in nature.[8]
In addition, to be most effective, this method requires prior knowledge about the content of the text, which can be difficult to obtain with large, unstructured document collections.
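Assuming that the local approach in question amounts to counting term co-occurrences within a small context window, a minimal sketch might look like the following; the window size and toy documents are illustrative assumptions, not values from the source.

```python
from collections import Counter

def window_cooccurrences(tokens, window=4):
    """Count unordered term pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            counts[tuple(sorted((term, other)))] += 1
    return counts

docs = [
    "the bank approved the small business loan",
    "heavy rain flooded the river bank",
]
totals = Counter()
for doc in docs:
    totals.update(window_cooccurrences(doc.lower().split()))
print(totals.most_common(5))
```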
However, the use of LSI has significantly expanded in recent years as earlier challenges in scalability and performance have been overcome.[20]
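A minimal sketch of the underlying idea, assuming a scikit-learn environment: build a term-document matrix and reduce it with a truncated SVD so that queries and documents are compared in a low-rank "concept" space rather than on exact terms. The toy corpus, query, and parameter values are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cardiologist examined the patient and found a heart murmur",
    "the physician treated the cardiac condition with medication",
    "the mechanic replaced the car engine and the brakes",
]

# Term-document matrix, then a low-rank projection (the "latent semantic" space).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
X_lsi = svd.fit_transform(X)

# Queries are projected into the same space; related documents can rank highly
# even when they share few exact keywords with the query.
query_lsi = svd.transform(vectorizer.transform(["heart doctor"]))
print(cosine_similarity(query_lsi, X_lsi))
```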
Relevance feedback is a feature that lets users indicate whether the results returned for a query meet their information needs, so that the system can use those judgments to refine subsequent results.
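Relevance feedback is commonly implemented with query-modification schemes such as the Rocchio algorithm; the sketch below assumes documents and queries are represented as term-weight vectors, and the weights and coefficients shown are conventional illustrative values rather than anything specified here.

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Shift the query vector toward judged-relevant documents and away from non-relevant ones."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return np.clip(q, 0.0, None)  # negative term weights are typically dropped

# Toy term-weight vectors over a four-term vocabulary.
query = np.array([1.0, 0.0, 0.0, 0.0])
relevant = np.array([[0.9, 0.8, 0.0, 0.0], [0.7, 0.6, 0.1, 0.0]])
nonrelevant = np.array([[0.0, 0.0, 0.9, 0.8]])
print(rocchio(query, relevant, nonrelevant))  # weight shifts toward terms from relevant documents
```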
However, the problems of heterogeneous data, scale, and non-traditional discourse types reflected in text, together with the fact that search engines will increasingly be integrated components of complex information management processes rather than stand-alone systems, will require new kinds of system responses to a query.[24]
In 1997, a Japanese counterpart of TREC was launched, called the National Institute of Informatics Test Collection for IR Systems (NTCIR).
NTCIR conducts a series of evaluation workshops for research in information retrieval, question answering, automatic summarization, etc.
A European series of workshops called the Cross-Language Evaluation Forum (CLEF) was started in 2001 to aid research in multilingual information access.[22]
Scientific data about how people use the information tools available to them today is still incomplete because experimental research methodologies have not been able to keep up with the rapid pace of change.