The overarching goal is to turn text into data for analysis by applying natural language processing (NLP), various algorithms, and analytical methods.
[4] Text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.
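As a minimal sketch of what "turning text into data" can look like in practice, the following builds a document-term matrix of word counts from raw strings. The tokenizer (lowercase alphabetic words only) and the function names are assumptions made for illustration; real text-analytics pipelines typically use more elaborate linguistic processing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix), where each row holds one document's
    token counts in vocabulary order."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical example documents.
docs = [
    "Text mining turns text into data.",
    "Data analysis needs structured data.",
]
vocab, matrix = document_term_matrix(docs)
for row in matrix:
    print(dict(zip(vocab, row)))
```

Each row of the matrix is a numeric representation of one document, which statistical or machine learning methods can then take as input.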
[7] The term "text analytics" is now used more frequently in business settings, while "text mining" is used in some of the earliest application areas, dating to the 1980s,[8] notably life-sciences research and government intelligence.
Subtasks, as components of a larger text-analytics effort, typically include:
Text mining technology is now broadly applied to a wide variety of government, research, and business needs.
[29] For teaching and study, Weka is one of the most widely used options in the scientific community and a common entry point for beginners.
On the back end, editors also benefit from being able to share, associate, and package news across properties, which significantly increases opportunities to monetize content.
Text mining matters to publishers who hold large databases of information that need indexing for retrieval.
Initiatives have therefore been taken, such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD), that would provide semantic cues to machines to answer specific queries contained within the text without removing publisher barriers to public access.
Academic institutions have also become involved in the text mining initiative:
Computational methods have been developed to assist with information retrieval from scientific literature.
The automatic analysis of vast textual corpora has created the possibility for scholars to analyze millions of documents in multiple languages with very limited manual intervention.
Gender bias, readability, content similarity, reader preferences, and even mood have been analyzed using text mining methods applied to millions of documents.
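Of the measures listed above, content similarity is perhaps the simplest to illustrate. The sketch below compares documents by the cosine similarity of their word-count vectors. This is one common approach, not necessarily the method used in the studies referred to, and the sample sentences and function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic word tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Hypothetical example documents.
doc_a = term_counts("the court ruled on the copyright case")
doc_b = term_counts("the copyright case went to court")
doc_c = term_counts("rain is expected this weekend")

sim_ab = cosine_similarity(doc_a, doc_b)  # the two documents share several words
sim_ac = cosine_similarity(doc_a, doc_c)  # the two documents share no words
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

Raw word counts give heavy weight to frequent function words such as "the"; analyses over large corpora usually apply some form of term weighting or stop-word removal first.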
In the UK in 2014, on the recommendation of the Hargreaves review, the government amended copyright law[55] to allow text mining as a limitation and exception.
[56] Because the proposed solution to this legal issue focused on licenses rather than on limitations and exceptions to copyright law, representatives of universities, researchers, libraries, civil society groups, and open access publishers left the stakeholder dialogue in May 2013.
The Australian Law Reform Commission has noted that it is unlikely that the "research and study" fair dealing exception would extend to cover text mining either, since it would go beyond the "reasonable portion" requirement.
For example, large datasets built from data extracted from news reports can facilitate social network analysis or counter-intelligence.
In effect, the text mining software may act in a capacity similar to an intelligence analyst or research librarian, albeit with a more limited scope of analysis.