[1] Berelson’s (1952) definition provides an underlying basis for textual analysis as a "research technique for the objective, systematic and quantitative description of the manifest content of communication."[2] Content analysis consists of categorizing units of text (e.g. sentences, quasi-sentences, paragraphs, documents, web pages, etc.) according to their substantive characteristics in order to construct a dataset that allows the analyst to interpret texts and draw inferences.
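The coding step can be illustrated with a short sketch. The category scheme, keyword rules, and text units below are invented for illustration and do not correspond to any published coding protocol; real coding schemes are typically applied by trained human coders or more sophisticated software.

```python
# A minimal sketch of the coding step in content analysis: each unit of text
# (here, a sentence) is assigned to a substantive category, and the coded
# units are collected into a small dataset for later analysis.
# The categories and keywords are illustrative assumptions only.
from collections import Counter

CODING_SCHEME = {
    "economy": ["tax", "jobs", "budget"],
    "environment": ["climate", "emissions", "pollution"],
}

def code_unit(unit: str) -> str:
    """Assign a unit of text to the first category whose keywords it contains."""
    lowered = unit.lower()
    for category, keywords in CODING_SCHEME.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"

units = [
    "The budget proposal raises taxes on imports.",
    "New emissions rules target heavy industry.",
    "The committee adjourned without a vote.",
]

# The resulting dataset pairs each unit with its code, ready for counting
# or statistical analysis.
dataset = [(unit, code_unit(unit)) for unit in units]
print(Counter(category for _, category in dataset))
# Counter({'economy': 1, 'environment': 1, 'other': 1})
```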
While content analysis is often quantitative, researchers conceptualize the technique as inherently mixed methods because textual coding requires a high degree of qualitative interpretation.
Sampling web content through search engines has disadvantages, because search engine results are unsystematic and non-random, making them unreliable for obtaining an unbiased sample.
Early online content analysts often specified a ‘Web site’ as a context unit, without a clear definition of what they meant.
[6] King (2008) used an ontology of terms trained from many thousands of pre-classified documents to analyse the subject matter of a number of search engines.
[3][7] Advances in methodology, together with the increasing capacity and decreasing expense of computation, have allowed researchers to use techniques that were previously unavailable to analyze large sets of textual content.
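As a rough illustration of this kind of automated classification (not King's actual ontology-based system), a classifier can be trained on documents whose subject matter is already known and then applied to unlabelled text. The documents, labels, and the scikit-learn bag-of-words pipeline below are assumptions made for the example.

```python
# A sketch of supervised text classification: train on pre-classified
# documents, then predict the subject matter of new text.
# Training data and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "stock markets fell as interest rates rose",
    "the central bank cut its growth forecast",
    "the striker scored twice in the cup final",
    "the club confirmed the transfer of its captain",
]
train_labels = ["finance", "finance", "sport", "sport"]

# Bag-of-words features feeding a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["markets rose after the bank cut rates"]))
# ['finance']
```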
The output of automated coding can be compared with that of human coders; this comparison can take the form of inter-coder reliability scores like those used to validate the consistency of human coders in traditional textual analysis.
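Such a reliability check might be sketched as follows. Cohen's kappa is used here as one common inter-coder reliability statistic; the two label sequences are invented for illustration.

```python
# A minimal sketch of validating automated coding against a human coder
# with an inter-coder reliability score. The label sequences are made up.
from sklearn.metrics import cohen_kappa_score

human_codes   = ["economy", "economy", "environment", "other", "economy", "environment"]
machine_codes = ["economy", "other",   "environment", "other", "economy", "environment"]

# Cohen's kappa measures agreement between the two coders, corrected for
# the agreement expected by chance.
kappa = cohen_kappa_score(human_codes, machine_codes)
print(f"Cohen's kappa: {kappa:.2f}")
```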