Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video.
All of these involve systematic reading or observation of texts or artifacts, which are assigned labels (sometimes called codes) to indicate the presence of interesting, meaningful pieces of content.
Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate.
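As an illustration only, not a procedure drawn from the sources cited here, the following minimal sketch shows how a supervised classifier trained on a small hand-coded sample might label many more texts. It assumes scikit-learn; the training texts and the binary "policy content" label scheme are hypothetical.

```python
# Minimal sketch: scaling up human coding with a supervised classifier.
# Assumes scikit-learn; the texts and labels below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small sample already coded by human coders (1 = "policy content").
train_texts = [
    "The bill allocates new funding for rural hospitals.",
    "Fans celebrated the championship win downtown.",
    "Lawmakers debated the proposed carbon tax.",
    "The concert drew a record crowd this weekend.",
]
train_labels = [1, 0, 1, 0]

# TF-IDF features plus a linear classifier, a common text baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# The fitted model can now label far more texts than the coders could.
unlabeled = ["The senate vote on healthcare funding is scheduled for May."]
print(model.predict(unlabeled))  # e.g. [1]
```

The debate noted above concerns whether such machine-assigned labels, however numerous, preserve the validity that careful human coding provides.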
Dedicated databases compile, systematize, and evaluate relevant content-analytical variables across research areas and topics in communication and political science.
Siegfried Kracauer provides a critique of quantitative analysis, asserting that it oversimplifies complex communications in order to be more reliable.
On the other hand, qualitative analysis deals with the intricacies of latent interpretations, whereas quantitative analysis focuses on manifest meanings.
[4] The codebook includes detailed instructions for human coders, as well as clear definitions of the concepts or variables to be coded and the values that may be assigned.
When the contents of communication are available as machine-readable texts, the input can be analyzed for frequencies and coded into categories to support inferences.
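A minimal sketch of such frequency-based coding, assuming a hypothetical codebook that maps each category to a handful of indicator terms (real codebooks are far more detailed):

```python
# Minimal sketch: coding machine-readable text into categories by
# counting indicator-term frequencies. The codebook is hypothetical.
import re
from collections import Counter

codebook = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "health": {"hospital", "vaccine", "clinic"},
}

def code_text(text):
    """Return, for each category, how often its indicator terms occur."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {cat: sum(tokens[t] for t in terms)
            for cat, terms in codebook.items()}

print(code_text("The budget bill cuts hospital funding but raises jobs."))
# {'economy': 2, 'health': 1}
```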
Computer-assisted analysis can help with large electronic data sets by saving time and eliminating the need for multiple human coders to establish inter-coder reliability.
A study found that human coders were able to evaluate a broader range of content and make inferences based on latent meanings.
Also, the content validity of the measures can be checked by experts from the field who scrutinize and then approve or correct coding instructions, definitions and examples in the codebook.
The political scientist Harold Lasswell formulated the core questions of content analysis in its early-mid 20th-century mainstream version: "Who says what, to whom, why, to what extent and with what effect?".
[25] Thus, content analysis attempts to describe quantifiably communications whose features are primarily categorical, usually limited to a nominal or ordinal scale. It does so via selected conceptual units (the unitization), which are assigned values (the categorization) for enumeration, while monitoring intercoder reliability. If the target quantity is instead already directly measurable, typically on an interval or ratio scale, and especially if it is a continuous physical quantity, then it is usually not listed among those needing the "subjective" selections and formulations of content analysis.
[26][27][28][29][30][31][15][32] For example (from mixed research and clinical application), medical images communicate diagnostic features to physicians. Neuroimaging's stroke (infarct) scale ASPECTS is unitized as 10 qualitatively delineated (unequal) brain regions in the middle cerebral artery territory, each of which it categorizes as at least partly infarcted versus not at all infarcted in order to enumerate the latter; published series often assess intercoder reliability by Cohen's kappa.
The foregoing operations impose the form of content analysis, uncredited, onto an estimation of infarct extent, which could instead be measured more easily and more accurately as a volume directly on the images.
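Because Cohen's kappa recurs above as the standard intercoder-reliability statistic, a minimal sketch of its computation follows; the two coders' codes for ten units (by analogy, ten brain regions) are invented for illustration.

```python
# Minimal sketch: Cohen's kappa for two coders' nominal codes.
# Kappa corrects raw agreement for agreement expected by chance.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codes for ten units (1 = infarcted, 0 = not infarcted).
coder1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
coder2 = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(round(cohens_kappa(coder1, coder2), 2))  # 0.8
```

Here the coders agree on 9 of 10 units (observed agreement 0.9), chance agreement from the marginals is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.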
In a directed content analysis, researchers draft a preliminary coding scheme from pre-existing theory or assumptions.
A consistent and clear unit of coding is vital, with the choices ranging from a single word to several paragraphs and from texts to iconic symbols.