This approach to journalism builds on older practices, most notably on computer-assisted reporting (CAR), a label used mainly in the US for decades.
Another label for a partially similar approach is "precision journalism", based on a book by Philip Meyer,[3] published in 1972, in which he advocated the use of techniques from the social sciences in researching stories.
The in-depth examination of such data sets can lead to more concrete results and observations regarding timely topics of interest.[6] According to architect and multimedia journalist Mirko Lorenz, data-driven journalism is primarily a workflow that consists of the following elements: digging deep into data by scraping, cleansing, and structuring it; filtering by mining for specific information; visualizing; and making a story.
Data journalism trainer and writer Paul Bradshaw describes the process of data-driven journalism in a similar manner: data must be found, which may require specialized skills with tools like MySQL or Python; then interrogated, which requires an understanding of jargon and statistics; and finally visualized and mashed up with the aid of open-source tools.
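A minimal sketch of such a find-clean-filter-visualize workflow in Python is shown below; the file spending.csv, its column names, and the payment threshold are hypothetical placeholders, not references to any real data set.

```python
# Hypothetical example of the find -> clean -> filter -> visualize workflow.
import pandas as pd
import matplotlib.pyplot as plt

# Find: load a data set obtained by scraping or from an open-data portal.
df = pd.read_csv("spending.csv")  # hypothetical file

# Clean: normalize column names, coerce amounts to numbers, drop incomplete rows.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df = df.dropna(subset=["department", "amount"])

# Filter (mine for specific information): keep only large payments.
large = df[df["amount"] > 100_000]

# Visualize: total large payments per department, a starting point for a story.
totals = large.groupby("department")["amount"].sum().sort_values()
totals.plot(kind="barh", title="Payments over 100,000 by department")
plt.tight_layout()
plt.savefig("payments.png")
```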
However, one problem in defining data journalism is that many definitions are not clear enough, focusing instead on describing the computational methods of optimizing, analyzing, and visualizing information.[11]
The term "data journalism" was coined by political commentator Ben Wattenberg through his work, starting in the mid-1960s, of layering narrative with statistics to support the theory that the United States had entered a golden age.[14] Working for the Detroit Free Press at the time, Philip Meyer used a mainframe to improve reporting on the riots spreading throughout the city.
With a new precedent set for data analysis in journalism, Meyer collaborated with Donald Barlett and James Steele to examine sentencing patterns in Philadelphia during the 1970s.
Toward the end of the 1980s, significant events began to occur that helped to formally organize the field of computer-assisted reporting.
Investigative reporter Bill Dedman of The Atlanta Journal-Constitution won a Pulitzer Prize in 1989 for The Color of Money, his 1988 series of stories using CAR techniques to analyze racial discrimination by banks and other mortgage lenders in middle-income black neighborhoods.
Although data journalism has been used informally by practitioners of computer-assisted reporting for decades, the first recorded use of the term by a major news organization was by The Guardian, which launched its Datablog in March 2009.
As projects like the 2009 MPs' expenses scandal and the 2013 release of the "offshore leaks" demonstrate, data-driven journalism can assume an investigative role, dealing on occasion with "not-so-open", that is, secret, data.
Megan Knight suggested a taxonomy based on the level of interpretation and analysis needed to produce a data journalism project. The taxonomy is hierarchical and includes the following types: data journalism articles with just numbers, with tables, and with visualizations (interactive and non-interactive).
From the perspective of looking deeper into the facts and drivers of events, a change in media strategies has been suggested: in this view, the idea is to move "from attention to trust".
The idea of transforming media companies into trusted data hubs has been described in an article cross-published in February 2011 on Owni.eu[33] and Nieman Lab.
While the steps leading to results can differ, a basic distinction can be made by looking at six phases. Data can be obtained directly from governmental databases such as data.gov, data.gov.uk, and the World Bank Data API,[35] but also by filing Freedom of Information requests with government agencies; some requests are made and aggregated on websites like the UK's WhatDoTheyKnow.
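As an illustration, the World Bank Data API mentioned above can be queried over plain HTTP. The sketch below (Python with the requests library) fetches the indicator SP.POP.TOTL (total population) for the United Kingdom; the indicator and country codes are real, but the choice is purely illustrative.

```python
# Query the World Bank Data API for a time series of UK total population.
import requests

url = "https://api.worldbank.org/v2/country/gb/indicator/SP.POP.TOTL"
resp = requests.get(url, params={"format": "json", "per_page": 100})
resp.raise_for_status()

# With format=json the API returns a two-element list:
# pagination metadata first, then the data rows.
meta, rows = resp.json()
for row in rows[:10]:
    print(row["date"], row["value"])
```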
While there is a worldwide trend towards opening data, there are national differences in the extent to which that information is freely available in usable formats.
Data can also be created by the public through crowdsourcing, as shown in March 2012 at the Datajournalism Conference in Hamburg by Henk van Ess.
More advanced concepts allow the creation of single dossiers, e.g. displaying a number of visualizations, articles, and links to the data on one page. Often such specials have to be coded individually, as many content management systems are designed to display single posts based on the date of publication.
[53] Three global broadsheets, namely The Guardian, The New York Times, and Der Spiegel, dedicated extensive sections[54][55][56] to the documents. The Guardian's reporting included an interactive map pointing out the type, location, and casualties caused by 16,000 IED attacks;[57] The New York Times published a selection of reports in which rolling over underlined text reveals explanations of military terms;[58] and Der Spiegel provided hybrid visualizations (containing both graphs and maps) on topics like the number of deaths related to insurgent bomb attacks.[59] For the Iraq War logs release, The Guardian used Google Fusion Tables to create an interactive map of every incident where someone died,[60] a technique it used again during the 2011 England riots.
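Google Fusion Tables has since been shut down (in 2019), but a comparable interactive incident map can be built with open-source tools. The sketch below uses the folium library; the file incidents.csv and its columns are hypothetical stand-ins for real incident data.

```python
# Recreate a Fusion-Tables-style incident map with the open-source folium library.
import pandas as pd
import folium

# Hypothetical CSV with columns: lat, lon, description.
incidents = pd.read_csv("incidents.csv")

m = folium.Map(location=[33.3, 44.4], zoom_start=6)  # roughly centered on Iraq
for _, row in incidents.iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=3,
        popup=row["description"],
    ).add_to(m)

m.save("incident_map.html")  # self-contained interactive HTML map
```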