Misuse of statistics

Statistics, when used in a misleading fashion, can trick the casual observer into believing something other than what the data shows.

One usable definition is: "Misuse of Statistics: Using numbers in such a manner that – either by intent or through ignorance or carelessness – the conclusions are unjustified or incorrect."

Whether the statistics show that a product is "light and economical" or "flimsy and cheap" can be debated whatever the numbers.

Assigning blame for misuses is often difficult because scientists, pollsters, statisticians and reporters are often employees or consultants.

The supplier provides the "statistics" as numbers or graphics (or before/after photographs), allowing the consumer to draw conclusions that may be unjustified or incorrect.

The poor state of public statistical literacy and the non-statistical nature of human intuition make it possible to mislead without explicitly producing faulty conclusions.

The process includes[4][5] experimental planning, conduct of the experiment, data analysis, drawing logical conclusions, and presentation/reporting.

Many misuses of statistics occur through selective publication of results. To promote a neutral (useless) product, a company need only find or conduct, for example, 40 studies with a confidence level of 95%; even if the product is useless, roughly two of those studies will, on average, reach a significant result by chance alone (one favorable, one unfavorable), and the company can publicize only the favorable one.
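
A minimal simulation of this arithmetic, assuming independent null studies tested at the 5% significance level (the split between favorable and unfavorable false positives is illustrative):

```python
import random

random.seed(1)

def batch_has_favorable_study(n_studies=40, alpha=0.05):
    # Each study of a useless product is "significant" with probability
    # alpha; roughly half of those false positives point in the favorable
    # direction, hence alpha / 2.
    return any(random.random() < alpha / 2 for _ in range(n_studies))

trials = 100_000
hits = sum(batch_has_favorable_study() for _ in range(trials))
print(f"P(at least one favorable study out of 40): {hits / trials:.2f}")
# Analytically: 1 - (1 - 0.025)**40 ~= 0.64. Publicizing only that one
# study makes a useless product look proven.
```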

Organizations that do not publish every study they carry out, such as tobacco companies denying a link between smoking and cancer, anti-smoking advocacy groups and media outlets trying to prove a link between smoking and various ailments, or miracle pill vendors, are likely to use this tactic.

Regarding repeated experiments, Ronald Fisher said, "It would be illegitimate and would rob our calculation of its basis if unsuccessful results were not all brought into the account."

Survey answers can also be steered by the wording of the question: respondents are more likely to answer "yes" to a question about income tax cuts when it is prefaced with sympathetic framing than to the question "Considering the rising federal budget deficit and the desperate need for more revenue, do you support cuts in income tax?"

As young people are more likely than other demographic groups to lack a conventional "landline" phone, a telephone poll that surveys only landline numbers may undersample the views of young people, unless other measures are taken to account for this skewed sampling.
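
One standard corrective is post-stratification weighting. A minimal sketch with invented numbers (the group shares and support rates are hypothetical, not from any actual poll):

```python
# Young people: 20% of the population but only 5% of a landline-only sample.
sample_share = {"young": 0.05, "older": 0.95}      # observed in the sample
population_share = {"young": 0.20, "older": 0.80}  # known from census data
support = {"young": 0.70, "older": 0.40}           # subgroup support rates

raw = sum(sample_share[g] * support[g] for g in support)
weighted = sum(population_share[g] * support[g] for g in support)
print(f"raw estimate:      {raw:.1%}")       # 41.5%, skewed by undersampling
print(f"weighted estimate: {weighted:.1%}")  # 46.0%, after reweighting
```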

Scientists have learned at great cost that gathering good experimental data for statistical analysis is difficult.

Pollsters have learned at great cost that gathering good survey data for statistical analysis is difficult.

When results are reported for population subgroups, a larger margin of error will apply, but this may not be made clear.
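
This follows directly from the margin-of-error formula for a proportion, roughly z * sqrt(p * (1 - p) / n): quartering the sample size doubles the margin. A sketch with illustrative sample sizes:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    # 95% margin of error for an estimated proportion; p = 0.5 is the
    # worst case.
    return z * math.sqrt(p * (1 - p) / n)

print(f"full sample, n=1000: +/-{margin_of_error(1000):.1%}")  # ~3.1%
print(f"subgroup,    n=100:  +/-{margin_of_error(100):.1%}")   # ~9.8%
```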

(In this case, both drowning and ice cream buying are clearly related by a third factor: the number of people at the beach.)
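
A minimal simulation with invented numbers makes the mechanism concrete: a common cause produces a strong correlation between two variables that have no causal link to each other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily figures: beach attendance drives BOTH ice cream sales
# and drownings (stylized here as continuous quantities).
attendance = rng.normal(1000, 300, 365)
ice_cream_sales = 0.3 * attendance + rng.normal(0, 20, 365)
drownings = 0.002 * attendance + rng.normal(0, 0.5, 365)

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"correlation(ice cream, drownings) = {r:.2f}")  # strongly positive
# Neither variable causes the other; conditioning on the confounder
# (attendance) makes the association disappear.
```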

It is believed[22] that this is exactly what happened with some of the early studies showing a link between EMF (electromagnetic fields) from power lines and cancer.

However, in many applications, conducting such an experiment is prohibitively expensive, infeasible, unethical, illegal, or downright impossible.

For example, it is highly unlikely that an institutional review board (IRB) would approve an experiment that involved intentionally exposing people to a dangerous substance in order to test its toxicity.

The obvious ethical implications of such types of experiments limit researchers' ability to empirically test causation.

[24] A baldness cure is statistically significant if sparse peach fuzz usually covers the previously naked scalp.

The cure is practically significant when a hat is no longer required in cold weather and the barber asks how much to take off the top.

The bald want a cure that is both statistically and practically significant: it will probably work, and if it does, it will have a big hairy effect.
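
The distinction can be made concrete with a simulation: given a large enough sample, even a trivially small effect produces a tiny p-value. The effect size and sample size below are invented for illustration:

```python
import math
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical trial: the "cure" adds a mere 0.02 standard deviations of
# hair growth, but the sample is enormous.
n = 1_000_000
control = rng.normal(0.0, 1.0, n)
treated = rng.normal(0.02, 1.0, n)

# Two-sample z-test: the difference of means has standard error sqrt(2/n).
z = (treated.mean() - control.mean()) / math.sqrt(2 / n)
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
print(f"effect = {treated.mean() - control.mean():.3f} sd, p = {p:.1e}")
# Statistically significant (p << 0.05) yet practically negligible.
```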

A prominent example in the UK is the wrongful conviction of Sally Clark for killing her two sons, who appeared to have died of Sudden Infant Death Syndrome (SIDS).

In his expert testimony, the now-discredited Professor Sir Roy Meadow claimed that, due to the rarity of SIDS, the probability of Clark being innocent was 1 in 73 million.

This was later questioned by the Royal Statistical Society;[30] even assuming Meadow's figure was accurate, all the possible explanations have to be weighed against each other to conclude which most likely caused the unexplained deaths of the two children.

[31] The 1 in 73 million figure was also misleading as it was reached by finding the probability of a baby from an affluent, non-smoking family dying from SIDS and squaring it: this erroneously treats each death as statistically independent, assuming that there is no factor, such as genetics, that would make it more likely for two siblings to die from SIDS.
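
The arithmetic is easy to reproduce; the single-death figure used in the case was roughly 1 in 8,543 for such a family (treat the exact number as illustrative here):

```python
p_single = 1 / 8543        # quoted probability of one SIDS death
p_squared = p_single ** 2  # the disputed calculation: square it
print(f"1 in {1 / p_squared:,.0f}")  # ~1 in 73 million
# The squaring is valid only if the two deaths are independent; shared
# genetic or environmental risk factors make a second SIDS death in the
# same family far more likely, so the true joint probability is higher.
```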

[32][33] This is also an example of the ecological fallacy, as it assumes the probability of SIDS in Clark's family was the same as the average of all affluent, non-smoking families; social class is a highly complex and multifaceted concept entangled with numerous other variables, such as education and line of work.

Assuming that an individual will have the same attributes as the rest of a given group fails to account for the effects of other variables, which in turn can be misleading.

[39] Anscombe's quartet is a set of four constructed datasets that exemplifies the shortcomings of simple descriptive statistics (and the value of plotting data before numerical analysis).
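
The quartet's published values make the point directly: all four datasets share nearly identical summary statistics yet look completely different when plotted. A short check using NumPy:

```python
import numpy as np

# The published Anscombe (1973) quartet.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8] * 7 + [19] + [8] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.array(x), np.array(y)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line
    print(f"{name:>3}: mean_y={y.mean():.2f}  var_y={y.var(ddof=1):.2f}  "
          f"r={np.corrcoef(x, y)[0, 1]:.3f}  fit: y={intercept:.2f}+{slope:.2f}x")
# Every row prints ~mean_y=7.50, var_y=4.13, r=0.816, y=3.00+0.50x, yet the
# plots show a line, a curve, an outlier-tilted line, and a single
# high-leverage point.
```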