[1] The design of the study (such as a case report for an individual patient or a blinded randomized controlled trial) and the endpoints measured (such as survival or quality of life) affect the strength of the evidence.
At the top of the hierarchy is the method with the greatest freedom from systematic bias, or the best internal validity, relative to the tested medical intervention's hypothesized efficacy.
[6] The National Cancer Institute defines levels of evidence as "a ranking system used to describe the strength of the results measured in a clinical trial or research study."
[10] The GRADE working group began in 2000 as a collaboration of methodologists, guideline developers, biostatisticians, clinicians, public health scientists and other interested members.
Interventions are assigned to Category 2, "supported and probably efficacious" treatments, based on positive outcomes from nonrandomized designs with some form of control, which may involve a non-treatment group.
[16] A protocol for evaluation of research quality was suggested by a report from the Centre for Reviews and Dissemination, prepared by Khan et al. and intended as a general method for assessing both medical and psychosocial interventions.
The Khan et al. protocol emphasized the need to make comparisons on the basis of "intention to treat" in order to avoid problems related to greater attrition in one group.
The NREPP evaluation, which assigns quality ratings from 0 to 4 on certain criteria, examines the reliability and validity of the outcome measures used in the research, evidence for intervention fidelity (consistent delivery of the treatment in the same way every time), levels of missing data and attrition, potential confounding variables, and the appropriateness of the statistical handling, including sample size.
[18] The term was first used in a 1979 report by the "Canadian Task Force on the Periodic Health Examination" (CTF) to "grade the effectiveness of an intervention according to the quality of evidence obtained".
In 2011, an international team redesigned the Oxford CEBM Levels, first published in 2009,[25][26] to make them easier to understand and to take into account recent developments in evidence-ranking schemes.
[42] Borgerson wrote in 2009 that the justifications for the hierarchy levels are not absolute and do not justify them epistemically, but that "medical researchers should pay closer attention to social mechanisms for managing pervasive biases".
[43] La Caze noted that basic science resides on the lower tiers of EBM even though it "plays a role in specifying experiments, but also analysing and interpreting the data."
[5] In his 2015 PhD thesis on the various hierarchies of evidence in medicine, Christopher J Blunt concludes that although modest interpretations such as those offered by La Caze's model, conditional hierarchies like GRADE, and heuristic approaches as defended by Howick et al. all survive previous philosophical criticism, the modest interpretations are so weak that they are unhelpful for clinical practice.