[1] A measure is said to have high reliability if it produces similar results under consistent conditions: It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores.
Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another.
Reliability does not imply validity. For example, while there are many reliable tests of specific abilities, not all of them would be valid for predicting, say, job performance.
Theories of test reliability have been developed to estimate the effects of inconsistency on the accuracy of measurement.
The basic starting point for almost all theories of test reliability is the idea that test scores reflect the influence of two sorts of factors: those that contribute to consistency and those that contribute to inconsistency.[7] The goal of estimating reliability is to determine how much of the variability in test scores is due to measurement error and how much is due to variability in true scores.
The true score is the part of the observed score that would recur across different measurement occasions in the absence of error.
The central assumption of reliability theory is that measurement errors are essentially random.
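[7] With observed score $X$, true score $T$, and error $E$, this assumption gives $\sigma^2_X = \sigma^2_T + \sigma^2_E$. This equation suggests that test scores vary as the result of two factors: variability of true scores and variability due to errors of measurement. The reliability coefficient is defined as the ratio of true-score variance to the total variance of test scores, $\rho_{XX'} = \sigma^2_T / \sigma^2_X$.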
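The split of observed-score variability into a true-score part and a random-error part can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_reliability`, the normal distributions, and the standard deviations (10 for true scores, 5 for errors) are hypothetical choices made for illustration, not values from the source.

```python
import random
import statistics


def simulate_reliability(n_people=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Estimate reliability as var(true scores) / var(observed scores).

    Assumption: observed = true + error, with errors drawn independently
    of the true scores (the "essentially random" error assumption).
    """
    rng = random.Random(seed)
    # Hypothetical true scores centred on 50.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    # Each observed score is the true score plus independent random error.
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    var_true = statistics.pvariance(true_scores)
    var_observed = statistics.pvariance(observed)
    return var_true / var_observed


# Value implied by the chosen standard deviations: 100 / (100 + 25).
theoretical = 10.0**2 / (10.0**2 + 5.0**2)
estimate = simulate_reliability()
```

In practice true scores are not observable, so this ratio cannot be computed directly from real test data; the estimation methods discussed in the article exist to approximate it.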
The key to the parallel-forms method is the development of alternate test forms that are equivalent in terms of content, response processes and statistical characteristics.
[7] However, this technique has its disadvantages. The split-half method treats the two halves of a measure as alternate forms.
It provides a simple solution to the problem that the parallel-forms method faces: the difficulty in developing alternate forms.
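This half-test reliability estimate is then stepped up to the full test length using the Spearman–Brown prediction formula.
A simple way to split a test is the odd-even split, in which the odd-numbered items form one half and the even-numbered items form the other. This arrangement guarantees that each half will contain an equal number of items from the beginning, middle, and end of the original test.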
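The split-half procedure can be sketched as follows: split the items by position into odd-numbered and even-numbered halves, correlate the two half-test totals, then step the correlation up to full length with the Spearman–Brown formula for a doubled test. The function names and the small score matrix are hypothetical and serve only as an illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_brown_double(r_half):
    """Step a half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores):
    """Odd-even split-half reliability.

    `scores` holds one row per test taker and one column per item.
    """
    # Items 1, 3, 5, ... (indices 0, 2, 4, ...) form the odd half.
    odd_totals = [sum(row[0::2]) for row in scores]
    # Items 2, 4, 6, ... (indices 1, 3, 5, ...) form the even half.
    even_totals = [sum(row[1::2]) for row in scores]
    return spearman_brown_double(pearson(odd_totals, even_totals))


# Hypothetical data: 4 test takers, 4 items scored 0 or 1.
scores = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
reliability = split_half_reliability(scores)
```

The sketch assumes both half-test totals vary across test takers; if either half is constant, the correlation is undefined and the division fails.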
The most common internal consistency measure is Cronbach's alpha, which is usually interpreted as the mean of all possible split-half coefficients.
[9] Cronbach's alpha is a generalization of an earlier form of estimating internal consistency, Kuder–Richardson Formula 20.
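The relationship between the two coefficients can be checked numerically: for items scored 0 or 1, Cronbach's alpha computed with population variances coincides with Kuder–Richardson Formula 20, because the population variance of a dichotomous item equals p(1 - p). The sketch below uses the standard textbook formulas; the function names and the response matrix are hypothetical.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha from a matrix with one row per test taker.

    Uses population variances throughout so that the result is directly
    comparable with KR-20 on dichotomous items.
    """
    k = len(scores[0])
    item_variances = [
        statistics.pvariance([row[i] for row in scores]) for i in range(k)
    ]
    total_variance = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def kr20(scores):
    """Kuder-Richardson Formula 20 for items scored 0 or 1."""
    n = len(scores)
    k = len(scores[0])
    # Proportion of test takers answering each item correctly.
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    total_variance = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_pq / total_variance)


# Hypothetical responses: 5 test takers, 4 dichotomous items.
binary_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(binary_scores)
kr = kr20(binary_scores)
```

Unlike KR-20, the alpha function also accepts items with more than two score levels, which is the sense in which it generalizes the earlier formula.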
Reliability may be improved by clarity of expression (for written assessments), lengthening the measure,[9] and other informal means.
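The effect of lengthening a measure can be estimated with the general form of the Spearman–Brown prediction formula. The sketch below assumes that the added items are parallel to the existing ones (same content and statistical characteristics), which real item writing only approximates; the function name and the example values are hypothetical.

```python
def predicted_reliability(current_reliability, length_factor):
    """General Spearman-Brown prediction formula.

    `length_factor` is the new test length divided by the current length,
    e.g. 2.0 for doubling the number of items, 0.5 for halving it.
    """
    r = current_reliability
    n = length_factor
    return n * r / (1 + (n - 1) * r)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
doubled = predicted_reliability(0.60, 2.0)
# The same test cut to half its length.
halved = predicted_reliability(0.60, 0.5)
```

Under this formula the gains diminish as the test grows: each further doubling adds less reliability than the previous one.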