Holistic grading

Although the value and validity of the method are still debated, holistic scoring of writing remains widely used.

Raters are first calibrated as a group so that two or more of them can independently assign the final score to a writing sample within a predetermined degree of reliability.
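One way to picture the calibration step is to have every rater score the same set of sample papers and then check pairwise agreement against a preset standard. The sketch below is hypothetical: the 1-to-6 scale, the one-point tolerance ("adjacent agreement"), the 80% threshold, and the function names are illustrative assumptions, not a procedure taken from any particular testing program.

```python
from itertools import combinations


def agreement_rate(scores_a, scores_b, tolerance=1):
    """Fraction of samples on which two raters' scores differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    hits = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return hits / len(scores_a)


def group_is_calibrated(ratings, threshold=0.8, tolerance=1):
    """True if every pair of raters meets the (assumed) agreement threshold.

    `ratings` maps a rater's name to that rater's scores on the same calibration samples.
    """
    return all(
        agreement_rate(ratings[r1], ratings[r2], tolerance) >= threshold
        for r1, r2 in combinations(ratings, 2)
    )


# Hypothetical scores given by three raters to the same six calibration papers.
ratings = {
    "rater_1": [4, 3, 5, 2, 6, 3],
    "rater_2": [4, 4, 5, 2, 5, 3],
    "rater_3": [3, 3, 6, 2, 6, 1],
}
print(group_is_calibrated(ratings))  # True
```

Tightening the threshold or setting `tolerance=0` (exact agreement only) makes the same group fail, which is how a stricter degree of reliability would show up in this sketch.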

[5][6][7] Workers in many salaried occupations, from science, business, and industry to law, religion, and politics, have been required to compose extended pieces of prose.

[8] Competence in writing extended prose has also formed part of qualifying or certification tests for teachers, public servants, and military officers.

Isolated components of writing competence, such as correct spelling and punctuation, can be tested with "objective" short-answer items.

But how well can such items evaluate potential or accomplishment in writing coherent, meaningful extended passages?

[12][13][14][15] Holistic scoring, with its attention to both reliability and validity, offers itself as a better method of judging writing competence.

In Britain, pooled-rater holistic scoring was first experimentally tested in 1934, employing ten teacher-raters per sample.
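Pooling can be sketched as combining the independent ratings of a single sample into one score. This minimal example assumes that pooling means taking the arithmetic mean and uses invented ratings on an assumed 1-to-10 scale; the actual scale and arithmetic of the 1934 study are not given here.

```python
from statistics import mean


def pooled_score(ratings):
    """Combine independent holistic ratings of one sample into a single pooled score.

    Assumption for illustration: pooling is the arithmetic mean of the ratings.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)


# Hypothetical ratings of one essay by ten teacher-raters.
ten_ratings = [6, 7, 5, 6, 8, 6, 7, 5, 6, 7]
print(pooled_score(ten_ratings))  # 6.3
```

With ten raters, a single rating that deviates by d points moves the mean by only d/10, which is the intuition behind using many raters per sample.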

For instance, the scoring guide used in a 1969 City University of New York study of student writing had five criteria (ideas, organization, sentence structure, wording, and punctuation/mechanics/spelling) and three levels (superior, average, unacceptable).

[28] The rationale for scoring guides is that they force scorers to attend to a range of writing accomplishments rather than give undue weight to one or two (the "halo effect").
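A scoring guide like the one used in the 1969 City University of New York study can be represented as a fixed list of criteria and levels, and a rater's notes can then be checked for coverage so that no criterion is skipped, which is the spread of attention the rationale above describes. The `Level` enum and the `check_judgments` helper are hypothetical; the sketch deliberately does not combine the judgments into a final score, since the guide as described does not say how that is done.

```python
from enum import Enum


class Level(Enum):
    """The three levels named in the 1969 guide."""
    SUPERIOR = "superior"
    AVERAGE = "average"
    UNACCEPTABLE = "unacceptable"


# The five criteria of the 1969 guide, in the order listed.
CRITERIA = (
    "ideas",
    "organization",
    "sentence structure",
    "wording",
    "punctuation/mechanics/spelling",
)


def check_judgments(judgments):
    """Return the criteria not yet judged (hypothetical helper).

    Raises ValueError for a criterion outside the guide or a value that is not a Level.
    """
    unknown = sorted(set(judgments) - set(CRITERIA))
    if unknown:
        raise ValueError(f"not in the scoring guide: {unknown}")
    bad = [c for c, level in judgments.items() if not isinstance(level, Level)]
    if bad:
        raise ValueError(f"no valid level given for: {bad}")
    return [c for c in CRITERIA if c not in judgments]


# A rater who has judged only three of the five criteria.
judgments = {
    "ideas": Level.SUPERIOR,
    "organization": Level.AVERAGE,
    "wording": Level.AVERAGE,
}
print(check_judgments(judgments))  # ['sentence structure', 'punctuation/mechanics/spelling']
```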

[33][34] Single-rater monitored scoring trains raters as a group and may provide them with a detailed marking scheme.

[35][36] In the United States, for the Writing Section of the TOEFL iBT,[37] the Educational Testing Service now uses a combination of automated scoring and a certified human rater.

[40] Although other local education authorities (LEAs) in Great Britain tried the system during the 1950s and 1960s, and British researchers studied its reliability and validity extensively, it failed to take hold.

The method was adjusted-rater scoring, with the teachers of the course as scorers and members of the Board of Examiners as adjusters.

[50] In the United States, holistic scoring spread rapidly from around 1975 to 1990, fueled in part by the educational accountability movement.

Holistic scoring is also used for placement or academic progression at some institutions of higher education, for instance Washington State University.

Occasionally, especially in high-stakes uses such as standardized testing for college admission, efforts are made to estimate the concurrent validity of the scores.

[56] More often, predictive validity is measured by comparing a school student's holistic score with later achievement in college courses, usually first-semester GPA, end-of-course grade in a first-year writing course, or teacher opinion of the student's writing ability.
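Predictive validity is commonly expressed as a correlation coefficient between the holistic scores and the later criterion. The sketch below computes Pearson's r by hand; the scores, the GPAs, and the 1-to-6 scale are invented for illustration, and Pearson's r is one common choice rather than the method of any particular study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: holistic essay scores and the same students' first-semester GPAs.
holistic = [2, 3, 3, 4, 5, 6]
first_semester_gpa = [2.1, 2.8, 2.5, 3.0, 3.4, 3.7]
r = pearson_r(holistic, first_semester_gpa)
print(f"predictive validity coefficient r = {r:.2f}")
```

The same computation serves for concurrent validity if the criterion, such as another measure of writing ability, is collected at about the same time as the holistic scores.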