Automated essay scoring

Using the technology of that time, computerized essay scoring would not have been cost-effective,[4] so Page suspended his efforts for about two decades.

Eventually, Page sold PEG to Measurement Incorporated. By 1990, desktop computers had become so powerful and so widespread that AES was a practical possibility.

As early as 1982, a UNIX program called Writer's Workbench was able to offer punctuation, spelling and grammar advice.[5]

In collaboration with several companies (notably Educational Testing Service), Page updated PEG and ran some successful trials in the early 1990s.[6]

Peter Foltz and Thomas Landauer developed a system using a scoring engine called the Intelligent Essay Assessor (IEA).

ETS's Criterion Online Writing Evaluation Service uses the e-rater engine to provide both scores and targeted feedback.

Under the leadership of Howard Mitzel and Sue Lottridge, Pacific Metrics developed a constructed response automated scoring engine, CRASE.[12]

In 2012, the Hewlett Foundation sponsored a competition on Kaggle called the Automated Student Assessment Prize (ASAP).[13]

201 challenge participants attempted to predict, using AES, the scores that human raters would give to thousands of essays written to eight different prompts.[15]

Moreover, the claim that the Hewlett Study demonstrated that AES can be as reliable as human raters has since been strongly contested,[16][17] including by Randy E. Bennett, the Norman O. Frederiksen Chair in Assessment Innovation at the Educational Testing Service.

Results of supervised learning demonstrate that the automatic systems perform well when the marks assigned by different human teachers are in good agreement.[19]
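
In AES research, agreement between two sets of scores (human–human or human–machine) is often summarized with quadratic weighted kappa, which penalizes large score discrepancies more heavily than small ones. The Python sketch below, using made-up ratings on an assumed 1–6 integer scale, shows one straightforward way to compute the statistic; the studies cited in this section may report other agreement measures.

    import numpy as np

    def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
        """Quadratic weighted kappa between two sets of integer ratings.

        Returns 1.0 for perfect agreement and about 0.0 for chance-level
        agreement; larger score gaps are penalized quadratically.
        """
        a = np.asarray(rater_a) - min_score
        b = np.asarray(rater_b) - min_score
        n = max_score - min_score + 1

        # Observed matrix: how often each pair of scores co-occurred.
        observed = np.zeros((n, n))
        for i, j in zip(a, b):
            observed[i, j] += 1

        # Expected matrix if the two raters scored independently.
        expected = np.outer(np.bincount(a, minlength=n),
                            np.bincount(b, minlength=n)) / len(a)

        # Quadratic disagreement weights.
        idx = np.arange(n)
        weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

    # Hypothetical scores from two human raters on the same six essays.
    rater_1 = [4, 3, 5, 2, 4, 3]
    rater_2 = [4, 3, 4, 2, 5, 3]
    print(round(quadratic_weighted_kappa(rater_1, rater_2, 1, 6), 3))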

According to a recent survey,[20] modern AES systems try to score different dimensions of an essay's quality in order to provide feedback to users.

These dimensions include the following items:

From the beginning, the basic procedure for AES has been to start with a training set of essays that have been carefully hand-scored.[23]

The various AES programs differ in what specific surface features they measure, how many essays are required in the training set, and most significantly in the mathematical modeling technique.
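
As an illustrative sketch of this basic procedure (not any particular vendor's method), the following Python code measures a few assumed surface features (word count, average word length, sentence count), fits an ordinary least-squares model to a tiny hand-scored training set, and applies that model to a new essay; real engines use far larger training sets, richer feature sets, and more sophisticated modeling techniques.

    import re
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def surface_features(essay):
        """Measure simple surface features of an essay's text.

        These three features are illustrative stand-ins for the much
        larger feature sets measured by production AES engines.
        """
        words = essay.split()
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        n_words = len(words)
        avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
        return [float(n_words), avg_word_len, float(len(sentences))]

    # Training set: essays that have already been carefully hand-scored.
    train_essays = [
        "A short, simple response with few ideas.",
        "A longer response that develops its argument across several "
        "sentences. It offers examples. It draws a conclusion.",
    ]
    train_scores = [2.0, 4.0]  # hypothetical scores from human raters

    # Relate the measured surface features to the human-assigned scores.
    X_train = np.array([surface_features(e) for e in train_essays])
    model = LinearRegression().fit(X_train, np.array(train_scores))

    # Apply the same model to estimate a score for a new, unseen essay.
    new_essay = "A medium-length answer with one example and a brief conclusion."
    predicted = model.predict(np.array([surface_features(new_essay)]))[0]
    print(f"Predicted score: {predicted:.2f}")

Swapping a different regression or classification method into the model-fitting step corresponds to the "mathematical modeling technique" on which, as noted above, the various programs differ most significantly.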

Before computers entered the picture, high-stakes essays were typically given scores by two trained human raters.[32]

On 12 March 2013, HumanReaders.Org launched an online petition, "Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment".[35]

The petition describes the use of AES for high-stakes testing as "trivial", "reductive", "inaccurate", "undiagnostic", "unfair" and "secretive".[36]

In a detailed summary of research on AES, the petition site notes, "RESEARCH FINDINGS SHOW THAT no one—students, parents, teachers, employers, administrators, legislators—can rely on machine scoring of essays ... AND THAT machine scoring does not measure, and therefore does not promote, authentic acts of writing."