ROUGE, or Recall-Oriented Understudy for Gisting Evaluation,[1] is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing.
The metrics compare an automatically produced summary or translation against a reference (human-produced) summary or translation, or against a set of such references.
ROUGE metrics range between 0 and 1, with higher scores indicating higher similarity between the automatically produced summary and the reference.
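To illustrate the recall-oriented overlap that underlies the ROUGE-N family, the sketch below computes clipped n-gram recall between a candidate and a single reference. It is a minimal illustration, not the official ROUGE package: the function name, whitespace tokenization, and single-reference handling are simplifying assumptions.

```python
from collections import Counter

def rouge_n_recall(candidate_tokens, reference_tokens, n=1):
    """Recall-style n-gram overlap between a candidate and one reference (illustrative sketch)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate_tokens, n)
    ref = ngrams(reference_tokens, n)
    if not ref:
        return 0.0
    # Count reference n-grams that also appear in the candidate, clipped by candidate counts.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

# Identical texts score 1.0; texts with no shared unigrams score 0.0.
print(rouge_n_recall("the cat sat on the mat".split(),
                     "the cat sat on the mat".split()))   # 1.0
print(rouge_n_recall("a dog ran".split(),
                     "the cat sat on the mat".split()))   # 0.0
```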
The following five evaluation metrics are available.