Dataset for Rubric-based Essay Scoring
This is the official website of DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing (Yoo et al., 2024).
DREsS is a large-scale, standard dataset for rubric-based automated essay scoring. DREsS comprises three sub-datasets: DREsS_New, DREsS_Std., and DREsS_CASE. We collect DREsS_New, a real-classroom dataset with 1.7K essays authored by EFL undergraduate students and scored by English education experts. We also standardize existing rubric-based essay scoring datasets as DREsS_Std. We generate 20K synthetic samples of DREsS_CASE using CASE (corruption-based augmentation strategy for essays).
The essays in DREsS are scored on a range of 1 to 5, with increments of 0.5, based on the three rubrics: content, organization, and language.
Criteria | Description |
---|---|
Content | Paragraph is well-developed and relevant to the argument, supported with strong reasons and examples. |
Organization | The argument is very effectively structured and developed, making it easy for the reader to follow the ideas and understand how the writer is building the argument. Paragraphs use coherence devices effectively while focusing on a single main idea. |
Language | The writing displays sophisticated control of a wide range of vocabulary and collocations. The essay follows grammar and usage rules throughout the paper. Spelling and punctuation are correct throughout the paper. |
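As a minimal sketch of the scoring scale described above (1 to 5 in steps of 0.5), the following check could be used to sanity-test rubric scores before training or evaluation. The function name `is_valid_rubric_score` and the constant `VALID_SCORES` are illustrative and not part of the dataset release.

```python
def is_valid_rubric_score(score: float) -> bool:
    """Return True if `score` lies on the 1-5 scale in 0.5 increments."""
    if not 1.0 <= score <= 5.0:
        return False
    # Multiples of 0.5 are exactly representable as floats, so doubling
    # a valid score always yields a whole number.
    return float(score * 2).is_integer()


# The nine values a single rubric score can take: 1.0, 1.5, ..., 5.0.
VALID_SCORES = [1.0 + 0.5 * i for i in range(9)]
```

For example, `is_valid_rubric_score(3.5)` holds, while `3.3` (off the 0.5 grid) and `5.5` (out of range) are rejected.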
Column | Type | Description |
---|---|---|
id | Integer | A unique identifier of each essay sample |
source | String | [Optional] The original source of the essay sample (only for DREsS_Std.) |
prompt | String | An essay prompt |
essay | String | A student-written essay |
score | Float | A rubric-based score of the essay (content, organization, language, total) |
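The columns above could be modeled as a typed record, as in the hedged sketch below. The page does not specify the file format or how the four score values are stored, so the per-rubric keys (`content`, `organization`, `language`, `total`), the `EssaySample` class, and the `parse_row` helper are all assumptions made for illustration; adjust them to the files you actually receive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EssaySample:
    """One essay sample; field layout is an assumption, not the official schema."""
    id: int
    prompt: str
    essay: str
    content: float
    organization: float
    language: float
    total: float
    source: Optional[str] = None  # documented as present only for DREsS_Std.


def parse_row(row: dict) -> EssaySample:
    """Convert a dict of raw values (e.g. one CSV row) into an EssaySample.

    The key names used here are hypothetical.
    """
    return EssaySample(
        id=int(row["id"]),
        prompt=row["prompt"],
        essay=row["essay"],
        content=float(row["content"]),
        organization=float(row["organization"]),
        language=float(row["language"]),
        total=float(row["total"]),
        # Treat a missing or empty source as "no source".
        source=row.get("source") or None,
    )


# A made-up row for illustration only; it is not taken from the dataset.
example = parse_row({
    "id": "1",
    "prompt": "Example prompt",
    "essay": "Example essay text.",
    "content": "3.5",
    "organization": "4.0",
    "language": "3.0",
    "total": "10.5",
})
```

Keeping `source` optional mirrors the table, where it is marked as available only for one sub-dataset.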
Subdata | Source | Content | Organization | Language |
---|---|---|---|---|
DREsS_New | - | 2,279 | 2,279 | 2,279 |
DREsS_Std. | ASAP P7 | 1,569 | 1,569 | 1,569 |
DREsS_Std. | ASAP P8 | 723 | 723 | 723 |
DREsS_Std. | ASAP++ P1 | 1,785 | 1,785 | 1,785 |
DREsS_Std. | ASAP++ P2 | 1,799 | 1,799 | 1,799 |
DREsS_Std. | ICNALE EE | 639 | 639 | 639 |
DREsS_CASE | - | 8,307 | 31,086 | 792 |
Total | - | 17,101 | 39,880 | 9,586 |
To request access to the dataset, please submit the consent form. After we review it, we will email you the dataset link.
@article{yoo2024dress,
title={DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing},
author={Haneul Yoo and Jieun Han and So-Yeon Ahn and Alice Oh},
journal={arXiv preprint arXiv:2402.16733},
year={2024},
}