ASRS Data Sets

This page contains links to the ASRS data sets for download.

  1. Raw data: ACN numbers and report narratives, one report per line.
  2. Decoded data: same as above, but with abbreviations expanded and partial case restoration.
  3. Labeled data: contains 1333 reports labeled with shaping factors, divided into a training set (233 reports), a development test set (100 reports), and a test set (1000 reports).
  4. Labeled data with annotator rationales: contains 2333 reports labeled with shaping factors, divided into a training set (1233 reports), a development test set (100 reports), and a test set (1000 reports). Also contains all the annotator rationales, organized as a lexicon listed against each of the categories. A shell script is provided that takes the report files and the lexicon and produces a more human-readable format showing annotations and rationales.
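
The raw and decoded files are described above as holding an ACN number and a narrative, one report per line. A minimal sketch of reading such a file is shown here. It assumes the ACN is the first whitespace-delimited token on each line and that the rest of the line is the narrative; this page does not state the delimiter, so check it against the downloaded files. The function names and the sample lines are hypothetical and used only for illustration.

```python
from typing import Iterable, Iterator, Tuple


def parse_report_line(line: str) -> Tuple[str, str]:
    """Split one line into (ACN, narrative).

    Assumption: the ACN is the first whitespace-delimited token and the
    remainder of the line is the narrative. Change the split if the
    files use a different delimiter (for example a tab or a comma).
    """
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty line")
    acn = parts[0]
    narrative = parts[1] if len(parts) > 1 else ""
    return acn, narrative


def read_reports(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (ACN, narrative) pairs, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_report_line(line)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the actual data sets.
    sample = [
        "100001 FIRST NARRATIVE TEXT\n",
        "\n",
        "100002 SECOND NARRATIVE TEXT\n",
    ]
    for acn, narrative in read_reports(sample):
        print(acn, "|", narrative)
```

Because `read_reports` accepts any iterable of lines, an open file handle can be passed in directly, so a large file is processed one report at a time without loading it all into memory.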