Gap-tagger corpus

The Gap-tagger corpus contains data for assessing the correctness of automatically generated alternatives for filling a gap (a missing word). To get clearly interpretable results, we conducted a modified version of A/B testing in which the user had to choose between the original word and an alternative. The user could either pick one of the two proposed words or report both words as appropriate. Since we know the right answer, we can objectively assess the suitability of alternative answers without formally specifying what counts as a correct answer. Experiments were run using the gap-tagger tool: https://github.com/estnltk/gap-tagger.

In the corpus file, each line corresponds to one question. The file is in CSV format with the following columns:
sentence: full text of the sentence
gap_start: start position of the gap word in the sentence
gap_end: end position of the gap word in the sentence
gap_word: correct (original) gap word
variant: alternative word proposed for the gap
correct_selected: indicates whether the user selected the correct word
both_selected: indicates whether the user reported both words as appropriate
annotator: user id
time: time in milliseconds the user took to answer the question
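
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the gap-tagger tool. It assumes a comma-separated file with a header row, boolean flags encoded as 1/0 or true/false, and 0-based character offsets with an exclusive end; none of these details are stated in the description, so check them against the actual file. The sample rows are invented for illustration and are not taken from the corpus.

```python
import csv
import io

# Invented sample in the documented column layout (NOT real corpus data).
SAMPLE = """sentence,gap_start,gap_end,gap_word,variant,correct_selected,both_selected,annotator,time
The cat sat on the mat.,4,7,cat,dog,1,0,a1,2300
The cat sat on the mat.,4,7,cat,kitten,0,1,a2,4100
"""

# Assumption: which strings count as "true" in the flag columns.
TRUTHY = {"1", "true", "yes"}


def as_bool(value):
    """Interpret a flag column under the assumed encoding."""
    return value.strip().lower() in TRUTHY


def gap_text(row):
    """Slice the gap word out of the sentence.

    Assumption: gap_start/gap_end are 0-based character offsets and
    gap_end is exclusive.
    """
    return row["sentence"][int(row["gap_start"]):int(row["gap_end"])]


def summarize(handle):
    """Compute simple per-file statistics from an open CSV handle."""
    rows = list(csv.DictReader(handle))
    total = len(rows)
    if total == 0:
        return {"questions": 0}
    correct = sum(as_bool(r["correct_selected"]) for r in rows)
    both = sum(as_bool(r["both_selected"]) for r in rows)
    mean_time = sum(float(r["time"]) for r in rows) / total
    return {
        "questions": total,
        "correct_rate": correct / total,
        "both_rate": both / total,
        "mean_time_ms": mean_time,
    }


summary = summarize(io.StringIO(SAMPLE))
print(summary)
```

To run this on the real corpus, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the corpus file is stored. If `gap_text(row)` does not equal `row["gap_word"]` for real rows, the offset assumption is wrong and should be adjusted.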
