NER-tagger corpus – META-SHARE

NER-tagger corpus

The NER-tagger corpus is a collection of sentences with manually labelled named entities. The labelling is partial: only one selected word in each sentence is labelled. As a result, the labelled word may be only a part of a named entity, and the sentence may contain other named entities that are not labelled. We distinguish the following types of named entities: PER: person, LOC: location, ORG: organization, FAC: facility, PRD: product, O: other. For each labelled word, the label is determined by the largest named entity containing it. For instance, in the sentence "Eesti Ühispanga Tartu kontor oli inimesi täis", the word "Eesti" is labelled as a facility, although "Eesti" on its own is a location and "Eesti Ühispank" is an organization.
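The "largest containing entity" rule can be illustrated with a minimal Python sketch. It is not taken from the nertagger tool: the function name `label_for_word`, the `(start, end, label)` span representation with an exclusive end, and the fallback to `O` when no entity contains the word are all assumptions made for this illustration.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), end exclusive.
Span = Tuple[int, int, str]


def label_for_word(word_start: int, word_end: int, entities: List[Span]) -> str:
    """Return the label of the largest entity span that contains the word."""
    containing = [
        (start, end, label)
        for start, end, label in entities
        if start <= word_start and word_end <= end
    ]
    if not containing:
        # Assumption: a word outside every entity gets the label "O".
        return "O"
    # The widest span wins, as in the rule described above.
    return max(containing, key=lambda span: span[1] - span[0])[2]


sentence = "Eesti Ühispanga Tartu kontor oli inimesi täis"

# Nested entities from the example sentence; offsets are computed with len()
# so they stay consistent with the strings themselves.
entities = [
    (0, len("Eesti"), "LOC"),
    (0, len("Eesti Ühispanga"), "ORG"),
    (0, len("Eesti Ühispanga Tartu kontor"), "FAC"),
]

word = "Eesti"
word_start = sentence.index(word)
eesti_label = label_for_word(word_start, word_start + len(word), entities)

other = "oli"
other_start = sentence.index(other)
oli_label = label_for_word(other_start, other_start + len(other), entities)
```

Here `eesti_label` is `FAC`, because the facility span is the widest one covering "Eesti", while `oli_label` falls back to `O`.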

The corpus was created using the nertagger web tool: https://github.com/estnltk/ner-tagger. Two human annotators took part in the annotation process.

The data file contains one sentence per line, with the following columns:
name: named entity token
sentence: sentence
start: entity start offset in the sentence
end: entity end position in the sentence
label: assigned label
annotator: human annotator id
time: number of milliseconds the annotator took to tag the word
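A minimal Python sketch of reading this layout is shown below. The description does not state the column delimiter, whether the file has a header row, or whether the end position is exclusive, so the sketch assumes tab-separated columns, skips any row whose numeric fields are not integers, and treats `end` as exclusive. The names `Row` and `read_rows` are hypothetical, and the sample row uses invented values purely for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class Row:
    name: str        # named entity token
    sentence: str    # full sentence
    start: int       # entity start offset in the sentence
    end: int         # entity end position (assumed exclusive here)
    label: str       # PER, LOC, ORG, FAC, PRD or O
    annotator: str   # human annotator id
    time_ms: int     # milliseconds spent tagging the word


def read_rows(handle: TextIO, delimiter: str = "\t") -> Iterator[Row]:
    """Yield one Row per well-formed line; the tab delimiter is an assumption."""
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        name, sentence, start, end, label, annotator, time_ms = fields
        if not (start.isdigit() and end.isdigit() and time_ms.isdigit()):
            continue  # skip a possible header row or non-integer values
        yield Row(name, sentence, int(start), int(end), label, annotator, int(time_ms))


# Invented sample row for illustration only; it is not taken from the corpus.
sample = "Eesti\tEesti Ühispanga Tartu kontor oli inimesi täis\t0\t5\tFAC\t1\t1500\n"
rows = list(read_rows(io.StringIO(sample)))

# Sanity check under the exclusive-end assumption: the offsets recover the token.
first = rows[0]
token_matches = first.sentence[first.start:first.end] == first.name
```

If the real file uses inclusive end positions, the slice in the sanity check would need `first.end + 1` instead; checking a handful of rows this way is a quick method of finding out which convention applies.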

Distribution

Availability

Available - Unrestricted Use

Licence

CC - BY - NC

Licensors:

Contact Person

Alexander Tkachenko

text

Monolingual text corpus

Languages

Estonian

Linguality

Linguality type: Monolingual

Multi-linguality type: Multilingual Single Text

Size

5 Mb

Metadata

Created: 11/29/2016

Last Updated: 11/30/2016
