NPChunks


1 Last update: 2016-01-28

http://catalog.elra.info/product_info.php?products_id=1256

ID: ELRA-W0089

NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0.

The corpus is PoS-annotated at token level, including punctuation, and noun phrases are recognized and annotated with specific tags. It was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Corpus of Reference of Contemporary Portuguese. The YamCha software (http://chasen.org/~taku/software/yamcha/) was used to recognize noun-phrase chunks and to identify the elements appearing at the beginning, in the middle and at the end of each noun phrase.
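As a rough illustration of how begin/middle/end noun-phrase tags can be consumed, the following Python sketch groups tokens into noun phrases. The tag names (B-NP, I-NP, E-NP, O), the function name and the sample sentences are hypothetical and chosen only for illustration; this record does not specify the actual tag set or file format of NPChunks.

```python
from typing import List, Tuple

# Hypothetical tag names: the real NPChunks tag set is not given in this record.
BEGIN, INSIDE, END, OUTSIDE = "B-NP", "I-NP", "E-NP", "O"


def extract_noun_phrases(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Group (token, tag) pairs into noun phrases using begin/inside/end tags."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag == BEGIN:
            # A new phrase starts; flush any phrase that was still open.
            if current:
                phrases.append(current)
            current = [token]
        elif tag in (INSIDE, END) and current:
            current.append(token)
            if tag == END:
                # An explicit end tag closes the phrase.
                phrases.append(current)
                current = []
        else:
            # Outside tag (or a stray inside/end tag): close any open phrase.
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    sample = [("A", BEGIN), ("casa", INSIDE), ("nova", END), ("caiu", OUTSIDE), (".", OUTSIDE)]
    print(extract_noun_phrases(sample))
```

A single-token noun phrase is handled by closing an open phrase as soon as an outside tag or a new begin tag is seen, so the sketch does not require every phrase to carry an explicit end tag.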




Distribution

Availability

Available - Restricted Use

Start date: 01/20/2016

Licence

Licence          Restrictions                        Membership             User Nature   Fee
ELRA VAR         Commercial Use                      Members of ELRA        Academic      0.00
ELRA END USER    Academic - Non Commercial Use       Non Members of ELRA    Academic      0.00
ELRA VAR         Commercial Use                      Non Members of ELRA    Academic      0.00
ELRA END USER    Academic - Non Commercial Use       Non Members of ELRA    Commercial    0.00
ELRA VAR         Commercial Use                      Non Members of ELRA    Commercial    0.00
ELRA END USER    Academic - Non Commercial Use       Members of ELRA        Academic      0.00
ELRA END USER    Academic - Non Commercial Use       Members of ELRA        Commercial    0.00
ELRA VAR         Commercial Use                      Members of ELRA        Commercial    0.00

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

Portuguese

Linguality

Linguality type: Monolingual

Size

no size available

Metadata

Created: 05/12/2005

Version

Version: 1.0

Last Updated: 01/20/2016
