NUM 5M Mongolian written corpus

Last update: 2017-08-07

http://catalog.elra.info/product_info.php?products_id=1309

ID: ELRA-W0120

This is a corpus of Mongolian text drawn mainly from online and printed daily newspapers, literature, and laws.

The collected raw text was reduced from 5 million to 4.8 million words after cleaning. The cleaned corpus comprises:
- 144 texts from laws;
- 278 stories,
- 8 novelettes,
- 4 novels from literature;
- 597 news articles,
- 505 interviews,
- 302 reports,
- 578 essays,
- 469 stories,
- 1,258 editorials from newspapers.

Part of this corpus, about 2,800 sentences with 100,000 words, has been POS-tagged manually and stored in TEI format.

Distribution

Availability

Available - Restricted Use

Start date: 07/12/2017

Licence

- ELRA END USER; Restrictions: Academic - Non Commercial Use; For Members of ELRA; User Nature: Academic; Fee: 0.00
- ELRA END USER; Restrictions: Academic - Non Commercial Use; For Members of ELRA; User Nature: Commercial; Fee: 5,000.00
- ELRA END USER; Restrictions: Academic - Non Commercial Use; For Non Members of ELRA; User Nature: Academic; Fee: 0.00
- ELRA END USER; Restrictions: Academic - Non Commercial Use; For Non Members of ELRA; User Nature: Commercial; Fee: 7,000.00
- ELRA VAR; Restrictions: Commercial Use; For Members of ELRA; User Nature: Academic; Fee: 5,000.00
- ELRA VAR; Restrictions: Commercial Use; For Members of ELRA; User Nature: Commercial; Fee: 5,000.00
- ELRA VAR; Restrictions: Commercial Use; For Non Members of ELRA; User Nature: Academic; Fee: 7,000.00
- ELRA VAR; Restrictions: Commercial Use; For Non Members of ELRA; User Nature: Commercial; Fee: 7,000.00

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

Mongolian

Linguality

Linguality type: Monolingual

Text Format

Plain text

Size

No size recorded in this field; the description above gives about 4.8 million words after cleaning.

Annotation

Other

Standard practices conformance: TEI

Metadata

Created: 05/12/2005

Version

Version: 1.0

Last Updated: 07/12/2017