Corpus of Spoken Estonian – META-SHARE

Corpus of Spoken Estonian

Suulise keele korpus

ID:

http://hdl.handle.net/11297/1-00-0000-0000-0000-0002-7

doi:10.15155/TY.0009

The Department of Estonian Language initiated the corpus of spoken Estonian in 1997. The corpus is compiled by the Spoken Estonian research group (Tiit Hennoste, Airi Jansons, Liina Lindström, Andriela Rääbis, Krista Strandson, Piret Toomet, Riina Vellerind).
The corpus is transcribed using the transcription conventions of conversation analysis (CA). Each tape is provided with a header covering a total of 44 situational factors that have been found to affect language use in the analysis of various languages. For each individual tape, as many of these factors as possible are recorded.
The corpus is planned as an open corpus, i.e. no limits have been set. Our intention is to collect various types of spoken language: everyday and institutional conversation, spontaneous and planned speech, monologues and dialogues, face-to-face interaction and media texts. The speakers are inhabitants of the largest towns of Estonia: Tallinn, Tartu and Pärnu.
As of April 2008, the corpus consists of 710 audio tapes, 20 video tapes and 1970 transcribed texts (1 315 000 words).
Transcribed texts:
Face-to-face conversations: 559 (716 100 words): 181 everyday conversations, 342 institutional conversations, 36 other types (e.g. asking for directions);
Phone conversations: 1259 (350 080 words): 175 everyday conversations, 1076 institutional conversations, 8 other;
Radio and TV broadcasts: 149 (249 300 words).
The institutional situations include a large number of shop dialogues (65), as well as dialogues at service institutions and government offices.
The corpus is a data bank in Word format and plain txt format (ISO-8859-1). Access to the corpus requires a contract with the Spoken Estonian research group.
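
As a quick consistency check on the per-category figures listed above, the counts can be tallied with a short script. This is only an illustrative sketch in Python: the variable names are hypothetical and not part of the corpus or its tools, and the numbers are copied from this description. The sub-type counts add up to their category totals, and the category totals come to 1967 texts and 1 315 480 words, which is close to, though not identical with, the overall figures quoted in this record.

```python
# Figures copied from the description above: (number of texts, number of words).
# The dictionary keys are illustrative labels, not corpus identifiers.
categories = {
    "face-to-face": (559, 716_100),
    "phone": (1259, 350_080),
    "radio/TV": (149, 249_300),
}

# Sub-type counts (everyday, institutional, other) where the description gives them.
subtypes = {
    "face-to-face": (181, 342, 36),
    "phone": (175, 1076, 8),
}

# Check that each sub-type breakdown sums to its category's text count.
subtype_ok = {
    name: sum(parts) == categories[name][0]
    for name, parts in subtypes.items()
}

# Overall totals across the three categories.
total_texts = sum(texts for texts, _ in categories.values())
total_words = sum(words for _, words in categories.values())

print(subtype_ok)    # {'face-to-face': True, 'phone': True}
print(total_texts)   # 1967
print(total_words)   # 1315480
```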

Distribution

Availability

Available - Restricted Use

Licence

Proprietary

Restrictions: Other

User Nature: Academic

Contact Person

Olga Gerassimenko

Monolingual text corpus

Languages

Estonian

Linguality

Linguality type: Monolingual

Size

1 315 500 Words

Modalities

Spoken Language

Monolingual audio corpus

Languages

Estonian

Linguality

Linguality type: Monolingual

Size

1 315 500 Words

Modalities

Spoken Language

Content

Speech items: Free Speech

Metadata

Created: 01/09/2013

Last Updated: 05/22/2015

Revision: 6

Metadata Creator
