Eleftherotypia Journal Speech database


http://catalog.elra.info/product_info.php?products_id=747

ID:

ELRA-S0111

The Eleftherotypia Speech Database (13 CD-ROMs) consists of read speech collected for the development of continuous speech recognition systems for the Greek language. All recorded sentences were selected from extracts of the Eleftherotypia journal text corpus and provide a vocabulary of about 40,000 words. The total number of utterances is over 32,000 (approximately 72 hours of speech material from 120 different speakers, male and female).

Detailed orthographic transcription files are also included in the distribution. They mark each utterance's orthography along with several speech and non-speech events (e.g. mispronunciations, truncations, noise).

The recordings were made in three different environments: a soundproof room, a quiet environment and an office environment. Two different microphones were used: a desk microphone and a head-mounted close-talking microphone. The waveform files are in NIST format, PCM-encoded at a 16 kHz sampling rate with 2 bytes (16 bits) per sample.
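As an illustration of the waveform format described above, the following is a minimal sketch of reading such a file in Python. It assumes the files follow the common NIST SPHERE layout (an ASCII header beginning with `NIST_1A`, followed by raw samples) and are stored as uncompressed PCM; the distribution itself has not been checked, so the header fields relied on here (`sample_rate`, `sample_count`, `sample_n_bytes`, `channel_count`, `sample_byte_format`, `sample_coding`) are assumptions based on that convention. The function names are hypothetical.

```python
import struct


def read_sphere_header(stream):
    """Parse a NIST SPHERE header into a dict.

    Assumes the conventional layout: 'NIST_1A\\n', then the header size
    on an 8-byte line, then 'key -type value' lines ending at 'end_head'.
    Leaves the stream positioned at the first sample.
    """
    preamble = stream.read(16)
    if not preamble.startswith(b"NIST_1A\n"):
        raise ValueError("not a NIST SPHERE file")
    header_size = int(preamble[8:].strip())
    body = stream.read(header_size - 16).decode("ascii", errors="replace")

    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        key, ftype, value = parts
        if ftype == "-i":
            fields[key] = int(value)
        elif ftype == "-r":
            fields[key] = float(value)
        else:  # '-sN' string fields
            fields[key] = value
    return fields


def read_samples(stream, fields):
    """Read 16-bit PCM samples that follow the header (uncompressed only)."""
    if fields.get("sample_coding", "pcm") != "pcm":
        raise ValueError("only uncompressed PCM is handled in this sketch")
    if fields["sample_n_bytes"] != 2:
        raise ValueError("expected 2 bytes per sample")
    count = fields["sample_count"] * fields.get("channel_count", 1)
    # '01' means least significant byte first (little-endian), '10' the reverse.
    order = "<" if fields.get("sample_byte_format", "01") == "01" else ">"
    raw = stream.read(2 * count)
    return struct.unpack(f"{order}{count}h", raw)


def duration_seconds(fields):
    """Duration implied by the header, e.g. at a 16000 Hz sampling rate."""
    return fields["sample_count"] / fields["sample_rate"]
```

A file would be opened in binary mode and passed to `read_sphere_header` first, then to `read_samples`. If the files turn out to use a compressed sample coding, a dedicated SPHERE tool would be needed instead of this sketch.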


Distribution

Availability

Available - Restricted Use

Start date: 07/10/2001

Licence

Licence        Restrictions                   Membership           User Nature
ELRA END USER  Academic - Non Commercial Use  Non Members of ELRA  Commercial
ELRA VAR       Commercial Use                 Members of ELRA      Commercial
ELRA END USER  Academic - Non Commercial Use  Members of ELRA      Commercial
ELRA VAR       Commercial Use                 Members of ELRA      Academic
ELRA END USER  Academic - Non Commercial Use  Members of ELRA      Academic
ELRA VAR       Commercial Use                 Non Members of ELRA  Commercial
ELRA VAR       Commercial Use                 Non Members of ELRA  Academic
ELRA END USER  Academic - Non Commercial Use  Non Members of ELRA  Academic

Contact Person

Mapelli Valérie

audio

Monolingual audio corpus

Languages

Greek, Modern (1453-)

Linguality

Linguality type: Monolingual

Size

no size available

Metadata

Created: 05/12/2005

Last Updated: 07/29/2013

Version

Version: 1.0

Last Updated: 02/22/2007
