The COST232 consortium collected a "Multi-English" speech database over the telephone in Europe. The original plan was to collect data only at FUB (Fondazione Ugo Bordoni) in Rome, but in the event a second collection was also made at BT Labs in the UK. A total of 797 "successful" calls were collected.
Calls were received in two countries, Italy and the UK, using different types of collection equipment: FUB in Rome used analog lines and BT in the UK used digital lines. Every speaker repeated the same vocabulary, the "TI (Texas Instruments) words", which makes this database unique in many respects.
The vocabulary comprised the name of the speaker's laboratory, the digits ("oh", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight" and "nine") and the words "yes", "no", "erase", "rubout", "stop", "start", "help", "enter", "repeat" and "go". Speakers came from the following countries: Belgium, Czechoslovakia, Denmark, England, Germany, Italy, Norway, Portugal, Slovenia, Spain, Sweden and Switzerland. Each country provided 8 speakers, each of whom made 2 calls from a fixed telephone and 2 calls from a mobile telephone to each of the Italian and UK collection systems (i.e. a total of 8 calls per speaker). Although the database was intended for speech recognition, it is also balanced and can therefore be used for speaker recognition training and testing.