LX-4WAnalogies – META-SHARE

Last view: 2026-06-13


LX-4WAnalogies

The test set described in was used as the basis for the assessment of word embeddings. An example entry in this data set reads: ‘Berlin Germany Lisbon Portugal’. With four-word relations such as this one, semantic analogies can be tested by combining any three of the four word vectors in an entry and checking whether the resulting vector is similar to the vector of the fourth word, the one left out of the combination. In the example above, the completed analogy reads: ‘Berlin is to Germany as Lisbon is to Portugal’.
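The combination-and-compare test described above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used with LX-4WAnalogies: it assumes the common vector-offset formulation (often called 3CosAdd), in which the held-out word is predicted as the vocabulary item whose vector is most cosine-similar to Germany - Berlin + Lisbon, with the three query words excluded. The function names and the tiny hand-built embedding table `toy` are hypothetical.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def solve_analogy(emb, a, b, c):
    """Vector-offset (3CosAdd) sketch: return the word d closest in cosine to b - a + c.

    The three query words are excluded from the candidates.
    """
    target = normalize(normalize(emb[b]) - normalize(emb[a]) + normalize(emb[c]))
    best_word, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = float(np.dot(normalize(vec), target))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word, best_sim

# Hypothetical 5-dimensional toy vectors: [city, country, German, Portuguese, French]
toy = {
    "Berlin":   np.array([1.0, 0.0, 1.0, 0.0, 0.0]),
    "Germany":  np.array([0.0, 1.0, 1.0, 0.0, 0.0]),
    "Lisbon":   np.array([1.0, 0.0, 0.0, 1.0, 0.0]),
    "Portugal": np.array([0.0, 1.0, 0.0, 1.0, 0.0]),
    "Paris":    np.array([1.0, 0.0, 0.0, 0.0, 1.0]),
    "France":   np.array([0.0, 1.0, 0.0, 0.0, 1.0]),
}

# Entry 'Berlin Germany Lisbon Portugal': hold out the fourth word and predict it.
word, sim = solve_analogy(toy, "Berlin", "Germany", "Lisbon")
print(word, round(sim, 3))  # prints: Portugal 1.0
```

Holding out a different word of the entry works the same way: `solve_analogy(toy, "Portugal", "Germany", "Lisbon")` computes Germany - Portugal + Lisbon and returns Berlin. Real embeddings will rarely reach a similarity of exactly 1.0; an entry counts as solved when the top-ranked candidate is the held-out word.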
The test set contains five types of semantic analogy: common capitals and countries, all capitals and countries, currency, cities and states, and family relations. Nine types of syntactic analogy are also represented: adjective to adverb, opposite, comparative, superlative, present participle, nationality (adjective), past tense, plural nouns and plural verbs. In total, there are 8,869 semantic and 10,675 syntactic entries.
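The grouping of entries into categories can be illustrated with a small counting sketch. It assumes, without confirmation from this page, that the files follow the layout of the original English test set: one four-word entry per line, categories introduced by header lines starting with `:`, and syntactic categories prefixed with `gram`. The function names and sample lines are hypothetical and are not taken from LX-4WAnalogies.

```python
from collections import Counter

def count_entries(lines):
    """Count four-word entries per category.

    Assumption: a line starting with ':' names the category of the entries
    that follow it; every other non-blank line holds exactly four words.
    """
    counts = Counter()
    category = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            category = line[1:].strip()
            continue
        words = line.split()
        if len(words) != 4:
            raise ValueError(f"expected 4 words, got {len(words)}: {line!r}")
        counts[category] += 1
    return counts

def split_semantic_syntactic(counts):
    """Return (semantic, syntactic) totals, assuming syntactic categories start with 'gram'."""
    syntactic = sum(n for cat, n in counts.items() if cat and cat.startswith("gram"))
    return sum(counts.values()) - syntactic, syntactic

# Hypothetical sample in the assumed layout.
sample = [
    ": capital-common-countries",
    "Berlin Germany Lisbon Portugal",
    "Athens Greece Berlin Germany",
    ": gram3-comparative",
    "bad worse good better",
]

counts = count_entries(sample)
print(dict(counts))
print(split_semantic_syntactic(counts))
```

If the assumed layout holds, running the same functions over the full files should reproduce the 8,869 semantic and 10,675 syntactic totals listed for this resource, which makes a quick sanity check after downloading or translating the data.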
For the evaluation of the Portuguese word embeddings, the original English test set was translated into Portuguese by skilled, native Portuguese-speaking language experts. The resulting translations, LX-4WAnalogies, and the corresponding English terms are available at http://github.com/nlx-group.


Distribution

Availability

Available - Restricted Use

Contact Person

António Branco

text

Monolingual text corpus

Languages

Portuguese

Linguality

Linguality type: Monolingual

Size

10,675 syntactic Entries

8,869 semantic Entries

Modalities

Written Language

Metadata

Created: 01/30/2017

Last Updated: 01/30/2017

Metadata Language: English (en)

Version

Version: 1.0

Last Updated: 01/30/2017
