Greek Dependency Treebank

GDT

http://gdt.ilsp.gr/

A resource for Modern Greek annotated at the syntactic and semantic levels. GDT is developed by researchers at the Institute for Language and Speech Processing (ILSP), with the help of students from the Texnoglwssia postgraduate program and the University of Athens. GDT includes texts from open-content sources and from corpora collected at ILSP in the framework of research projects aiming at multilingual, multimedia information extraction. The dependency-based annotation scheme used for the syntactic layer of the GDT allows for intuitive representations of structures common in languages with flexible word order. The annotation scheme is an adaptation of the guidelines for the Prague Dependency Treebank. Automatic preprocessing of GDT documents included sentence splitting, POS tagging and lemmatization with a suite of natural language processing tools developed at ILSP. For subsets of the GDT, the manual annotation of dependency relations is accompanied by annotation of semantic roles (70K tokens) and by event annotation based on a shallow domain-specific ontology (31K tokens).
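
The subset sizes quoted above can be put in proportion to the corpus as a whole. The short Python sketch below does this arithmetic, assuming that the 70K and 31K token subsets are both drawn from the full 120,000-token, 5,000-sentence corpus listed under Size; the record does not state this explicitly, and the function and constant names are illustrative only.

```python
# Figures taken from this record: overall size and the two semantic subsets.
TOTAL_TOKENS = 120_000
TOTAL_SENTENCES = 5_000
SUBSET_TOKENS = {"semantic roles": 70_000, "events": 31_000}


def coverage(subset_tokens: int, total_tokens: int = TOTAL_TOKENS) -> float:
    """Return the share of the corpus covered by a subset, as a percentage.

    Assumption: the subset is contained in the full corpus.
    """
    return 100.0 * subset_tokens / total_tokens


def mean_sentence_length(tokens: int = TOTAL_TOKENS,
                         sentences: int = TOTAL_SENTENCES) -> float:
    """Return the average number of tokens per sentence."""
    return tokens / sentences


if __name__ == "__main__":
    for layer, tokens in SUBSET_TOKENS.items():
        print(f"{layer}: {coverage(tokens):.1f}% of tokens")
    print(f"mean sentence length: {mean_sentence_length():.0f} tokens")
```

Under that assumption, roughly 58% of the tokens carry semantic role annotation and roughly 26% carry event annotation, with an average of 24 tokens per sentence.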

Distribution

Availability

Available - Restricted Use

Licence

Under Negotiation

Restrictions: Academic - Non Commercial Use, Attribution

Distribution Access/Medium: Downloadable

Contact Person

Prokopis Prokopidis

text

Monolingual text corpus

Languages

Greek, Modern (1453-)

Linguality

Linguality type: Monolingual

Text Format

text/xml

Size

5,000 Sentences

120,000 Tokens

Character encoding

UTF-8

Domains

health

politics

travel

history

Modalities

Written Language

Annotation

Morphosyntactic Annotation - B Pos Tagging

Tagset: ILSP/PAROLE

StandOff: False

Format: Prague Markup Language

Annotation Mode: Mixed (Automatic POS tagging followed by manual correction)

Annotation Tools:

ILSP FBT POS tagger

Syntactic Annotation - Treebanks

StandOff: False

Format: Prague Markup Language

Standard practices conformance: Prague Treebank

Annotation Mode: Mixed (Automatic annotation followed by manual correction)

Annotation Tools:

http://ufal.mff.cuni...

Annotators:

Maria Koutsombogera

Prokopis Prokopidis

Elina Desipri

Lemmatization

StandOff: False

Format: Prague Markup Language

Annotation Mode: Mixed (Automatic lemmatization followed by manual correction)

Annotation Tools:

ILSP-Lemmatizer

Segmentation

StandOff: False

Segmentation level: Sentence, Word

Format: Prague Markup Language

Annotation Mode: Mixed (Automatic segmentation followed by manual correction)

Annotation Tools:

ILSP-SST

Semantic Annotation - Semantic Roles

Tagset: Propbank compatible tagset

StandOff: False

Format: Prague Markup Language

Annotation Mode: Manual

Annotation Tools:

http://ufal.mff.cuni...

Size: 70,000 Tokens

Annotators:

Maria Koutsombogera

Elina Desipri

Semantic Annotation - Events

Tagset: Proprietary

StandOff: False

Format: Prague Markup Language

Annotation Mode: Manual

Annotation Tools:

http://ufal.mff.cuni...

Size: 31,000 Tokens

Annotators:

Kanella Pouli

Creation

Creation mode: Mixed

Original Sources

web documents pertaining to politics, health, and travel domains
articles from the Greek Wikipedia
manually normalized transcripts of European parliamentary sessions

Resource Creation

Creation started: 01/01/2005

Metadata

Created: 12/31/2011

Last Updated: 10/10/2012

Source: META-SHARE/ILSP

Metadata Language: English (en)

Metadata Creator

Elina Desipri

Version

Version: 1.3

Revision: addition of new annotated material

Last Updated: 05/04/2012

Validation

Validated

Type of Validation: Content

Validation Mode: Manual

Mode Details: manual validation of dependency relations, POS tags and lemmas

Extent: Full

Validator

Maria Koutsombogera

Prokopis Prokopidis

Usage

Access tools

http://ufal.mff.cuni...

Foreseen Use - Nlp Applications

Use NLP Specific: Parsing, Semantic Role Labelling

Actual Use - Nlp Applications

Use NLP Specific: Parsing

Documentation

Samples Location: http://gdt.ilsp.gr/g...

Document Type: In Proceedings

Harris Papageorgiou and Elina Desipri and Maria Koutsombogera and Kanella Pouli and Prokopis Prokopidis, Adding multi-layer Semantics to the Greek Dependency Treebank, Fifth International Conference on Language Resources and Evaluation (LREC-2006), 2006

Book Title: Proceedings of The Fifth International Conference on Language Resources and Evaluation (LREC-2006)

Document Language: English

Document Type: In Proceedings

Voula Ghotsoulia and Elina Desypri and Maria Koutsombogera and Prokopis Prokopidis and Haris Papageorgiou, Towards a frame semantics resource for Greek, Sixth Workshop on Treebanks and Linguistic Theories (TLT 2007), 2007

Book Title: Proceedings of The Sixth Workshop on Treebanks and Linguistic Theories (TLT 2007)

Document Language: English

Document Type: In Proceedings

P. Prokopidis and E. Desipri and M. Koutsombogera and H. Papageorgiou and S. Piperidis, Theoretical and Practical Issues in the Construction of a Greek Dependency Treebank, http://www.ilsp.gr/h..., pp. 149-160, TLT 2005, 2005

Editor: Montserrat Civit and Sandra Kübler and Maria Antònia Martí

Book Title: Proceedings of The Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005)

Document Language: English
