OLAC Record oai:lindat.mff.cuni.cz:11234/1-2517 |
Metadata | ||
Title: | FicTree 1.0 | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-2517 | |
Creator: | Jelínek, Tomáš | |
Hnátková, Milena | ||
Skoumalová, Hana | ||
Date (W3CDTF): | 2017-11-15T19:20:19Z | |
Date Available: | 2017-11-15T19:20:19Z | |
Description: | FicTree is a dependency treebank of Czech fiction manually annotated in the format of the analytical layer of the Prague Dependency Treebank (PDT). The treebank consists of 12,760 sentences (166,432 tokens). The texts come from eight literary works published in the Czech Republic between 1991 and 2007. The syntactic annotation of the treebank was first performed by two distinct parsers (MSTParser and MaltParser) trained on the PDT training data, then manually corrected. Any differences between the two versions were resolved manually by another annotator. The corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: lemma, tag, ID (word index in the sentence), head and deprel (analytical function, afun in the PDT formalism). The texts are shuffled in random chunks of at most 100 words (respecting sentence boundaries). Each chunk is provided as a separate file, with the suggested division into train, dev and test sets given as a file-name prefix. | |
Identifier (URI): | http://hdl.handle.net/11234/1-2517 | |
Language: | Czech | |
Language (ISO639): | ces | |
Publisher: | Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics | |
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | treebank | |
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-2517 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Jelínek, Tomáš; Hnátková, Milena; Skoumalová, Hana. 2017. FicTree 1.0. Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics. | |
Terms: | area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text |
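
The vertical format described in the Description field (one token per line, six tab-separated columns, a blank line between sentences) can be read with a few lines of Python. The following is a minimal sketch, not code distributed with the corpus. It assumes the column order given in the description (word form, lemma, tag, ID, head, deprel) and that ID and head are integers. The names `Token`, `read_sentences` and `SAMPLE` are hypothetical, and the sample sentences and `TAG` placeholders are made up for illustration rather than taken from FicTree.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    form: str    # word form as it appears in the text
    lemma: str
    tag: str     # morphological tag
    idx: int     # word index in the sentence (the ID column)
    head: int    # index of the governing word
    deprel: str  # analytical function (afun)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Token objects per sentence.

    Assumes six tab-separated columns per token line and a blank
    line after each sentence, as stated in the record description.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        form, lemma, tag, idx, head, deprel = line.split("\t")
        sentence.append(Token(form, lemma, tag, int(idx), int(head), deprel))
    # Emit the last sentence when the input does not end with a blank line.
    if sentence:
        yield sentence


# Made-up sample in the assumed layout; "TAG" stands in for a real tag.
SAMPLE = (
    "Pes\tpes\tTAG\t1\t2\tSb\n"
    "spí\tspát\tTAG\t2\t0\tPred\n"
    "\n"
    "Prší\tpršet\tTAG\t1\t0\tPred\n"
)

sentences = list(read_sentences(SAMPLE.splitlines()))
```

To read a chunk file, the same function can be applied to an open file handle, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` and the UTF-8 encoding are assumptions to check against the downloaded data.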