OLAC Record oai:catalogue.elra.info:ELRA-W0089 |
Metadata | ||
Title: | NPChunks | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Date Available (W3CDTF): | 2016-01-20 | |
Date Issued (W3CDTF): | 2016-01-20 | |
Date Modified (W3CDTF): | 2016-01-20 | |
Description: | NPChunks is a training corpus of approximately 1,000 sentences (24,243 tokens), selected randomly from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0. The corpus is PoS-annotated at token level, including punctuation. Noun phrases were recognized and annotated with specific tags. The corpus was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Corpus of Reference of Contemporary Portuguese. YamCha software (http://chasen.org/~taku/software/yamcha/) was used to recognize noun-phrase chunks and to identify the elements appearing at the beginning, in the middle and at the end of a noun phrase. | |
Identifier: | ELRA-W0089 | |
ISLRN: | 412-883-442-173-8 | |
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0089/ | |
Language: | Portuguese | |
Language (ISO639): | por | |
Medium: | downloadable | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0089 | |
DateStamp: | 2016-01-20 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2016. ELRA (European Language Resources Association). | |
Terms: | area_Europe country_PT dcmi_Text iso639_por olac_primary_text |