Sample Metadata Record
<olac:olac>
<dc:title>MULTEXT-East "1984" document corpus 4.0</dc:title>
<dc:creator>Erjavec, Tomaž</dc:creator>
<dc:creator>Bruda, Ştefan</dc:creator>
<dc:creator>Dimitrova, Ludmila</dc:creator>
<dc:creator>Ide, Nancy</dc:creator>
<dc:creator>Kaalep, Heiki-Jaan</dc:creator>
<dc:creator>Krstev, Cvetana</dc:creator>
<dc:creator>Orav, Heili</dc:creator>
<dc:creator>Oravecz, Csaba</dc:creator>
<dc:creator>Paldre, Leho</dc:creator>
<dc:creator>Petkevič, Vladimír</dc:creator>
<dc:creator>Priest-Dorman, Greg</dc:creator>
<dc:creator>Simov, Kiril</dc:creator>
<dc:creator>Sinapova, Lydia</dc:creator>
<dc:creator>Sokolovsky, Paul</dc:creator>
<dc:creator>Sryvkin, Sergey</dc:creator>
<dc:creator>Tufiş, Dan</dc:creator>
<dc:creator>Utka, Andrius</dc:creator>
<dc:creator>Villandi, Viire</dc:creator>
<dc:creator>Vitas, Duško</dc:creator>
<dc:creator>Vuković, Olga</dc:creator>
<dc:date xsi:type="dcterms:W3CDTF">2015-06-15T08:56:08Z</dc:date>
<dcterms:available>2015-06-15T08:56:08Z</dcterms:available>
<dc:description>The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations into a number of languages. This version of the corpus contains structurally annotated texts only, which contain elements such as the paragraph, the footnote, and highlighted text. In terms of linguistic annotations, the text contain names and sentences. The linguistically annotated texts are a separate submission (http://hdl.handle.net/11356/1043) also with somewhat different languages.</dc:description>
<dc:identifier xsi:type="dcterms:URI">http://hdl.handle.net/11356/1044</dc:identifier>
<dcterms:bibliographicCitation>http://hdl.handle.net/11356/1044</dcterms:bibliographicCitation>
<dc:language xsi:type="olac:language" olac:code="bul"/>
<dc:language xsi:type="olac:language" olac:code="ces"/>
<dc:language xsi:type="olac:language" olac:code="eng"/>
<dc:language xsi:type="olac:language" olac:code="est"/>
<dc:language xsi:type="olac:language" olac:code="hun"/>
<dc:language xsi:type="olac:language" olac:code="lit"/>
<dc:language xsi:type="olac:language" olac:code="ron"/>
<dc:language xsi:type="olac:language" olac:code="rus"/>
<dc:language xsi:type="olac:language" olac:code="slv"/>
<dc:language xsi:type="olac:language" olac:code="srp"/>
<dc:publisher>Jožef Stefan Institute</dc:publisher>
<dc:rights>Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
<dc:subject>parallel corpus</dc:subject>
<dc:subject>multilingual</dc:subject>
<dc:subject>TEI</dc:subject>
<dc:type>corpus</dc:type>
<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
</olac:olac>
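As a minimal sketch of how a record like the one above might be read programmatically, the following Python snippet parses a shortened excerpt with the standard library. The record as displayed omits its `xmlns` declarations, so the namespace URIs below are assumptions (the usual OLAC 1.1, Dublin Core, and XML Schema instance URIs), and `language_codes` is a hypothetical helper name, not part of any OLAC tooling.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the displayed record does not declare them.
NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# Shortened excerpt of the record above, with the assumed declarations added.
RECORD = """<olac:olac
    xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>MULTEXT-East "1984" document corpus 4.0</dc:title>
  <dc:creator>Erjavec, Tomaž</dc:creator>
  <dc:language xsi:type="olac:language" olac:code="bul"/>
  <dc:language xsi:type="olac:language" olac:code="eng"/>
  <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
</olac:olac>"""


def language_codes(root):
    """Return the olac:code values of all dc:language elements typed olac:language."""
    code_attr = "{%s}code" % NS["olac"]
    type_attr = "{%s}type" % NS["xsi"]
    return [
        el.get(code_attr)
        for el in root.findall("dc:language", NS)
        if el.get(type_attr) == "olac:language"
    ]


root = ET.fromstring(RECORD)
title = root.findtext("dc:title", namespaces=NS)
codes = language_codes(root)
print(title)
print(codes)
```

The same lookups should work on the full record once the namespace declarations are present on the root element.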
Title | MULTEXT-East "1984" document corpus 4.0 |
Creator | Erjavec, Tomaž |
Creator | Bruda, Ştefan |
Creator | Dimitrova, Ludmila |
Creator | Ide, Nancy |
Creator | Kaalep, Heiki-Jaan |
Creator | Krstev, Cvetana |
Creator | Orav, Heili |
Creator | Oravecz, Csaba |
Creator | Paldre, Leho |
Creator | Petkevič, Vladimír |
Creator | Priest-Dorman, Greg |
Creator | Simov, Kiril |
Creator | Sinapova, Lydia |
Creator | Sokolovsky, Paul |
Creator | Sryvkin, Sergey |
Creator | Tufiş, Dan |
Creator | Utka, Andrius |
Creator | Villandi, Viire |
Creator | Vitas, Duško |
Creator | Vuković, Olga |
Date (W3CDTF) | 2015-06-15T08:56:08Z |
Available | 2015-06-15T08:56:08Z |
Description | The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original (about 100,000 words in length), and its translations into a number of languages. This version of the corpus contains structurally annotated texts only, which contain elements such as the paragraph, the footnote, and highlighted text. In terms of linguistic annotations, the text contain names and sentences. The linguistically annotated texts are a separate submission (http://hdl.handle.net/11356/1043) also with somewhat different languages. |
Identifier (URI) | http://hdl.handle.net/11356/1044 |
Bibliographic Citation | http://hdl.handle.net/11356/1044 |
Language (ISO639-3) | Bulgarian [bul] |
Language (ISO639-3) | Czech [ces] |
Language (ISO639-3) | English [eng] |
Language (ISO639-3) | Estonian [est] |
Language (ISO639-3) | Hungarian [hun] |
Language (ISO639-3) | Lithuanian [lit] |
Language (ISO639-3) | Romanian [ron] |
Language (ISO639-3) | Russian [rus] |
Language (ISO639-3) | Slovenian [slv] |
Language (ISO639-3) | Serbian [srp] |
Publisher | Jožef Stefan Institute |
Rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
Rights | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
Subject | parallel corpus |
Subject | multilingual |
Subject | TEI |
Type | corpus |
Type (DCMI) | Text |
Type (OLAC) | Linguistic type: Primary text |
OLAC metadata records are scored for metadata quality on a 10-point scale explained in OLAC Metadata Metrics. The score for the above record (along with comments on changes that could improve the score) is as follows:
Component | + | - | Comments |
---|---|---|---|
Title | 1 | 0 | |
Date | 1 | 0 | |
Agent | 1 | 0 | |
About | 1 | 0 | |
Depth | 1 | 0 | |
Content Language | 1 | 0 | |
Subject Language | 1 | 0 | |
OLAC Type | 1 | 0 | |
DCMI Type | 1 | 0 | |
Precision | 0.67 | 0.33 | For the full score, make use of at least one more encoding scheme in addition to the ones counted explicitly in other components of the score. |
Quality score | 9.67 |
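
The quality score in the table appears to be the sum of the "+" column: nine components at 1 point each plus 0.67 for Precision. A minimal Python sketch of that arithmetic follows; the values are copied from the table, while the dictionary and function names are illustrative assumptions, and the actual scoring rules are those defined in OLAC Metadata Metrics.

```python
# Component scores copied from the "+" column of the table above.
COMPONENT_SCORES = {
    "Title": 1.0,
    "Date": 1.0,
    "Agent": 1.0,
    "About": 1.0,
    "Depth": 1.0,
    "Content Language": 1.0,
    "Subject Language": 1.0,
    "OLAC Type": 1.0,
    "DCMI Type": 1.0,
    "Precision": 0.67,
}


def quality_score(scores):
    """Sum the per-component points, rounded to two decimals."""
    return round(sum(scores.values()), 2)


def missing_points(scores, maximum=10.0):
    """Points still needed to reach the top of the 10-point scale."""
    return round(maximum - quality_score(scores), 2)


score = quality_score(COMPONENT_SCORES)
print(score)                             # 9.67
print(missing_points(COMPONENT_SCORES))  # 0.33
```

The 0.33 shortfall matches the "-" column entry for Precision, the only component in this record that does not earn its full point.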