OLAC Record oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4
Metadata
Title: Czech Parliament Meetings
Bibliographic Citation: http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4
Creator: Pražák, Aleš; Šmídl, Luboš
Date (W3CDTF): 2012-03-28T14:45:25Z
Date Available: 2012-03-28T14:45:25Z
Description: The corpus consists of recordings of sessions of the Chamber of Deputies of the Parliament of the Czech Republic. It currently contains 88 hours of speech data, corresponding to roughly 0.5 million tokens. The annotation process is semi-automatic: we can run speech recognition on the data with high accuracy (over 90%) and then align the resulting automatic transcripts with the speech. The annotator's task is to check the transcripts, correct errors, add proper punctuation and label speech segments with speaker information. The resulting corpus is therefore suitable both for training acoustic models for automatic speech recognition (ASR) and for training speaker identification and/or verification systems. The archive contains 18 sound files (WAV PCM, 16-bit, 44.1 kHz, mono) and the corresponding transcriptions in the standard XML-based Transcriber format (http://trans.sourceforge.net). The airing date of each recording is encoded in its filename in the form SOUND_YYMMDD_*. Note that the recordings are usually aired early in the morning of the day after the actual Parliament session. If a recording is too long to fit into the broadcast schedule, it is divided into several parts that are aired on consecutive days.
Identifier (URI): ZCU_CZ_Parliament; http://hdl.handle.net/11858/00-097C-0000-0005-CF9C-4
Language: Czech
Language (ISO639): ces
Publisher: University of West Bohemia, Department of Cybernetics
Rights: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0), http://creativecommons.org/licenses/by-nc-nd/3.0/
Subject: speech corpus; acoustic model; speaker identification; speaker verification
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text
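The filename convention and audio parameters in the description can be checked mechanically. The following is a minimal Python sketch that assumes only what the description states (the SOUND_YYMMDD_* pattern and mono 16-bit 44.1 kHz PCM). The function names, the sample filename SOUND_120327_001.wav and the assumption that two-digit years mean 20YY are hypothetical, not part of the corpus distribution.

```python
import re
import wave
from datetime import date, timedelta
from pathlib import Path

# Matches the SOUND_YYMMDD_* scheme from the corpus description.
_NAME_RE = re.compile(r"^SOUND_(\d{2})(\d{2})(\d{2})_")


def airing_date(filename: str) -> date:
    """Return the airing date encoded in a SOUND_YYMMDD_* filename."""
    m = _NAME_RE.match(Path(filename).name)
    if m is None:
        raise ValueError(f"not a SOUND_YYMMDD_* filename: {filename!r}")
    yy, mm, dd = (int(g) for g in m.groups())
    # Assumption: two-digit years mean 20YY (the corpus was released in 2012).
    return date(2000 + yy, mm, dd)


def likely_session_date(filename: str) -> date:
    """Heuristic guess at the session date: the day before airing.

    Not reliable for later parts of a long session that was split and
    aired on consecutive days; their real session date is earlier still.
    """
    return airing_date(filename) - timedelta(days=1)


def has_expected_wav_format(path: str) -> bool:
    """True if the WAV file is mono, 16-bit, 44.1 kHz (as stated for the corpus)."""
    # The wave module reads only uncompressed PCM; other encodings raise wave.Error.
    with wave.open(path, "rb") as w:
        return (
            w.getnchannels() == 1
            and w.getsampwidth() == 2  # bytes per sample, i.e. 16-bit
            and w.getframerate() == 44100
        )


if __name__ == "__main__":
    name = "SOUND_120327_001.wav"  # hypothetical filename
    print(airing_date(name))          # 2012-03-27
    print(likely_session_date(name))  # 2012-03-26
```

The session-date helper is deliberately labelled a heuristic: because long recordings are split across consecutive airing days, the filename date alone cannot identify the session for every part.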
OLAC Info
Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord: OAI-PMH request for OLAC format
GetRecord: Pre-generated XML file
OAI Info
OaiIdentifier: oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4
DateStamp: 2021-06-29
GetRecord: OAI-PMH request for simple DC format
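The GetRecord entries refer to standard OAI-PMH requests, which are plain HTTP GET URLs carrying the verb, the item identifier and a metadataPrefix. The sketch below shows how such a URL might be assembled for this record's OAI identifier. The base URL https://example.org/oai is a placeholder and not LINDAT's real endpoint; oai_dc is the simple Dublin Core prefix that OAI-PMH requires every repository to support, while olac as the OLAC prefix is an assumption based on common OLAC practice.

```python
from urllib.parse import urlencode

# Placeholder endpoint: NOT the real LINDAT OAI-PMH base URL (assumption).
BASE_URL = "https://example.org/oai"
OAI_ID = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0005-CF9C-4"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL.

    Besides the verb, GetRecord takes two required arguments:
    the item identifier and the metadataPrefix of the desired format.
    """
    # urlencode percent-encodes ':' and '/' inside the identifier.
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Simple Dublin Core record.
    print(get_record_url(BASE_URL, OAI_ID, "oai_dc"))
    # OLAC record; "olac" is assumed to be the archive's OLAC prefix.
    print(get_record_url(BASE_URL, OAI_ID, "olac"))
```

Percent-encoding the identifier keeps its colons and slash from being confused with URL syntax; a real client would also parse the XML response and handle OAI-PMH error codes such as idDoesNotExist.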
Search Info
Citation: Pražák, Aleš; Šmídl, Luboš. 2012. University of West Bohemia, Department of Cybernetics.
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text