OLAC Record oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1 |
Metadata | ||
Title: | W2C – Web to Corpus – tool | |
Bibliographic Citation: | http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1 | |
Creator: | Majliš, Martin | |
Date (W3CDTF): | 2013-06-25T13:21:15Z | |
Date Available: | 2013-06-25T13:21:15Z | |
Description: | A tool for building multilingual corpora from Wikipedia. It downloads web pages, converts them to plain text, identifies their language, etc. A set of 120 corpora collected using this tool is available at https://ufal-point.mff.cuni.cz/xmlui/handle/11858/00-097C-0000-0022-6133-9 | |
Identifier (URI): | http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1 | |
Language: | No linguistic content | |
Language (ISO639): | zxx | |
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Rights: | Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) | |
http://creativecommons.org/licenses/by-sa/3.0/ | ||
Subject: | web data | |
wikipedia | ||
corpus creation | ||
Type: | toolService | |
Type (DCMI): | Software | |
OLAC Info |
||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Majliš, Martin. 2013. W2C – Web to Corpus – tool. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | dcmi_Software iso639_zxx |
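
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. A minimal sketch of how such a request URL might be assembled is shown below. The base URL is a hypothetical placeholder (the record does not state the endpoint), and the `olac` metadata prefix is an assumption; `oai_dc` is the prefix the OAI-PMH specification defines for simple Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the actual OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-60D6-1"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL.

    GetRecord takes three arguments: the verb itself, the record
    identifier, and the metadata prefix naming the requested format.
    """
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# "olac" is assumed to be the prefix for the OLAC format; "oai_dc" is simple DC.
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

Only the `metadataPrefix` argument differs between the two requests; the identifier is percent-encoded because it contains `:` and `/`.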