OLAC Record oai:www.ldc.upenn.edu:LDC2000T46 |
Metadata | ||
Title: | Hong Kong News Parallel Text | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Ma, Xiaoyi. Hong Kong News Parallel Text LDC2000T46. Web Download. Philadelphia: Linguistic Data Consortium, 2000 | |
Contributor: | Ma, Xiaoyi | |
Date (W3CDTF): | 2000 | |
Date Issued (W3CDTF): | 2000-01-15 | |
Description: | *Introduction* Hong Kong News Parallel Text was developed by the Linguistic Data Consortium (LDC) and consists of parallel Chinese-English news articles from the Information Services Department of the Hong Kong Special Administrative Region (HKSAR) of the People's Republic of China. LDC wishes to thank the Hong Kong Special Administrative Region of the People's Republic of China for granting the LDC permission to distribute this data to the research community.

*Data* This corpus contains 18,147 aligned article pairs released by HKSAR between July 1, 1997 and April 30, 2000. Article alignment was performed automatically at the LDC. The data directory contains 36,294 articles, each stored as a separate file, which yields 18,147 article pairs. The files are named using the convention yyyymmdd_nnn.[ce], where:
* yyyy = year
* mm = month
* dd = day of month
* nnn = sequence number of the article on that date
* c = Chinese, and e = English

The files example.c and example.e contain a corresponding sample news article from the corpus. The articles were collected from the internet by an automated system. Incoming data was spooled directly to a raw collection file, and the raw files were then processed into the format described here for release by the LDC. Table.txt maps the Chinese files (*.c) to the corresponding English files (*.e). The Chinese files are encoded in BIG5, including user-defined characters defined by HKSAR.

*Copying and Distribution* Permission has been granted to the Linguistic Data Consortium to make and distribute copies of the laws, press releases and news of the Hong Kong Special Administrative Region, provided that this copyright notice and the following permission notice are distributed with all copies. Permission has been given to the Linguistic Data Consortium to reproduce the laws, press releases, and/or news articles from the Hong Kong Special Administrative Region Government website for research, education and technology development.
*Updates* There are no updates at this time.

*Additional Licensing Instructions* This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact [email protected] for information about becoming a member. | |
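The file naming convention and encoding notes in the Description can be turned into a small reader. The Python sketch below parses a name such as 20000115_001.c into its release date, sequence number and language, and decodes the bytes of a Chinese file. It assumes that Python's big5hkscs codec is an acceptable stand-in for "BIG5 with HKSAR user-defined characters"; this record does not confirm that mapping, so undecodable bytes are replaced rather than guessed. The names ArticleName, parse_name and decode_chinese are hypothetical. The sketch deliberately does not pair files by name: Table.txt is the documented Chinese-to-English mapping, and its layout is not described here.

```python
import re
from datetime import date
from typing import NamedTuple

# yyyymmdd_nnn.[ce], as described in the Data section.
NAME_RE = re.compile(r"^(\d{4})(\d{2})(\d{2})_(\d{3})\.([ce])$")


class ArticleName(NamedTuple):
    released: date  # yyyymmdd: release date of the article
    seq: int        # nnn: sequence number of the article on that date
    lang: str       # "c" = Chinese, "e" = English


def parse_name(filename: str) -> ArticleName:
    """Split a corpus file name into release date, sequence number and language."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a yyyymmdd_nnn.[ce] file name: {filename!r}")
    year, month, day, seq, lang = m.groups()
    # date() raises ValueError for impossible dates such as month 13.
    return ArticleName(date(int(year), int(month), int(day)), int(seq), lang)


def decode_chinese(raw: bytes) -> str:
    """Decode the bytes of a *.c file.

    Assumption: the "big5hkscs" codec covers the HKSAR-specific characters
    well enough; any byte sequence it cannot map becomes U+FFFD instead of
    raising an error.
    """
    return raw.decode("big5hkscs", errors="replace")


if __name__ == "__main__":
    print(parse_name("20000115_001.c"))
    print(decode_chinese("香港新聞".encode("big5hkscs")))
```

Replacing undecodable bytes keeps a batch run from stopping on one unusual character; if exact recovery of HKSAR-specific characters matters, the replaced positions are where to look.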
Extent: | Corpus size: 68608 KB | |
Identifier: | LDC2000T46 | |
https://catalog.ldc.upenn.edu/LDC2000T46 | ||
ISBN: 1-58563-169-8 | ||
ISLRN: 820-981-482-765-8 | ||
DOI: 10.35111/5n10-kt36 | ||
Language: | English | |
Chinese | ||
Language (ISO639): | eng | |
zho | ||
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2000T46 | |
Rights Holder: | Portions © 1997-2000 The Government of the Hong Kong Special Administrative Region, © 2000 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2000T46 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
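The GetRecord entries refer to OAI-PMH requests. Under OAI-PMH, a single record is fetched with the GetRecord verb plus the identifier and metadataPrefix arguments. The Python sketch below builds such a request URL for this record's OaiIdentifier. The base URL is a hypothetical placeholder because this record does not state the endpoint. "oai_dc" is the standard prefix for simple Dublin Core, and the OLAC-format prefix is assumed here to be "olac". The function name get_record_url is also hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not state the repository's base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL.

    "oai_dc" requests simple Dublin Core; an OLAC-format record would use the
    archive's OLAC prefix (assumed here to be "olac").
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(get_record_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2000T46"))
```

urlencode percent-encodes the colons in the identifier (":" becomes "%3A"), so the identifier survives as a single query value.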
Search Info | ||
Citation: | Ma, Xiaoyi. 2000. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng iso639_zho olac_primary_text |