OLAC Record oai:www.ldc.upenn.edu:LDC2000T47 |
Metadata | ||
Title: | Hong Kong Laws Parallel Text | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Ma, Xiaoyi. Hong Kong Laws Parallel Text LDC2000T47. Web Download. Philadelphia: Linguistic Data Consortium, 2000 | |
Contributor: | Ma, Xiaoyi | |
Date (W3CDTF): | 2000 | |
Date Issued (W3CDTF): | 2000-07-15 | |
Description: | *Introduction* Hong Kong Laws Parallel Text was developed by the Linguistic Data Consortium (LDC) and consists of processed, sentence-aligned Chinese-English documents from the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) of the People's Republic of China. LDC wishes to thank the Hong Kong Special Administrative Region of the People's Republic of China for granting the LDC permission to distribute this data to the research community.
*DATA* This corpus is organized into 19 parallel file pairs, for a total of 38 files. Each parallel file pair is named hklaws.nn.[ec], where:
* nn = sequence number (01 through 19)
* the file extension is c = Chinese or e = English
Each file holds up to 2,000 sequentially numbered sentence indices, tagged with a sentence index and sequence number as described below, for a total of 37,807 sentence indices across all 19 file pairs. The sentence numbering spans the file pairs, so the first sentence index (in files hklaws.01.e and hklaws.01.c) is 1 and the last (in files hklaws.19.e and hklaws.19.c) is 37807. The numbering establishes sentence parallelism: two sentences with the same index and sequence number are purported to be parallel in content. Each sentence index may contain one or more sequentially numbered sentences, and the corresponding English and Chinese files contain the corresponding sets of sentences. Within each sentence index, sequence numbers start at 1. Together, the sentence index and sequence number uniquely identify a pair of parallel sentences. There are 313,659 sentences in the corpus. Each sentence is of the form: ...... ...... where # represents a one- to five-digit sentence index or sequence number. Sentence alignment was performed automatically at the LDC. The files example.c and example.e contain sample corresponding Chinese and English law files from the corpus. The Chinese files are encoded in BIG5, including user-defined characters specified by the HKSAR. See http://www.info.gov.hk/gccs for details.
*Copying and distribution* Permission has been granted to the Linguistic Data Consortium to make and distribute copies of the laws, press releases, and news of the Hong Kong Special Administrative Region, provided this copyright notice and permission notice are distributed with all copies. Permission has been given to the Linguistic Data Consortium to reproduce the laws, press releases, and/or news articles from the Hong Kong Special Administrative Region Government website for research, education, and technology development.
*Updates* There are no updates at this time.
*Additional Licensing Instructions* This members-only corpus is available to current members, who can request the data at the listed reduced license fee. Contact [email protected] for information about becoming a member. | |
Extent: | Corpus size: 2764 KB | |
Identifier: | LDC2000T47 | |
https://catalog.ldc.upenn.edu/LDC2000T47 | ||
ISBN: 1-58563-170-1 | ||
ISLRN: 596-847-245-337-1 | ||
DOI: 10.35111/zfbe-bt19 | ||
Language: | English | |
Chinese | ||
Language (ISO639): | eng | |
zho | ||
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2000T47 | |
Rights Holder: | Portions © 1999 The Government of the Hong Kong Special Administrative Region, © 2000 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
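The file-naming and alignment scheme in the description can be sketched in Python. This is a minimal illustration under stated assumptions, not LDC tooling: the helper names `corpus_file_names`, `align`, and `decode_chinese` are hypothetical; sentences are assumed to be already extracted from their tags as `(index, sequence, text)` tuples, since the tag syntax is not reproduced in this record; and Python's `big5hkscs` codec is an assumed stand-in for BIG5 with the HKSAR user-defined characters.

```python
# Hypothetical helpers illustrating the hklaws.nn.[ec] layout; not LDC tooling.

NUM_PAIRS = 19  # 19 parallel file pairs, 38 files in total

Key = tuple[int, int]  # (sentence index, sequence number)


def corpus_file_names(num_pairs: int = NUM_PAIRS) -> list[tuple[str, str]]:
    """Return (English, Chinese) file names, e.g. ("hklaws.01.e", "hklaws.01.c")."""
    return [(f"hklaws.{n:02d}.e", f"hklaws.{n:02d}.c") for n in range(1, num_pairs + 1)]


def align(
    english: list[tuple[int, int, str]],
    chinese: list[tuple[int, int, str]],
) -> list[tuple[Key, str, str]]:
    """Pair sentences that share the same (index, sequence number) key.

    Assumption: inputs have already been parsed out of the sentence tags.
    """
    chinese_by_key = {(idx, seq): text for idx, seq, text in chinese}
    pairs = []
    for idx, seq, en_text in english:
        zh_text = chinese_by_key.get((idx, seq))
        if zh_text is not None:  # skip keys with no Chinese counterpart
            pairs.append(((idx, seq), en_text, zh_text))
    return pairs


def decode_chinese(raw: bytes) -> str:
    """Decode the bytes of a Chinese (.c) file.

    Assumption: the big5hkscs codec covers the HKSAR user-defined
    characters; anything it cannot map is replaced with U+FFFD.
    """
    return raw.decode("big5hkscs", errors="replace")


if __name__ == "__main__":
    names = corpus_file_names()
    print(len(names) * 2, names[0], names[-1])  # file count, first and last pair
    # Toy records for illustration only, not corpus data.
    en = [(1, 1, "Short title"), (1, 2, "Commencement"), (2, 1, "Interpretation")]
    zh = [(1, 1, "簡稱"), (2, 1, "釋義")]
    for key, e, z in align(en, zh):
        print(key, e, "|", z)
```

Keying on the `(index, sequence)` pair rather than on line position means a sentence missing from one side is simply skipped instead of shifting every later pair out of alignment.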
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2000T47 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
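The OaiIdentifier above can be used in a standard OAI-PMH GetRecord request. A sketch using only the Python standard library follows; `BASE_URL` and `get_record_url` are hypothetical, since this record does not give the provider's endpoint, and `olac` as the OLAC metadata prefix is an assumption (`oai_dc`, simple Dublin Core, is the format every OAI-PMH repository must support).

```python
from urllib.parse import urlencode

# Hypothetical placeholder; this record does not state the OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2000T47"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL with a percent-encoded query string."""
    query = urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(get_record_url(BASE_URL, OAI_IDENTIFIER, "olac"))    # OLAC format (prefix assumed)
    print(get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc"))  # simple Dublin Core
```

`urlencode` percent-encodes the colons in the identifier (`%3A`), which keeps the query string valid regardless of the server's parsing rules.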
Search Info | ||
Citation: | Ma, Xiaoyi. 2000. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng iso639_zho olac_primary_text |