OLAC Record
oai:www.ldc.upenn.edu:LDC2015T15

Metadata
Title:TS Wikipedia
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Sezer, Taner, and Türker Sezer. TS Wikipedia LDC2015T15. Web Download. Philadelphia: Linguistic Data Consortium, 2015
Contributor:Sezer, Taner
Sezer, Türker
Date (W3CDTF):2015
Date Issued (W3CDTF):2015-07-15
Description:*Introduction* TS Wikipedia is a collection of approximately 1.6 million processed Turkish Wikipedia pages. The data is tokenized and includes part-of-speech tags, morphological analysis, lemmas, bi-grams and tri-grams.
*Data* The data is in a word-per-line format with five tab-separated columns: token, part-of-speech tag, morphological analysis, lemma and, where needed, the corrected spelling of the token. All data is presented in UTF-8 XML files and was selected and filtered to reduce non-Turkish characters, mathematical formulas and non-Turkish entries.
*Updates* None at this time.
Extent:Corpus size: 2279688 KB
Identifier:LDC2015T15
https://catalog.ldc.upenn.edu/LDC2015T15
ISBN: 1-58563-723-8
DOI: 10.35111/mem6-4951
Language:Turkish
Language (ISO639):tur
License:Creative Commons-Attribution-Share-Alike 3.0 (NFP, Non-Member): https://catalog.ldc.upenn.edu/license/creative-commons-attribution-share-alike-3-dot-0-nfp-non-member.pdf
LDC For-Profit Membership Agreement: https://catalog.ldc.upenn.edu/license/ldc-for-profit-membership.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2015T15
Rights Holder:Portions © 2015 Taner Sezer, © 2015 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
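
The Description above gives the data layout as one word per line with five tab-separated columns. A minimal sketch of reading such lines follows. The record does not specify the XML element names or the tag inventory, so the `<doc>` wrapper, the tag strings and the sample rows below are invented purely for illustration, and `TokenRow` and `parse_rows` are hypothetical names. The sketch also assumes that markup lines start with `<` and that the fifth column may be empty or missing when no correction applies.

```python
from typing import Iterable, Iterator, NamedTuple


class TokenRow(NamedTuple):
    token: str
    pos: str
    morph: str
    lemma: str
    corrected: str  # empty string when no corrected spelling is given


def parse_rows(lines: Iterable[str]) -> Iterator[TokenRow]:
    """Yield one TokenRow per data line, skipping blank and markup lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: blank lines and lines starting with "<" hold no token data.
        if not line.strip() or line.lstrip().startswith("<"):
            continue
        fields = line.split("\t")
        # Assumption: pad short rows so a missing fifth column becomes "".
        fields = (fields + [""] * 5)[:5]
        yield TokenRow(*fields)


# Invented sample rows; the real corpus tags and markup may differ.
sample = [
    "<doc>",
    "evler\tNoun\tev+Noun+A3pl\tev\t",
    "gidiyorm\tVerb\tgit+Verb+Prog1+A1sg\tgit\tgidiyorum",
    "</doc>",
]
rows = list(parse_rows(sample))
for row in rows:
    print(row.token, row.lemma, row.corrected or "-")
```

Padding to five fields keeps the row shape fixed, so downstream code can rely on `row.corrected` existing even when the source line stops after the lemma.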

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2015T15
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
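
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is an HTTP GET with the standard OAI-PMH arguments `verb`, `identifier` and `metadataPrefix`. The base URL below is a placeholder, not the archive's actual endpoint (the record does not state it), and `get_record_url` is a hypothetical helper. The prefix `oai_dc` is the OAI-PMH name for simple Dublin Core; `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not given in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (no network access is performed)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


oai_id = "oai:www.ldc.upenn.edu:LDC2015T15"
olac_url = get_record_url(oai_id, "olac")    # assumed prefix for the OLAC format
dc_url = get_record_url(oai_id, "oai_dc")    # simple Dublin Core
print(olac_url)
print(dc_url)
```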

Search Info

Citation: Sezer, Taner; Sezer, Türker. 2015. Linguistic Data Consortium.
Terms: area_Asia country_TR dcmi_Text iso639_tur olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015T15
Up-to-date as of: Thu Oct 24 7:30:49 EDT 2024