OLAC Record oai:www.ldc.upenn.edu:LDC2022S03 |
Metadata | ||
Title: | Spoken Digits in Hindi and Indian English | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Bhattacharya, Basabdatta Sen, et al. Spoken Digits in Hindi and Indian English LDC2022S03. Web Download. Philadelphia: Linguistic Data Consortium, 2022 | |
Contributor: | Bhattacharya, Basabdatta Sen | |
Subramanian, Aiswarya | ||
Chatterjee, Purbayan | ||
Dey, Sounak | ||
Date (W3CDTF): | 2022 | |
Date Issued (W3CDTF): | 2022-02-15 | |
Description: | *Introduction* Spoken Digits in Hindi and Indian English was developed by the Birla Institute of Technology and Science Pilani. It contains approximately two hours of speech consisting of spoken digits from one to ten in Hindi and English, with regional accents from across India. *Data* The speech data was collected in three ways: in person, on a mobile handset recorder app; via one-to-one online communications over social apps; and from social media sites. Each audio file represents a single spoken digit in either Hindi or Indian English. Background noise was mostly retained. Some data was recorded in a noise-free environment or cleaned after recording to remove abrupt noises such as car horns. The audio data is organized by number, language and gender. The gender breakdown for speakers is 17% female, 27% male, and 56% unspecified. A Google Colab Notebook file that can be used for basic functions such as removing noise or unwanted silences is also included in this release. All audio data is presented as single-channel, 16-bit, 16 kHz FLAC-compressed linear PCM. *Samples* Please view these samples: * Hindi Female (FLAC) * English Unspecified (FLAC) * English Male (FLAC) *Updates* None at this time. | |
Extent: | Corpus size: 90831 KB | |
Identifier: | LDC2022S03 | |
https://catalog.ldc.upenn.edu/LDC2022S03 | ||
ISBN: 1-58563-986-9 | ||
ISLRN: 452-404-795-171-3 | ||
DOI: 10.35111/5way-1446 | ||
Language: | English | |
Hindi | ||
Language (ISO639): | eng | |
hin | ||
License: | Spoken Digits in Hindi and Indian English Agreement: https://catalog.ldc.upenn.edu/license/spoken-digits-in-hindi-and-indian-english-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2022S03 | |
Rights Holder: | Portions © 2022 Basabdatta Sen Bhattacharya, © 2022 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2022S03 | |
DateStamp: | 2023-01-01 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Bhattacharya, Basabdatta Sen; Subramanian, Aiswarya; Chatterjee, Purbayan; Dey, Sounak. 2022. Spoken Digits in Hindi and Indian English. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_GB country_IN dcmi_Sound iso639_eng iso639_hin olac_primary_text |
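
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. As a minimal sketch of how such requests are formed, the Python below builds the two request URLs from the OaiIdentifier listed in this record. The `verb`, `identifier`, and `metadataPrefix` parameters are standard OAI-PMH arguments; the base URL is a hypothetical placeholder, because this record does not state the repository's actual OAI-PMH endpoint, and the helper name `get_record_url` is illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the repository's
# OAI-PMH base URL, so substitute the real endpoint before use.
BASE_URL = "https://example.org/oai"

# Taken from the OaiIdentifier field of this record.
OAI_ID = "oai:www.ldc.upenn.edu:LDC2022S03"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" requests the OLAC format; "oai_dc" requests simple Dublin Core.
olac_url = get_record_url(OAI_ID, "olac")
dc_url = get_record_url(OAI_ID, "oai_dc")

print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by `urlencode`, which is the safe form for a query string; an OAI-PMH server decodes them back to the original identifier.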