OLAC Record: Buckwalter Arabic Morphological Analyzer Version 1.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2002L49

Metadata

Title: Buckwalter Arabic Morphological Analyzer Version 1.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Buckwalter, Tim. Buckwalter Arabic Morphological Analyzer Version 1.0 LDC2002L49. Web Download. Philadelphia: Linguistic Data Consortium, 2002

Contributor: Buckwalter, Tim

Date (W3CDTF): 2002

Date Issued (W3CDTF): 2002-11-08

Description: *Introduction* Buckwalter Arabic Morphological Analyzer Version 1.0 is used for annotating Arabic text with part-of-speech (POS) tags.

*Data* The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82,158 entries representing 38,600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control prefix-stem combinations (1,648 entries), stem-suffix combinations (1,285 entries), and prefix-suffix combinations (598 entries). The code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file that describes the lexicon files, the morphological compatibility tables, and the morphology analysis algorithm, and that includes a summary of stem morphological categories and a table of the author's Arabic transliteration system.

*Updates* Six data files were named with letter case that did not match the names used in the documentation and the script, which caused the analyzer to crash on case-sensitive systems. This problem has been fixed, and the corrected version of the analyzer is now available for download.

*Licensing* Buckwalter Arabic Morphological Analyzer Version 1.0 is released under the GNU General Public License version 2. Organizations interested in licensing the lexicon and/or morphological analyzer for commercial use should contact: QAMUS LLC, 448 South 48th St., Philadelphia, PA 19143, ATTN: Tim Buckwalter, email: [email protected]

*Note* The web download distribution is free of charge; to obtain the data, submit a request to [email protected]. There is a $100 charge if the data is requested on CD-ROM.
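The description above says that the analyzer combines three lexicons (prefixes, stems, suffixes) with three pairwise compatibility tables. The following is a minimal sketch of how such a lookup could work, written in Python rather than the Perl of the distributed script. It is not the author's code: every lexicon entry, every category name (PREF_DET, NOUN, and so on), and the length limits on prefixes and suffixes are hypothetical placeholders chosen for illustration, and the real files use their own formats and category labels.

```python
from itertools import product

# Toy lexicons: surface form -> list of morphological categories.
# All entries and category names are invented for this sketch; they are
# NOT copied from the LDC2002L49 lexicon files.
PREFIXES = {"": ["PREF_NONE"], "Al": ["PREF_DET"], "w": ["PREF_CONJ"]}
STEMS = {"ktAb": ["NOUN"], "mktb": ["NOUN"], "ktb": ["VERB_PERF"]}
SUFFIXES = {"": ["SUFF_NONE"], "At": ["SUFF_NOUN_PL"], "t": ["SUFF_VERB_1S"]}

# Toy compatibility tables: sets of category pairs that may co-occur.
PREFIX_STEM = {
    ("PREF_NONE", "NOUN"), ("PREF_DET", "NOUN"), ("PREF_CONJ", "NOUN"),
    ("PREF_NONE", "VERB_PERF"), ("PREF_CONJ", "VERB_PERF"),
}
STEM_SUFFIX = {
    ("NOUN", "SUFF_NONE"), ("NOUN", "SUFF_NOUN_PL"),
    ("VERB_PERF", "SUFF_VERB_1S"),
}
PREFIX_SUFFIX = {
    ("PREF_NONE", "SUFF_NONE"), ("PREF_DET", "SUFF_NONE"),
    ("PREF_CONJ", "SUFF_NONE"), ("PREF_NONE", "SUFF_NOUN_PL"),
    ("PREF_DET", "SUFF_NOUN_PL"), ("PREF_CONJ", "SUFF_NOUN_PL"),
    ("PREF_NONE", "SUFF_VERB_1S"), ("PREF_CONJ", "SUFF_VERB_1S"),
}


def segmentations(word, max_prefix=4, max_suffix=6):
    """Yield every (prefix, stem, suffix) split with a non-empty stem.

    The length limits are assumptions made for this sketch.
    """
    n = len(word)
    for p in range(0, min(max_prefix, n - 1) + 1):
        for s in range(0, min(max_suffix, n - p - 1) + 1):
            yield word[:p], word[p:n - s], word[n - s:]


def analyze(word):
    """Return all splits whose parts are in the lexicons and whose
    categories pass all three pairwise compatibility checks."""
    results = []
    for prefix, stem, suffix in segmentations(word):
        if prefix not in PREFIXES or stem not in STEMS or suffix not in SUFFIXES:
            continue
        for pc, sc, xc in product(PREFIXES[prefix], STEMS[stem], SUFFIXES[suffix]):
            if ((pc, sc) in PREFIX_STEM
                    and (sc, xc) in STEM_SUFFIX
                    and (pc, xc) in PREFIX_SUFFIX):
                results.append((prefix, stem, suffix, pc, sc, xc))
    return results


if __name__ == "__main__":
    for w in ["AlktAb", "wmktbAt", "ktbt", "Alktbt"]:
        print(w, analyze(w))
```

In this toy setup the input "Alktbt" splits into parts that are each present in a lexicon, but the determiner prefix is not listed as compatible with a verb stem, so the analysis is rejected. That is the role the description assigns to the compatibility tables: filtering out combinations whose parts exist individually but cannot occur together.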

Identifier: LDC2002L49

Identifier: https://catalog.ldc.upenn.edu/LDC2002L49

ISBN: 1-58563-257-0

ISLRN: 435-186-167-011-2

DOI: 10.35111/7vzm-mb15

Language: Standard Arabic

Language: English

Language (ISO639): arb

Language (ISO639): eng

License: GNU General Public License v2: https://catalog.ldc.upenn.edu/license/gnu-general-public-license-v2.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2002L49

Rights Holder: Portions © 2002 QAMUS LLC (www.qamus.org), © 2002 Trustees of the University of Pennsylvania

Subject: Standard Arabic language

Subject (ISO639): arb

Type (DCMI): Text

Type (OLAC): lexicon

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2002L49

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Buckwalter, Tim. 2002. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_SA dcmi_Text iso639_arb iso639_eng olac_lexicon

Inferred Metadata
Country: Saudi Arabia
Area: Asia

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2002L49
Up-to-date as of: Thu Oct 24 7:29:32 EDT 2024
