OLAC Record oai:scholarspace.manoa.hawaii.edu:10125/42015 |
Metadata | ||
Title: | A tool for sharing interlinearized and lexical data in diverse formats | |
Bibliographic Citation: | Kaufman, Daniel; Finkel, Raphael; 2017-03-02; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/42015. | |
Contributor (speaker): | Kaufman, Daniel | |
Finkel, Raphael | ||
Creator: | Kaufman, Daniel | |
Finkel, Raphael | ||
Date (W3CDTF): | 2017-03-02 | |
Description: | The last decade has seen great advances in the development of electronic tools for automated interlinearization, corpus creation and lexicon building (e.g. FieldWorks Language Explorer [FLEx]), as well as tools for creating time-aligned annotations (e.g. ELAN). However, methods for sharing these new data formats online lag far behind. While good options exist for lexical data (e.g. Webonary, Lexique Pro), there is no tool for turning a project created in the FLEx software into an online interlinearized corpus. We present here a tool in development which does precisely that. FLEx databases can be searched using regular expressions, and individual lines from a text can be linked to audio and video media. The tool can furthermore bring together linguistic data in diverse formats (from ELAN, Praat, FieldWorks, Toolbox, Shoebox) for a single query and allow for queries over multiple language projects. We discuss the benefits of this program in relation to several ongoing fieldwork projects that are being used to evaluate it. These projects present several interesting challenges. In the first, we attempt to create a unified database from several centuries of documentation during which the language showed considerable change. Similarly, in the second project we create a unified database for two lexically, syntactically and phonologically distinct dialects of the same language and show how an interlinearized database facilitates searching across dialects. Finally, in the third project, we show how video data can be integrated into an online FLEx database, a feature which is still lacking in the FLEx software itself. By way of conclusion, we show the audience how to upload their own data (either privately or publicly) and experiment with the tool’s features. Ultimately, the open source program will be available for anyone interested in hosting their own installations. | |
Identifier (URI): | http://hdl.handle.net/10125/42015 | |
Table Of Contents: | 42015.pdf | |
42015.mp3 | ||
Type (DCMI): | Text | |
Sound | ||
OLAC Info | ||
Archive: | Language Documentation and Conservation | |
Description: | http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:scholarspace.manoa.hawaii.edu:10125/42015 | |
DateStamp: | 2024-08-09 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Kaufman, Daniel; Finkel, Raphael. 2017. Language Documentation and Conservation. | |
Terms: | dcmi_Sound dcmi_Text |