Best Practice Recommendations for Language Resource Description

Date issued: 2008-07-11
Status of document: Recommendation. This document embodies an OLAC consensus concerning best current practice.
This version: http://www.language-archives.org/REC/bpr-20080711.html
Latest version: http://www.language-archives.org/REC/bpr.html
Previous version: http://www.language-archives.org/REC/bpr-20080229.html
Abstract:

This document expresses the consensus of the Open Language Archives Community on best practice recommendations for language resource description using the OLAC metadata standard [OLAC-Metadata].

Editors: Gary Simons, SIL International and Graduate Institute of Applied Linguistics
Steven Bird, University of Melbourne and University of Pennsylvania
Joan Spanne, SIL International
Changes since previous version:

Fixes problem in numbering and clarifies that using an encoding scheme means using a value from the scheme.

Copyright © 2008 Gary Simons (SIL International and Graduate Institute of Applied Linguistics), Steven Bird (University of Melbourne and University of Pennsylvania), and Joan Spanne (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Recommendations
References

1. Introduction

This document expresses the consensus of the Open Language Archives Community on best practices for describing language resources using the OLAC metadata standard [OLAC-Metadata]. The standard itself specifies the formal constraints on the syntax of valid descriptions, but does not set requirements on the content of the metadata elements (beyond general conformance to the definitions of the qualified Dublin Core metadata set [DCMT]). This document provides more detailed guidance about the content of specific metadata elements.

This document does not provide a complete set of advice about what the metadata elements mean or about how they should be used. Such advice is supplied in an OLAC informational note [OLAC-Usage]. This document lists the specific metadata usage guidelines that have been agreed upon by the OLAC community as representing best practice. It is not a requirement of participation that OLAC data providers follow these recommendations. However, these recommendations are used as the standard for evaluating the quality of the metadata records supplied by an OLAC data provider.

2. Recommendations

The following are the OLAC recommendations on best practices for describing language resources. Follow the "more" link to see the fuller discussion of an element in [OLAC-Usage], including its definition, usage notes, and examples.

All elements (more)

  1. Recommended best practice is for the value of each metadata element to conform to the definition of that element as given in [DCMT].

  2. When the meaning of a particular element in a metadata record fits the definition of a refinement, recommended best practice is to use the refinement rather than the generic element.

  3. When applicable, recommended best practice is to use the xsi:type attribute to specify an encoding scheme so as to express the value of the element with precision.

  4. Whenever the language of the element content is other than English, recommended best practice is to use the xml:lang attribute with a value from [OLAC-Language] to identify the language.

  5. When a resource has more than one value for a particular metadata element or refinement, recommended best practice is to use a separate instance of the element or refinement for each value rather than listing all the values in a single instance.
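
As an illustration of the recommendations on xml:lang and on separate instances, the following Python sketch emits one element per value and marks non-English content with xml:lang. It is not part of the OLAC specification: the helper name add_values, the bare "record" wrapper, and the sample values are assumptions made for this example, and the code "spa" assumes that [OLAC-Language] values are ISO 639-3 codes.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XML = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("dc", DC)


def add_values(record, name, values, lang=None):
    """Append one dc:<name> element per value (hypothetical helper)."""
    for value in values:
        element = ET.SubElement(record, f"{{{DC}}}{name}")
        element.text = value
        if lang is not None:
            # xml:lang identifies the language of the element content.
            element.set(f"{{{XML}}}lang", lang)
    return record


record = ET.Element("record")  # placeholder wrapper, not the OLAC root element
add_values(record, "subject", ["Verb morphology", "Tone"])
add_values(record, "title", ["Cuentos del pueblo"], lang="spa")
print(ET.tostring(record, encoding="unicode"))
```

Each subject value ends up in its own element, so a service provider can index the values individually instead of parsing a delimited list.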

Contributor (more)

  1. Recommended best practice is to identify a Contributor by means of a name in a form that is ready for sorting within an alphabetical index.

  2. Recommended best practice is to use a value from the olac:role scheme to indicate the role of the Contributor.
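
The two Contributor recommendations can be sketched together as follows. This is a minimal illustration, not a definitive implementation: the helper name contributor and the sample person are hypothetical, the "Family, Given" form is only one way of making a name ready for sorting, and the use of an olac:code attribute to carry the role value is an assumption about the OLAC XML format as described in [OLAC-Metadata].

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("dc", DC), ("olac", OLAC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def contributor(family, given, role):
    """Build a Contributor with a sortable name and an olac:role value."""
    element = ET.Element(f"{{{DC}}}contributor")
    element.set(f"{{{XSI}}}type", "olac:role")
    element.set(f"{{{OLAC}}}code", role)  # assumed attribute for the role
    element.text = f"{family}, {given}"   # family name first, for sorting
    return element


example = contributor("Doe", "Jane", "speaker")
print(ET.tostring(example, encoding="unicode"))
```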

Coverage (more)

  1. Recommended best practice is that a metadata record should contain at least one Coverage (or one of its refinements) or Description (or one of its refinements) or Subject element in order to give the prospective user some idea of the content of the resource that goes beyond the informative potential of a title alone. Using all of these elements is encouraged.

  2. In the case of spatial coverage, recommended best practice is to use a value from an encoding scheme to give precise geocoding of the resource.
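
A data provider that wants to check its own records against the "at least one Coverage, Description, or Subject" recommendation could use a test along the following lines. This is a sketch under stated assumptions: the function name describes_content is hypothetical, the input is taken to be the local names of the elements in one record, and the refinement names are those listed in [DCMT].

```python
# Local names of the elements, and their refinements, that describe content.
CONTENT_ELEMENTS = {
    "coverage", "spatial", "temporal",            # Coverage and refinements
    "description", "abstract", "tableOfContents",  # Description and refinements
    "subject",
}


def describes_content(element_names):
    """Return True if any element in the record describes its content."""
    return any(name in CONTENT_ELEMENTS for name in element_names)


print(describes_content(["title", "date", "spatial"]))
print(describes_content(["title", "date"]))
```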

Creator (more)

  1. Recommended best practice is to use the Contributor element instead of Creator, except in cases where there is significant creative involvement by the person or organization and there is no suitable refinement term from the olac:role scheme to use with Contributor.

Date (more)

  1. Recommended best practice is that a record have at least one instance of Date (or one of its refinements).

  2. Recommended best practice is that every instance of Date (or one of its refinements) use a value matching the dcterms:W3CDTF scheme, or enclose the element value in square brackets if the value does not conform to the encoding scheme (e.g., if it is supplied by the cataloger, is approximate, or is in some doubt).
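
The choice between a W3CDTF value and a bracketed value can be sketched as below. The function name date_value is hypothetical, and the regular expression is a simplification that checks only the lexical shape of a W3CDTF value (for example, it does not reject month 13), so it should be read as an approximation of the scheme rather than a validator.

```python
import re

# Lexical shape of W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, or a full date-time
# with a time zone designator. Numeric ranges are not checked.
W3CDTF = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def date_value(raw):
    """Return (text, scheme); scheme is None for bracketed values."""
    raw = raw.strip()
    if W3CDTF.match(raw):
        return raw, "dcterms:W3CDTF"
    if raw.startswith("[") and raw.endswith("]"):
        return raw, None  # already bracketed by the cataloger
    return f"[{raw}]", None


print(date_value("2008-07-11"))
print(date_value("ca. 1950"))
```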

Description (more)

  1. Recommended best practice is that a metadata record should contain at least one Description (or one of its refinements) or Coverage (or one of its refinements) or Subject element in order to give the prospective user some idea of the content of the resource that goes beyond the informative potential of a title alone. Using all of these elements is encouraged.

Format (more)

  1. In the case of a digital resource, recommended best practice is to express the Format using a MIME type value from the dcterms:IMT scheme.
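
One way to obtain a MIME type for a digital resource is to derive it from the file name, as in the sketch below. The function name format_value and the file name are hypothetical. The lookup uses the Python standard library's mimetypes table, whose contents can vary by platform, so the result for less common extensions should be checked by the cataloger.

```python
import mimetypes


def format_value(filename):
    """Return a MIME type for the dcterms:IMT scheme, or None if unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


print(format_value("grammar.pdf"))
```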

Identifier (more)

  1. When the value of Identifier is a Uniform Resource Locator (URL), recommended best practice is to specify the dcterms:URI scheme.

Language (more)

  1. Recommended best practice is that every record contain at least one Language element. For a resource that does not include language content, include a Language element containing the code zxx for “No linguistic content.”

  2. Recommended best practice is to use a value from the olac:language scheme with the Language element to identify an individual language precisely.
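
The following sketch shows one way to guarantee that a record always carries a Language element, falling back to zxx when no language codes are supplied. The helper name language_elements and the sample codes are assumptions for illustration, as is the use of an olac:code attribute to carry the value from the olac:language scheme.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def language_elements(codes):
    """Return one Language element per code, or a single zxx element."""
    elements = []
    for code in (codes or ["zxx"]):  # zxx: no linguistic content
        element = ET.Element(f"{{{DC}}}language")
        element.set(f"{{{XSI}}}type", "olac:language")
        element.set(f"{{{OLAC}}}code", code)  # assumed attribute for the code
        elements.append(element)
    return elements


codes = [e.get(f"{{{OLAC}}}code") for e in language_elements(["adz", "tpi"])]
print(codes)
```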

Publisher (more)

  1. Recommended best practice is to identify a Publisher by means of a name in a form that is ready for sorting within an index.

Relation (more)

  1. When the related resource is also held in a participating archive, recommended best practice is to identify the related resource by means of its OAI identifier. A Relation that begins with “oai:” will typically be presented by service providers as an active link that retrieves the metadata for that resource.

  2. If the related resource is not cataloged in the system of a participating archive, recommended best practice is to identify the related resource through a standard unique identifier. If the related resource is available online, specify the dcterms:URI scheme and give a stable Uniform Resource Locator (URL).
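
The three cases distinguished above (an OAI identifier, a URL, and some other standard identifier) can be told apart with a simple test such as the following. The function name relation_kind, the returned labels, and the sample identifiers are hypothetical, and the set of URL schemes recognized here is an assumption rather than something the recommendation specifies.

```python
from urllib.parse import urlparse


def relation_kind(value):
    """Classify a Relation value as 'oai', 'url', or 'other'."""
    if value.startswith("oai:"):
        return "oai"    # resource held in a participating archive
    if urlparse(value).scheme in ("http", "https", "ftp"):
        return "url"    # specify the dcterms:URI scheme for this value
    return "other"      # some other standard unique identifier


print(relation_kind("oai:example.org:item-1"))
print(relation_kind("http://www.example.org/item-1"))
```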

Source (more)

  1. Recommended best practice is as for the Relation element above.

Subject (more)

  1. Recommended best practice is that a metadata record should contain at least one Subject or Coverage (or one of its refinements) or Description (or one of its refinements) element in order to give the prospective user some idea of the content of the resource that goes beyond the informative potential of a title alone. Using all of these elements is encouraged.

  2. When the subject is a human language, recommended best practice is to use a value from the olac:language scheme with the Subject element to identify an individual language precisely.

  3. When the subject matter falls within the field of linguistics, recommended best practice is to use a value from the olac:linguistic-field scheme to identify the subfield.

Title (more)

  1. Recommended best practice is that every record must have an instance of the Title element. When the resource does not have a formal title, the cataloger should supply a descriptive title and enclose it in square brackets.

  2. Recommended best practice is that there be only one instance of the unqualified Title element, namely, for the original title (except in the case of parallel titles on a diglot work). All other titles (e.g., translations) should be specified as the dcterms:alternative refinement.
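
The Title recommendations amount to a small decision procedure, sketched below. The function name title_elements and the sample titles are hypothetical, and the sketch deliberately leaves out the exception for parallel titles on a diglot work.

```python
def title_elements(original=None, alternatives=(), supplied=None):
    """Return (element name, text) pairs for the titles of one record."""
    if original is not None:
        titles = [("dc:title", original)]
    elif supplied is not None:
        # A title supplied by the cataloger is enclosed in square brackets.
        titles = [("dc:title", f"[{supplied}]")]
    else:
        raise ValueError("a record needs a formal or a supplied title")
    # Every other title, such as a translation, uses the refinement.
    titles.extend(("dcterms:alternative", text) for text in alternatives)
    return titles


print(title_elements("Cuentos del pueblo", ["Stories of the village"]))
print(title_elements(supplied="Field notes on tone"))
```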

Type (more)

  1. Recommended best practice is that every record should contain at least one Type element that uses a value from the dcterms:DCMIType scheme to identify the nature or genre of the content of the resource.

  2. Recommended best practice is that every record for which it is applicable should contain at least one Type element that uses a value from the olac:linguistic-type scheme to identify its linguistic data type.
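
The sketch below checks a record for both kinds of Type element. The sample record, the olac:olac root element, the olac:code attribute, and the function name type_schemes are assumptions made for illustration and should be checked against [OLAC-Metadata]. The check compares the literal xsi:type strings, so it assumes the conventional dcterms and olac prefixes.

```python
import xml.etree.ElementTree as ET

DC_TYPE = "{http://purl.org/dc/elements/1.1/}type"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical record fragment carrying the two recommended Type elements.
RECORD = """<olac:olac
    xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>
  <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
</olac:olac>"""


def type_schemes(record_xml):
    """Return the set of xsi:type values used on Type elements."""
    root = ET.fromstring(record_xml)
    return {element.get(XSI_TYPE) for element in root.iter(DC_TYPE)}


missing = {"dcterms:DCMIType", "olac:linguistic-type"} - type_schemes(RECORD)
print(sorted(missing))
```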


References

[DCMT] DCMI Metadata Terms.
<http://dublincore.org/documents/dcmi-terms/>
[OLAC-Language] OLAC Language Extension.
<http://www.language-archives.org/REC/language.html>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLAC-Usage] OLAC Metadata Usage Guidelines.
<http://www.language-archives.org/NOTE/usage.html>