OLAC FAQ
Some of the information below is borrowed from or based on the Dublin Core FAQ.
The Open Language Archives Community, or OLAC, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. OLAC was founded at the Workshop on Web-Based Language Documentation and Description, held in Philadelphia in December 2000.
A language resource is any kind of DATA, TOOL or ADVICE (see the founding vision statement) pertaining to the documentation, description or development of a human language. Texts, recordings, dictionaries, language learning materials, annotations, field notebooks, software, protocols, data models, file formats, newsgroup archives and web indexes are some examples of such resources. OLAC metadata can be used to describe any kind of language resource. Language resources may be digital or non-digital, published or restricted. In the OLAC context, a language archive is any collection of language resources and their resource descriptions.
The most familiar methods for language resource discovery are mailing lists, web indexes, and the catalogs of archives and publishers. Users of these methods typically experience low precision and recall: one has to wade through many irrelevant resources, and relevant resources are easily overlooked. OLAC seeks to improve that situation by developing an infrastructure for language resource discovery. A key part of that infrastructure is a metadata set that is specialized for language resource description.
The simplest definition of metadata is "structured data about data." Metadata is descriptive information about an object or resource, whether physical or electronic. While the term metadata itself is relatively new, the underlying concepts have been in use for as long as collections of information have been organized. Library card catalogs represent a well-established type of metadata; they have served as collection management and resource discovery tools for decades. Metadata can be created "by hand" or generated automatically by software.
The OLAC metadata set is the set of metadata elements that participating archives have agreed to use for describing language resources. Uniform description across archives is ensured by limiting the values of certain metadata elements to the use of terms from agreed-upon controlled vocabularies. The OLAC metadata set is equally applicable whether the resources are available online or not. The metadata set consists of the fifteen elements of the Dublin Core Metadata Set, plus the refinements and encoding schemes of the DCMI Metadata Terms—a widely accepted standard for describing resources of all types. To this general standard, OLAC adds encoding schemes that are designed specifically for describing language resources, such as subject language and linguistic data type. The OLAC Metadata Usage Guidelines describe (with examples) all the elements, refinements, and encoding schemes that may be used in OLAC metadata descriptions. The OLAC Metadata standard defines the XML format that is used for the interchange of metadata descriptions among participating archives.
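As a rough illustration of how Dublin Core elements and OLAC encoding schemes combine in one description, the following Python sketch builds a small record. The namespace URIs, the `xsi:type` values, the `olac:code` attribute and the sample codes are recalled from the OLAC 1.1 format and should be treated as assumptions to check against the OLAC Metadata standard; the helper names `q` and `build_record` and the sample title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against the OLAC Metadata standard.
NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Return the ElementTree-style qualified name for prefix:local."""
    return "{%s}%s" % (NS[prefix], local)


def build_record(title, language_code, linguistic_type):
    """Build one metadata description (hypothetical helper)."""
    root = ET.Element(q("olac", "olac"))
    # A plain Dublin Core element.
    ET.SubElement(root, q("dc", "title")).text = title
    # Subject language, restricted to a controlled vocabulary of codes.
    ET.SubElement(root, q("dc", "subject"), {
        q("xsi", "type"): "olac:language",
        q("olac", "code"): language_code,
    })
    # Linguistic data type, also drawn from a controlled vocabulary.
    ET.SubElement(root, q("dc", "type"), {
        q("xsi", "type"): "olac:linguistic-type",
        q("olac", "code"): linguistic_type,
    })
    return root


# "und" and "primary_text" are placeholder values for illustration only.
record = build_record("Hypothetical field recordings", "und", "primary_text")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The point of the sketch is that the controlled values travel as attributes on ordinary Dublin Core elements, so a consumer that only understands Dublin Core can still read the record.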
OLAC metadata is collected (or "harvested") from participating archives every day using the metadata harvesting protocol of the Open Archives Initiative (OAI). End users can access the metadata using OLAC's search service, or by browsing OLAC's language pages (coming soon).
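To give a sense of what harvesting involves, here is a minimal Python sketch of the two halves of an OAI `ListRecords` exchange: forming the request URL and reading record identifiers out of a response. It assumes OAI-PMH version 2.0 and a metadata prefix of `olac`; the base URL, the sample response and the function names are hypothetical, and the sketch performs no network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses (assumed protocol version).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="olac", resumption_token=None):
    """Build a ListRecords request URL for a data provider."""
    if resumption_token:
        # A resumption token is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:0001</identifier></header></record>
    <record><header><identifier>oai:example.org:0002</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(parse_list_records(SAMPLE))
```

A real harvester would fetch each URL, repeat the request with the returned resumption token until none is returned, and store the metadata carried inside each record.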
Anyone can use the OLAC Metadata Set to describe language resources. Language archives and language software repositories are the most common providers of OLAC metadata. (Participating archives and prospective participants are listed on the OLAC Organization page.) Individual researchers will soon be able to document the resources they manage using a simple form interface. The whole language resources community benefits from OLAC metadata, because it allows relevant resources to be identified quickly.
OLAC is open in the sense that any archive can join, and any individual can access the metadata records of participating archives. Participation and access are free. Also, the process by which OLAC governs itself and makes decisions is visible to all community members and open to their participation. Open does not mean that users are free to do whatever they like with the metadata, nor does it mean that the described language resources are openly available.
OLAC is open to participation by any language archive. To participate, archives must set up an OAI "data provider", exporting their catalogs to the Dublin Core and OLAC metadata formats for harvesting by the OAI Metadata Harvesting Protocol, and then register with the OAI and OLAC. In general, the catalog remains in its existing format (e.g. in a relational database) and a CGI script permits external services to harvest the records in the prescribed XML format. A second approach avoids the OAI protocol altogether. An entire metadata repository is dumped in a single XML file, and the OLAC virtual data provider takes care of the rest. For more information about both options, please see the page: How to Become an OLAC Data Provider.
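The idea of leaving the catalog in its existing database and exporting it as XML can be sketched as follows in Python. This is only an illustration under stated assumptions: the `items` table, its columns, the sample rows and the `records`/`record` wrapper elements are hypothetical and are not the format prescribed by the OLAC Metadata standard, which a real export must follow.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def make_demo_catalog():
    """Create an in-memory stand-in for an archive's existing catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, creator TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("item-001", "Hypothetical word list", "A. Researcher"),
            ("item-002", "Hypothetical field notebook", "B. Researcher"),
        ],
    )
    return conn


def export_catalog(conn):
    """Dump every catalog row into one XML document."""
    root = ET.Element("records")  # hypothetical wrapper element
    rows = conn.execute("SELECT id, title, creator FROM items ORDER BY id")
    for item_id, title, creator in rows:
        rec = ET.SubElement(root, "record")
        ET.SubElement(rec, "{%s}identifier" % DC).text = item_id
        ET.SubElement(rec, "{%s}title" % DC).text = title
        ET.SubElement(rec, "{%s}creator" % DC).text = creator
    return ET.tostring(root, encoding="unicode")


catalog_xml = export_catalog(make_demo_catalog())
print(catalog_xml)
```

The same mapping from rows to elements serves either option: a script can return the records on request, or the whole output can be written to a single file.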
In OLAC parlance, the term "standards" refers to procedures and formats that govern participating services, such as the OLAC Metadata Set or the OAI Metadata Harvesting Protocol. In OLAC, so-called "language standards" like the TEI are viewed as a kind of language resource called ADVICE, since they are not binding on participating archives. OLAC follows a process involving working groups and voting, whereby such advice can become identified as community-agreed best practice.
Please join the OLAC Metadata working group and post your proposal to the working group's mailing list.