Language-variety ontologies

Introduction

“Language variety” is a fundamental category in PanLex. We have enumerated about 11,500 language varieties. Several other systems for codifying language varieties also exist. We are researching the relationships between our classification and others’ classifications of language varieties.

External standards

There is no consensus on whether it is good or bad for the world to have multiple ways to classify language varieties. Haspelmath, for example, argued in an article in 02013 that it is good. There is, however, a consensus that linking the existing standards to each other is good. Because the codes in these standards are attached to knowledge about language varieties, links among the standards also link those items of knowledge.

PanLex’s plans to link the language-variety standards began in 02012, when PanLex Advisory Committee member Steven Loomis of the ICU project recommended integrating PanLex UIDs into one of the other standards (BCP 47). The standards that we have identified as candidates for linkage with PanLex UIDs are:

The symbol “∈” above indicates that the PanLex language variety is one member of the set of language varieties represented by the standard’s code. In these examples, hye-001 represents Western Armenian, and the codes hy, arm, and hye represent Armenian, which includes Western Armenian.
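Because an “∈” link is not an equivalence, a system that stores these mappings needs to record the relation type alongside each code. Otherwise a consumer could replace hye-001 with hye and silently lose the distinction between Western Armenian and Armenian. The following is a minimal sketch under stated assumptions: the `Link` record, the `Relation` enum, the use of “=” for equivalence, and the helper names are all hypothetical. The standard labels are an assumption about the table: hy, arm, and hye are the codes for Armenian in ISO 639-1, ISO 639-2/B, and ISO 639-3 respectively.

```python
from enum import Enum
from typing import List, NamedTuple


class Relation(Enum):
    # Hypothetical relation types for a link from a PanLex UID to a code.
    EQUIVALENT = "="  # the code denotes exactly the PanLex variety
    MEMBER_OF = "∈"   # the PanLex variety is one member of the set the code denotes


class Link(NamedTuple):
    panlex_uid: str
    standard: str
    code: str
    relation: Relation


# The Armenian example: hye-001 (Western Armenian) is one member of Armenian.
LINKS: List[Link] = [
    Link("hye-001", "ISO 639-1", "hy", Relation.MEMBER_OF),
    Link("hye-001", "ISO 639-2/B", "arm", Relation.MEMBER_OF),
    Link("hye-001", "ISO 639-3", "hye", Relation.MEMBER_OF),
]


def render(link: Link) -> str:
    """Format a link in the ∈ notation, e.g. 'hye-001 ∈ ISO 639-3:hye'."""
    return f"{link.panlex_uid} {link.relation.value} {link.standard}:{link.code}"


def exact_codes(uid: str, links: List[Link]) -> List[str]:
    """Return only the codes that can stand in for `uid` without losing specificity."""
    return [
        link.code
        for link in links
        if link.panlex_uid == uid and link.relation is Relation.EQUIVALENT
    ]


for link in LINKS:
    print(render(link))
print(exact_codes("hye-001", LINKS))  # []
```

In this sketch `exact_codes` returns nothing for hye-001, which reflects the point of the “∈” symbol: none of these standards has a code that is equivalent to Western Armenian, only codes for the larger variety that contains it.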

Methods

These differences in scope illustrate the complexity of language-variety identification standards and of efforts to map multiple standards to each other (see, e.g., an essay by Musgrave in 02014 and a forum discussion on nabu in 02013).

The most obvious way to map PanLex UIDs to other standards’ codes for language varieties is translation. If we treat each standard as a concepticon, the standard’s codes are expressions, and we can create sources that translate the expressions of one standard into those of another. Where the relationship is not equivalence but rather part-whole or partial equivalence, we can represent it with meaning classifications. Sources that have translated between PanLex UIDs (i.e., art-274) and other language-variety standards include:

  • mul:OmegaWiki (Wikidata)
  • mul:Melo (Lexvo)
  • art-eng-fra-spa:Hamm (Glottolog)
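The translation approach can be illustrated with a toy model: each standard acts as a variety whose expressions are its codes, a meaning groups codes that translate each other, and a classification attaches a broader variety to a meaning for part-whole relationships. This is a minimal sketch under simplifying assumptions. `CodeLexicon` and its methods are hypothetical and do not reproduce PanLex’s actual schema or API. Apart from art-274, the standard labels are illustrative, and the meaning IDs are arbitrary.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, simplified model (not PanLex's actual schema).
Expression = Tuple[str, str]  # (standard, code)


class CodeLexicon:
    def __init__(self) -> None:
        self.meanings: Dict[int, Set[Expression]] = {}
        self.index: Dict[Expression, Set[int]] = {}
        # Part-whole links: meaning id -> expressions naming a broader variety.
        self.broader: Dict[int, Set[Expression]] = {}

    def add_meaning(self, mid: int, *exprs: Expression) -> None:
        """Record that all of `exprs` translate each other (equivalence)."""
        self.meanings[mid] = set(exprs)
        for e in exprs:
            self.index.setdefault(e, set()).add(mid)

    def classify(self, mid: int, broader_expr: Expression) -> None:
        """Record that meaning `mid` is part of the variety named by `broader_expr`."""
        self.broader.setdefault(mid, set()).add(broader_expr)

    @staticmethod
    def _codes_in(exprs: Iterable[Expression], standard: str) -> Set[str]:
        return {code for std, code in exprs if std == standard}

    def translate(self, expr: Expression, target: str) -> Tuple[Set[str], Set[str]]:
        """Return (equivalent codes, broader codes) for `expr` in the `target` standard."""
        equivalents: Set[str] = set()
        broader: Set[str] = set()
        for mid in self.index.get(expr, set()):
            equivalents |= self._codes_in(self.meanings[mid], target)
            for b in self.broader.get(mid, set()):
                broader |= self._codes_in([b], target)
                # Follow the broader code's own translations so that a
                # part-whole link to one standard carries over to the others.
                for bmid in self.index.get(b, set()):
                    broader |= self._codes_in(self.meanings[bmid], target)
        return equivalents, broader


lex = CodeLexicon()
# Meaning 1: Armenian, named in three ISO 639 parts.
lex.add_meaning(1, ("ISO 639-1", "hy"), ("ISO 639-2/B", "arm"), ("ISO 639-3", "hye"))
# Meaning 2: Western Armenian, named here only by its PanLex UID (art-274).
lex.add_meaning(2, ("art-274", "hye-001"))
lex.classify(2, ("ISO 639-3", "hye"))

print(lex.translate(("ISO 639-1", "hy"), "ISO 639-2/B"))  # ({'arm'}, set())
print(lex.translate(("art-274", "hye-001"), "ISO 639-1"))  # (set(), {'hy'})
```

The second query shows how links among standards can also link knowledge. hye-001 was classified only under the ISO 639-3 code hye. Because hye translates hy, however, the sketch can still report that Western Armenian falls under hy in ISO 639-1.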

One task for database developers is to extend this practice by creating other sources with translations and meaning classifications that link PanLex UIDs with other standards.