Database design development

Introduction

The PanLex database has undergone several incremental design revisions. Among the major ones have been:

  • The addition of language varieties to what had previously been only languages.
  • The elaboration of the records of sources to include bibliographic, file-type, editor, language-variety, quality, complexity, and other attributes.
  • The consolidation of meaning identifiers, word classifications, domain specifications, and metadata into meaning and denotation classifications and properties.
  • The creation of source groups.
  • The addition of mutability and expressional default names to language varieties.

We expect the database design to continue to undergo occasional changes, and we welcome proposals for improvements.

Details

Potential changes to the database design that are under discussion include:

Source classifications and properties

Above we discuss classifications and properties in general; details specific to source classifications and source properties are discussed here. As of June 02016, these changes had not been implemented.

An idea emerged in May 02015 to reorganize the structure of source records, making them more extensible, better able to accommodate multiple values per column, and more interoperable and/or linkable with other bibliographic records. One possible benefit would be to permit records to be imported from elsewhere, such as OCLC, Library of Congress (possibly under the Z39.50 standard), and Open Library.

This proposal has been accommodated by the design of source classifications and properties, which generalize most of the content of table ap (sources).

Tables

Classifications

Source classifications are stored in table acs. It has the structure:

acs serial primary key
ap integer not null
ex0 integer
ex1 integer not null

Constraints:

  • ap+ex0+ex1 unique
  • ap references ap(ap)
  • ex0 references ex(ex)
  • ex1 references ex(ex)
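As an illustration only, the acs structure and constraints above can be approximated in SQLite through Python's standard sqlite3 module. This is a sketch under assumptions: INTEGER PRIMARY KEY stands in for the PostgreSQL-style serial type, tables ap and ex are reduced to their ID columns, and the expression IDs 10, 20, and 30 are hypothetical. The helper try_classify is likewise hypothetical; it only reports whether a row survives the constraints.

```python
import sqlite3

# Sketch of table acs. INTEGER PRIMARY KEY stands in for serial;
# ap and ex are reduced to their ID columns (assumption for illustration).
SCHEMA = """
CREATE TABLE ap (ap INTEGER PRIMARY KEY);
CREATE TABLE ex (ex INTEGER PRIMARY KEY);
CREATE TABLE acs (
    acs INTEGER PRIMARY KEY,
    ap  INTEGER NOT NULL REFERENCES ap(ap),
    ex0 INTEGER REFERENCES ex(ex),          -- nullable
    ex1 INTEGER NOT NULL REFERENCES ex(ex),
    UNIQUE (ap, ex0, ex1)
);
"""

def try_classify(conn: sqlite3.Connection, ap: int, ex0, ex1: int) -> bool:
    """Insert an acs row; return False if a constraint rejects it (hypothetical helper)."""
    try:
        conn.execute("INSERT INTO acs (ap, ex0, ex1) VALUES (?, ?, ?)", (ap, ex0, ex1))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ap VALUES (1)")
conn.executemany("INSERT INTO ex VALUES (?)", [(10,), (20,), (30,)])  # hypothetical IDs

results = [
    try_classify(conn, 1, 10, 20),    # accepted
    try_classify(conn, 1, 10, 20),    # duplicate ap+ex0+ex1: rejected
    try_classify(conn, 1, None, 30),  # ex0 may be null: accepted
    try_classify(conn, 1, 10, 99),    # ex 99 does not exist: rejected
]
```

In SQLite, as in PostgreSQL by default, nulls count as distinct in a UNIQUE constraint, so ap+ex0+ex1 alone does not stop two records that share ap and ex1 from both having a null ex0.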

Table acs absorbed the li column of table ap (sources). For each record of table ap with a non-null li value, a record was created in table acs. In it, ex0 is the ID of the expression “license” in the DCMI Metadata Terms language variety (art-301), and ex1 is the ID of the expression equivalent to the 2-character li value, in either the SPDX License List Identifier language variety (art-298) or the PanLex Intellectual Property Licenses language variety (art-299).
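The li conversion described above amounts to a one-pass migration. The following Python/SQLite sketch shows one way it could be done; it is not the migration PanLex actually ran. The function migrate_li, the license codes "xx" and "yy", and all expression IDs are hypothetical, and the real mapping from li codes to art-298 or art-299 expressions is not reproduced here.

```python
import sqlite3
from typing import Dict

# Pre-migration shape (assumed): ap still carries the li column.
SCHEMA = """
CREATE TABLE ap (ap INTEGER PRIMARY KEY, li TEXT);
CREATE TABLE acs (
    acs INTEGER PRIMARY KEY,
    ap  INTEGER NOT NULL REFERENCES ap(ap),
    ex0 INTEGER,
    ex1 INTEGER NOT NULL,
    UNIQUE (ap, ex0, ex1)
);
"""

def migrate_li(conn: sqlite3.Connection, license_ex: int, li_to_ex: Dict[str, int]) -> int:
    """Create one acs record per ap record whose li is not null (hypothetical helper).

    license_ex: ID of the expression "license" in art-301.
    li_to_ex: maps each 2-character li code to the ID of its equivalent
    expression in art-298 or art-299. An unmapped code raises KeyError,
    and the enclosing transaction is rolled back.
    """
    with conn:  # commit on success, roll back on any exception
        rows = conn.execute(
            "SELECT ap, li FROM ap WHERE li IS NOT NULL ORDER BY ap"
        ).fetchall()
        for ap_id, li in rows:
            conn.execute(
                "INSERT INTO acs (ap, ex0, ex1) VALUES (?, ?, ?)",
                (ap_id, license_ex, li_to_ex[li]),
            )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO ap (ap, li) VALUES (?, ?)", [(1, "xx"), (2, None), (3, "yy")]
)
conn.commit()
# Hypothetical IDs: 100 = "license" in art-301; 200 and 300 = license expressions.
created = migrate_li(conn, 100, {"xx": 200, "yy": 300})
acs_rows = conn.execute("SELECT ap, ex0, ex1 FROM acs ORDER BY ap").fetchall()
```

Running the whole conversion in one transaction means an unmapped code aborts it without leaving a partial set of acs records.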

Properties

Source properties are stored in table app. It has the structure:

app serial primary key
ap integer not null
ex integer not null
sq smallint not null
tt text not null

Constraints:

  • ap+ex+tt unique
  • ap references ap(ap)
  • ex references ex(ex)
  • ap+ex+sq unique
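A similar SQLite sketch of table app shows what its two uniqueness constraints and its reference to ex reject. The same assumptions apply: INTEGER PRIMARY KEY replaces serial, ap and ex are reduced to ID columns, and expression ID 7 (standing in for a title property) is hypothetical, as is the helper try_insert.

```python
import sqlite3

# Sketch of table app; INTEGER PRIMARY KEY stands in for serial (assumption).
SCHEMA = """
CREATE TABLE ap (ap INTEGER PRIMARY KEY);
CREATE TABLE ex (ex INTEGER PRIMARY KEY);
CREATE TABLE app (
    app INTEGER PRIMARY KEY,
    ap  INTEGER NOT NULL REFERENCES ap(ap),
    ex  INTEGER NOT NULL REFERENCES ex(ex),
    sq  SMALLINT NOT NULL,
    tt  TEXT NOT NULL,
    UNIQUE (ap, ex, tt),  -- a source cannot repeat a value of the same property
    UNIQUE (ap, ex, sq)   -- each value of a property has its own position
);
"""

def try_insert(conn: sqlite3.Connection, ap: int, ex: int, sq: int, tt: str) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO app (ap, ex, sq, tt) VALUES (?, ?, ?, ?)", (ap, ex, sq, tt)
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ap VALUES (1)")
conn.execute("INSERT INTO ex VALUES (7)")  # hypothetical ID of a "title" expression

results = [
    try_insert(conn, 1, 7, 1, "First title"),
    try_insert(conn, 1, 7, 2, "Second title"),
    try_insert(conn, 1, 7, 3, "First title"),  # same tt: violates ap+ex+tt
    try_insert(conn, 1, 7, 2, "Third title"),  # same sq: violates ap+ex+sq
    try_insert(conn, 1, 99, 1, "Orphan"),      # ex 99 does not exist
]
```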

Table app absorbed columns ur, bn, au, ti, pb, yr, ul, ip, co, and ad of table ap (sources). The ex value of each resulting record is the ID of an equivalent expression in either the DCMI Metadata Terms language variety (art-301) or the PanLex Source Classes and Properties language variety (art-305).

Column sq of table app is a sequence index. Sources often have multiple authors, and PanLex sources often have multiple titles; some also have multiple publishers, URLs, and so on. When a source has several values for the same property, column sq specifies the order in which they appear, and the ap+ex+sq uniqueness constraint gives each value a distinct position.
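To read multiple values of one property back in their intended order, a query can sort on sq. In this sketch the helpers add_value and values_in_order are hypothetical, table app is modelled alone without its references, expression ID 5 is a made-up stand-in for an author property, and numbering sq from 1 is an assumption, since the design does not say where the sequence starts.

```python
import sqlite3

def add_value(conn: sqlite3.Connection, ap: int, ex: int, tt: str) -> int:
    """Append tt at the next sq position for this source and property (hypothetical)."""
    (next_sq,) = conn.execute(
        "SELECT COALESCE(MAX(sq), 0) + 1 FROM app WHERE ap = ? AND ex = ?", (ap, ex)
    ).fetchone()
    conn.execute(
        "INSERT INTO app (ap, ex, sq, tt) VALUES (?, ?, ?, ?)", (ap, ex, next_sq, tt)
    )
    return next_sq

def values_in_order(conn: sqlite3.Connection, ap: int, ex: int) -> list:
    """Return a source's values for one property, sorted by sq (hypothetical)."""
    cur = conn.execute(
        "SELECT tt FROM app WHERE ap = ? AND ex = ? ORDER BY sq", (ap, ex)
    )
    return [tt for (tt,) in cur]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE app (
        app INTEGER PRIMARY KEY,
        ap  INTEGER NOT NULL,
        ex  INTEGER NOT NULL,
        sq  SMALLINT NOT NULL,
        tt  TEXT NOT NULL,
        UNIQUE (ap, ex, tt),
        UNIQUE (ap, ex, sq)
    )"""
)
AUTHOR = 5  # hypothetical expression ID for an "author" property

# Rows inserted out of order: sq, not insertion order, defines the sequence.
conn.executemany(
    "INSERT INTO app (ap, ex, sq, tt) VALUES (?, ?, ?, ?)",
    [(1, AUTHOR, 2, "Jones, B."), (1, AUTHOR, 1, "Smith, A.")],
)
third = add_value(conn, 1, AUTHOR, "Lee, C.")
authors = values_in_order(conn, 1, AUTHOR)
```

Computing the next position as MAX(sq) + 1 is simple but not safe under concurrent writers; if two writers chose the same position, the ap+ex+sq constraint would reject the second insert rather than silently duplicating the position.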

Uses

Table ap contains each source’s ID, registration date, estimated quality, and group ID. You can use source classifications and properties to record any other information about sources, including the usual bibliographic facts and PanLex-specific data.

We are not referring here to facts about the editorial assimilation of sources, such as the completion status of assimilation and problems facing the editors of particular sources. These editorial facts are stored in the aped table instead.