PanLex database snapshots

The PanLex project makes the content of its database available for local processing by publishing database table snapshots. Snapshots are published monthly in three formats: SQL, CSV, and JSON. Each format is published as a single zip file that decompresses into a single directory.
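Because each archive is described as decompressing into a single directory, a local pipeline can extract it and check that assumption before going further. The following is a minimal sketch using Python's standard zipfile module; the helper name extract_snapshot is hypothetical, and the archive and destination names you pass it are up to you.

```python
import zipfile
from pathlib import Path


def extract_snapshot(zip_path, dest_dir):
    """Extract a snapshot archive and return the single directory it contains.

    Hypothetical helper. Assumes every member of the archive lives under
    one top-level directory; raises ValueError otherwise.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # Zip member names always use "/" as the separator.
        top_levels = {name.split("/", 1)[0] for name in archive.namelist()}
        if len(top_levels) != 1:
            raise ValueError(
                f"expected one top-level directory, found {sorted(top_levels)}"
            )
        archive.extractall(dest)
    root = dest / top_levels.pop()
    # A single top-level *file* would also pass the check above.
    if not root.is_dir():
        raise ValueError(f"{root} is not a directory")
    return root
```

Calling extract_snapshot on a downloaded archive returns the path of the extracted snapshot directory, which the later steps can take as their starting point.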

The SQL snapshot directory contains a .sql file (to be loaded into PostgreSQL) and a Perl file (to be loaded at PostgreSQL start-up via plperl.on_init).
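plperl.on_init holds Perl source code that PostgreSQL runs when a Perl interpreter is initialized, and it can only be set in postgresql.conf or on the server command line. One way to wire up the second file is therefore a require statement in that setting, followed by loading the .sql file with psql. The sketch below only builds the two strings and does not run anything. It assumes the directory holds exactly one .sql file and one file with a .pl extension; the helper name and the database name panlex are hypothetical.

```python
from pathlib import Path


def snapshot_load_plan(snapshot_dir, database="panlex"):
    """Return (postgresql.conf line, psql argv) for loading an SQL snapshot.

    Hypothetical helper. Assumes exactly one .sql and one .pl file in the
    directory; "panlex" is a placeholder database name.
    """
    root = Path(snapshot_dir).resolve()
    sql_files = sorted(root.glob("*.sql"))
    pl_files = sorted(root.glob("*.pl"))
    if len(sql_files) != 1 or len(pl_files) != 1:
        raise ValueError("expected exactly one .sql file and one .pl file")
    # plperl.on_init takes Perl code; here it simply requires the Perl file.
    # (Assumes the path contains no characters special in a Perl "..." string.)
    perl_code = f'require "{pl_files[0].as_posix()}";'
    # Single quotes inside a postgresql.conf value are escaped by doubling.
    conf_line = "plperl.on_init = '" + perl_code.replace("'", "''") + "'"
    psql_argv = ["psql", "-d", database, "-f", str(sql_files[0])]
    return conf_line, psql_argv
```

The returned line would go into postgresql.conf before the server is started, and the argument list could then be passed to subprocess.run once the server is up and the target database exists.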

The CSV and JSON snapshot directories contain one file per database table. Each file is named after its table (for example, the source table is exported to a file named source.csv or source.json), and each entry in a file corresponds to one row of the table.
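The one-file-per-table naming rule makes it straightforward to locate and read a table. The sketch below is hypothetical and rests on two labeled assumptions not stated above: CSV files start with a header row of column names, and JSON files hold either one array of row objects or one object per line (it accepts both).

```python
import csv
import json
from pathlib import Path


def table_file(snapshot_dir, table, fmt):
    """Return the path of a table's file, e.g. source -> source.csv."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    return Path(snapshot_dir) / f"{table}.{fmt}"


def read_rows(snapshot_dir, table, fmt):
    """Yield one dict per table row.

    Assumptions: CSV files have a header row; JSON files contain either a
    single array of row objects or one JSON object per line.
    """
    path = table_file(snapshot_dir, table, fmt)
    with path.open(encoding="utf-8", newline="") as f:
        if fmt == "csv":
            # DictReader keys each row by the header's column names.
            yield from csv.DictReader(f)
            return
        text = f.read()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-blank line.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    yield from (data if isinstance(data, list) else [data])
```

Because read_rows is a generator, large CSV tables are streamed row by row; the JSON branch reads the whole file, which keeps the format detection simple.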

The snapshots include every database table that contains data useful to the public. Tables containing implementation-specific or management information are excluded. Specifically, the following tables are included: definition, denotation, denotation_class, denotation_prop, expr, format, lang_code, langvar, langvar_char, langvar_cldr_char, meaning, meaning_class, meaning_prop, source, source_format, source_langvar, source_license.
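Since the set of included tables is fixed, an extracted CSV or JSON snapshot can be sanity-checked against it. This is a minimal sketch with a hypothetical helper, missing_tables, that reports which of the listed tables have no corresponding file.

```python
from pathlib import Path

# Tables included in the snapshots, as listed above.
SNAPSHOT_TABLES = (
    "definition", "denotation", "denotation_class", "denotation_prop",
    "expr", "format", "lang_code", "langvar", "langvar_char",
    "langvar_cldr_char", "meaning", "meaning_class", "meaning_prop",
    "source", "source_format", "source_langvar", "source_license",
)


def missing_tables(snapshot_dir, fmt):
    """Return listed tables with no <table>.<fmt> file in snapshot_dir."""
    present = {p.stem for p in Path(snapshot_dir).glob(f"*.{fmt}")}
    return [t for t in SNAPSHOT_TABLES if t not in present]
```

An empty result means every listed table has a file in the chosen format; it does not check the files' contents.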

Views with abbreviated table and column names (formerly the only available names) are provided in the SQL snapshot, in the abbrev schema. The full table names above map, in the same order, to the following abbreviated names: df, dn, dcs, dpp, ex, fm, lc, lv, cp, cu, mn, mcs, mpp, ap, af, av, apli.
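Pairing the two lists position by position gives the full correspondence (definition to df, through source_license to apli), which can help when translating older queries written against the abbreviated names. The sketch below covers table names only, since the abbreviated column names are not listed here; the helper abbrev_view is hypothetical.

```python
FULL_NAMES = (
    "definition", "denotation", "denotation_class", "denotation_prop",
    "expr", "format", "lang_code", "langvar", "langvar_char",
    "langvar_cldr_char", "meaning", "meaning_class", "meaning_prop",
    "source", "source_format", "source_langvar", "source_license",
)
ABBREVIATIONS = (
    "df", "dn", "dcs", "dpp", "ex", "fm", "lc", "lv", "cp",
    "cu", "mn", "mcs", "mpp", "ap", "af", "av", "apli",
)

if len(FULL_NAMES) != len(ABBREVIATIONS):
    raise ValueError("name lists must be the same length")

# Pair the two lists positionally: definition -> df, ..., source_license -> apli.
ABBREV_VIEW = dict(zip(FULL_NAMES, ABBREVIATIONS))


def abbrev_view(table):
    """Return the schema-qualified abbreviated view name, e.g. abbrev.ap."""
    return f"abbrev.{ABBREV_VIEW[table]}"
```

For example, abbrev_view("source") gives abbrev.ap, the view corresponding to the source table.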
