Developing with the PanLex API

Introduction

The experimental public API to the PanLex database may be accessed at http://api.panlex.org. All API details are currently subject to change.

Existing applications

The following applications query the PanLex API:

  • Global Glossary: a web-based lexical translation reference
  • PanLex Translator: a Chrome browser extension that translates words that the user selects
  • TeraDict: a family of web-based applications translating expressions that the user enters
  • PanLinx: a tree terminating in millions of pages containing lexical translations, designed for search-engine crawling

Developer guide

HTTP protocol

The API uses the JSON format for all queries and responses. All queries take the form of HTTP POST requests. The type of query is specified in the URL. Additional query parameters are specified in the HTTP request body, which must be a valid JSON object. The empty object {} should be sent if there are no parameters.
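
As a rough sketch of the request shape described above, the following Python builds a POST request whose body is a JSON object. The helper names build_request and query are hypothetical, and the Content-Type header is an assumption, since this guide does not name a required one.

```python
import json
import urllib.request

API_ROOT = "http://api.panlex.org"  # base URL given in the introduction


def build_request(path, params=None):
    """Build a POST request whose body is a JSON object ({} if no parameters)."""
    body = json.dumps(params or {}).encode("utf-8")
    return urllib.request.Request(
        API_ROOT + path,
        data=body,
        # Assumption: the guide does not state which Content-Type is expected.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(path, params=None):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(build_request(path, params)) as response:
        return json.load(response)
```

With these helpers, query("/lv", {"lc": "rus"}) would send the same request as the curl examples later in this guide.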

There is, in fact, one situation in which HTTP GET requests are allowed (but not required): for queries which by design return a single database object. These have URLs of the form /<type>/<id> (e.g., /lv/eng-000 retrieves the language variety object for English). This feature is made available primarily for linked-data purposes. Query parameters cannot be passed over GET requests.

By default, the JSON response is not sent until all results have been collected on the server. Streaming responses are also available for queries that return a result array (see below). To activate streaming, set the Accept header to the type application/x-json-stream.

Successful API queries return HTTP status 200. Errors return HTTP status 4xx (the precise value depends on the nature of the error). A more specific description of the error may be found in the JSON response: the code field indicates the category of error, and the message field contains further information. Example code values are ResourceNotFoundError, BadMethodError, MissingParameterError, InvalidArgumentError, InvalidVersionError, and InternalError.
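
One way a client might surface these errors is sketched below. PanLexError and check_response are hypothetical names, and the code assumes only what is stated above: the error body is JSON with code and message fields. A body that is not valid JSON is tolerated rather than trusted.

```python
import json


class PanLexError(Exception):
    """Hypothetical exception carrying the API's error category and message."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def check_response(status, body):
    """Return the decoded JSON body, or raise PanLexError for an error status."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}  # assumption: some error bodies may not be JSON
    if status >= 400:
        raise PanLexError(status, data.get("code"), data.get("message"))
    return data
```

Callers can then branch on err.code (for example MissingParameterError) instead of parsing the message text.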

Limits

API users are requested not to perform more than 2 queries per second. The API server will enforce this rate if more than 100 queries are received in rapid succession from a single IP address, responding with HTTP status code 429.

Array parameters may not contain more than 10000 elements.

The returned result array will contain a maximum of 2000 elements.

The global parameter offset may not be greater than 250000.
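
A client can stay inside these limits with two small helpers, sketched here under stated assumptions. chunked and Throttle are hypothetical names; the 0.5-second interval is simply 1 divided by the requested 2 queries per second.

```python
import time


def chunked(values, size=10000):
    """Yield successive slices of at most `size` items (the array-parameter limit)."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


class Throttle:
    """Block so that calls are spaced at least `min_interval` seconds apart."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # 0.5 s corresponds to 2 queries per second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Sleep if the previous call was too recent, then record this call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call wait() before each query. Splitting an array parameter with chunked is only appropriate where the elements are matched independently; parameters that require all elements to match together should not be split this way.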

Array parameters

For query parameters whose value may be an array, the array often has only one element (e.g., there is only a single sort value, or you are searching on a single language variety). In such cases, you can simply pass a string or number as appropriate; it will automatically be converted to a one-element array.
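
The equivalence can be illustrated with a short sketch. as_array is a hypothetical client-side mirror of the conversion described above, not part of the API.

```python
def as_array(value):
    """Mirror the documented rule: a bare string or number becomes a one-element array."""
    return value if isinstance(value, list) else [value]


short_form = {"lc": "rus"}    # scalar shorthand
long_form = {"lc": ["rus"]}   # explicit one-element array

# Both bodies describe the same query once the shorthand is expanded.
expanded = {key: as_array(value) for key, value in short_form.items()}
```

This applies only to parameters documented as arrays; boolean or integer parameters such as indent and limit are passed as they are.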

Global query and response parameters

The following optional query parameters are available in the HTTP request body:

  • after: integer or string corresponding to the value of the first sort field. Records will be returned that occur after the indicated value in the sort order. Can be used as an alternative to offset.
  • echo: boolean value indicating whether to pass the query back in the response as request, which is an object with the keys url and body. Defaults to false.
  • include: array specifying extra fields to include in the response. See documentation below for possible values.
  • indent: boolean value indicating whether to pretty-print the JSON response. Defaults to false.
  • limit: integer value indicating the maximum number of records to return. Defaults to resultMax, i.e., the maximum.
  • offset: integer value indicating how many records to omit from the beginning of the returned records. Defaults to 0; cannot be greater than 250000.
  • sort: array of fields to sort the result by. Sort strings take the format <field> or <field> asc for ascending order, <field> desc for descending order. You may also sort by include objects if there is only one per result by using a dot separator, e.g. lv.tt to sort by an expression’s language variety label. Defaults to sorting by ID in ascending order.

The JSON response object can contain the following keys:

  • count: the number of results found, for count queries.
  • countType: string specifying the type of objects in count.
  • request: object representing the query, if echo was on.
  • result: array of result objects, if the query was for a set of objects. Limited to resultMax per query; use offset to get more.
  • resultMax: the maximum number of result objects that will be returned in a single query (currently 2000).
  • resultNum: number of objects returned in result.
  • resultType: string specifying type of objects in result.

Other keys are present for particular query types; see below.
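
Because result is capped at resultMax objects, collecting a large result set means issuing several queries with increasing offset. The generator below is a minimal sketch of that loop. iter_results is a hypothetical name, and fetch stands for any callable that sends one request body and returns the decoded JSON response.

```python
MAX_OFFSET = 250000  # documented upper bound for the offset parameter


def iter_results(fetch, params, page_size=2000):
    """Yield every result object, requesting one page per call to `fetch`.

    `fetch` is a hypothetical callable that takes a request-body dict and
    returns the decoded JSON response for one query.
    """
    offset = 0
    while offset <= MAX_OFFSET:
        page = fetch(dict(params, limit=page_size, offset=offset))
        yield from page["result"]
        if page["resultNum"] < page_size:
            return  # a short page means there is nothing left
        offset += page_size
```

The loop stops once offset would exceed 250000; for result sets larger than that, the after parameter is the documented alternative.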

URL parameters

When a query calls for a single language variety, expression, or source, this is passed as a URL parameter. URL parameters are indicated in the documentation below as <ap>, <df>, <dn>, <ex>, <lv>, and <mn>, and take the following format:

  • <ap>: source ID number or label.
  • <df>: definition ID number.
  • <dn>: denotation ID number.
  • <ex>: expression ID number.
  • <lv>: language variety ID number or uniform identifier (aaa-000).
  • <mn>: meaning ID number.

The database object corresponding to a URL parameter is always returned in the result object. For example, the object corresponding to <lv> will be returned as lv.
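
The URL shape and the response key can be sketched as follows. object_url and unwrap are hypothetical helper names; percent-encoding the identifier is a precaution on the client side, not something this guide requires.

```python
from urllib.parse import quote

API_ROOT = "http://api.panlex.org"  # base URL given in the introduction


def object_url(kind, ident):
    """Build the URL for a single-object query such as /lv/eng-000."""
    return f"{API_ROOT}/{kind}/{quote(str(ident), safe='')}"


def unwrap(response, kind):
    """Pull the single object out of a response, keyed by its type (e.g. "lv")."""
    return response[kind]
```

For example, object_url("lv", "eng-000") gives the URL for the English language variety, and the object itself would be read with unwrap(response, "lv").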

Examples

To retrieve information about all language varieties in PanLex, you can send the following query from the command line using curl:

curl http://api.panlex.org/lv -d '{ "indent": true }'

This query requires no additional parameters besides the URL. We could pass an empty JSON object {} in the request body, but for convenience, we set indent to true so the response will be pretty-printed. The response is structured as follows:

{
    "result": [
        {
          "lv": 1,
          "lc": "aar",
          "vc": 0,
          "ex": 1453510,
          "uid": "aar-000",
          "tt": "Qafár af"
        },
        ...
    ],
    "resultType": "lv",
    "resultNum": 2000,
    "resultMax": 2000
}

The result array contains a set of language variety objects, as indicated by resultType. resultNum indicates that 2000 results were returned, the maximum for a single query; we could do another query to get more results. (For an explanation of the language variety object, see below.)

Now suppose that we want to retrieve some expressions from Russian. We must first determine the language variety ID or uniform identifier for (a variety of) Russian. If we already know that the language code for Russian is rus, we can look up matching language varieties as follows:

curl http://api.panlex.org/lv -d '{ "lc": "rus" }'

The lc parameter says to search for language varieties with matching language codes. The value of lc is an array of three-character strings representing language codes. The results contain several language varieties; we pick this one as corresponding to Russian written in Cyrillic:

{
    "lv": 620,
    "lc": "rus",
    "vc": 0,
    "ex": 43116,
    "uid": "rus-000",
    "tt": "русский"
}

(We have omitted the indent parameter above for brevity, but will continue to use pretty-printed JSON for the purpose of these examples.)

Now that we know the language variety ID and uniform identifier for Russian (either will do), we can look up some Russian expressions. The following query looks up the expression “дерево” (the Russian word for “tree”):

curl http://api.panlex.org/ex -d '{ "uid": "rus-000", "tt": "дерево" }'

You will see that the result contains a single expression object with the ID 750865. If we want to know this expression’s denotations—i.e., what PanLex sources the expression occurs in, and what translations it is linked to in those sources—we can do a denotation query as follows:

curl http://api.panlex.org/dn -d '{ "ex": 750865 }'

The results now contain an array of denotation objects, one of which is the following:

{
    "dn": 25930350,
    "mn": 9537585,
    "ex": 750865,
    "ap": 603
}

If we want to get more information about the meaning to which this denotation belongs, we can look it up with the following query, specifying that definitions should be included:

curl http://api.panlex.org/mn -d '{ "mn": 9537585, "include": "df" }'

The result should contain an ex array with the expression IDs that share this meaning, and a df array containing one definition “растущий” in language variety 620 (which we have already determined is Russian).

Now suppose that we want to translate “дерево” into English. We can do this with an expression query requesting translations of expression 750865 into English:

curl http://api.panlex.org/ex -d '{ "uid": "eng-000", "trex": 750865 }'

You should see that the expression “tree” is one of the results, and the rest are expressions with closely related meanings.
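
The two-step lookup above (find the source expression, then request its translations) can be sketched in Python. translate is a hypothetical function, and post stands for any callable that sends a JSON body to an API path and returns the decoded response; neither is part of a PanLex client library.

```python
def translate(post, text, source_uid, target_uid):
    """Return target-language texts that translate `text`.

    `post` is a hypothetical callable (path, body) -> decoded JSON response.
    """
    # Step 1: look up the expression ID(s) for the source text.
    found = post("/ex", {"uid": source_uid, "tt": text})
    ex_ids = [expr["ex"] for expr in found["result"]]
    if not ex_ids:
        return []  # the text is not an expression in this language variety
    # Step 2: ask for expressions in the target variety that translate them.
    translated = post("/ex", {"uid": target_uid, "trex": ex_ids})
    return [expr["tt"] for expr in translated["result"]]
```

Called as translate(post, "дерево", "rus-000", "eng-000"), this issues the same two /ex queries as the curl commands in this section.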

Language variety queries

/lv

Returns information about a set of language varieties, as result array. Parameters:

  • extd: array of expression texts. Restricts results to language varieties containing a matching expression in degraded form.
  • extt: array of expression texts. Restricts results to language varieties containing a matching expression.
  • gl: array of Glottolog language codes.
  • include: valid values are cpcu, gl, lcType, and sc.
  • lc: array of language codes.
  • lv: array of language variety IDs.
  • trex: array of expression IDs. Restricts results to those language varieties containing a one-hop translation of one of the expressions.
  • tt: array of language variety labels.
  • uid: array of language variety uniform identifiers.

You can pass any combination of these parameters; results will be returned for matching language varieties. If you do not specify any search parameters, results will be returned for all language varieties in PanLex.

/lv/count

Returns the number of matching language varieties. Parameters are the same as for /lv.

/lv/<lv>

Returns information about a single language variety, as lv. The include parameter is the same as for /lv. There are no other parameters.

Language variety objects

Language variety objects contain the following keys:

  • cp: array of code point ranges (only if in include).
  • cu: array of exemplar character objects (only if in include).
  • ex: language variety label’s expression ID.
  • gl: Glottolog code for the language to which the language variety belongs (only if gl is in include).
  • glat: Glottolog-provided latitude (only if gl is in include).
  • glon: Glottolog-provided longitude (only if gl is in include).
  • lc: three-letter language code.
  • lcType: lc code type (only if in include); can be “ISO 639-3 individual language”, “ISO 639-3 macro-language”, “ISO 639-2 collective language”, “ISO 639-5”, or “other”.
  • lv: language variety ID number.
  • sc: array of names of Unicode scripts in which expressions in the language variety typically appear (only if in include).
  • tt: language variety label’s expression text.
  • uid: language variety’s uniform identifier.
  • vc: numeric variety code.

Code point ranges

The code point range is an array representing a range of permissible Unicode characters for a language variety. The array takes the form [first, last], where first is the numeric value of the first code point in the range and last is the value of the last code point in the range.

For example, for English (language variety eng-000), the first code point range is [32, 33]. This covers U+0020 (SPACE) through U+0021 (EXCLAMATION MARK). Note that JSON numeric values are always decimal.
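
A short sketch of how a client might test text against such ranges follows. in_ranges and disallowed_chars are hypothetical helpers; the single range used is the one quoted above for eng-000.

```python
def in_ranges(char, ranges):
    """True if the character's code point lies inside any [first, last] range."""
    code_point = ord(char)
    return any(first <= code_point <= last for first, last in ranges)


def disallowed_chars(text, ranges):
    """Return the characters of `text` that fall outside every range."""
    return [char for char in text if not in_ranges(char, ranges)]


ranges = [[32, 33]]  # the first English range quoted above: U+0020 to U+0021
```

With the full cp array from a language variety object, disallowed_chars could flag characters that are not expected in that variety.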

Exemplar character objects

Exemplar character objects represent the exemplar characters for a language variety, as defined by the Unicode Common Locale Data Repository. They contain the following keys:

  • category: character category, typically “pri” (primary/standard), “aux” (auxiliary), or “pun” (punctuation).
  • locale: Unicode script locale abbreviation.
  • range: a code point range (see above).

Expression queries

/ex

Returns information about the specified expressions, as result array. This is also the endpoint for translation queries. Parameters:

  • ex: array of expression IDs.
  • include: valid values are trlv, trpath, trq, trtd, trtt, truid, and uid.
  • lv: array of language variety IDs. Restricts results to expressions in the specified language varieties.
  • range: array of the form [field, start, end]. Restricts results to expressions whose field value is alphabetically between the start and end strings. field may be “tt” or “td”.
  • td: array of expression texts to be matched in their degraded form.
  • trdistance: integer specifying the number of translation hops. Pass 1 for one hop (direct or distance-1 translation), 2 for two hops (indirect or distance-2 translation). Defaults to 1. Only relevant if you are translating. Note that if you set this to 2, for performance reasons we recommend that you specify the source expression(s) with trex rather than one of the alternatives, and (if still not fast enough) the language variety with lv rather than uid.
  • trex: array of expression IDs. Restricts results to expressions that are translations of the specified expressions.
  • trlv: array of language variety IDs. Restricts results to expressions that are translations of expressions in the specified language varieties.
  • trqalgo: string specifying the translation quality algorithm. Valid values are “geometric” (the default) and “arithmetic”. See below for details. Only relevant when trdistance is 2.
  • trqmin: non-negative integer specifying a minimum translation quality. Translations with a lower quality will be discarded. Defaults to 0, i.e., no minimum. Only relevant if you are translating.
  • trtd: array of expression texts. Restricts results to expressions that are translations of expressions with matching texts in their degraded form.
  • trtt: array of expression texts. Restricts results to expressions that are translations of expressions with matching texts.
  • trui: array of source group IDs. Restricts translated results to those deriving from the specified source groups. Only relevant if you are translating.
  • truid: array of language variety uniform identifiers. Restricts results to expressions that are translations of expressions in the specified language varieties.
  • tt: array of expression texts.
  • uid: array of language variety uniform identifiers. Restricts results to expressions from the specified language varieties.

You must provide at least one of the parameters other than include. If you are translating, you must provide at least one of the trex, trtd, or trtt parameters. Results will be returned for all matching expressions.
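
These two requirements can be checked on the client before a request is sent. The sketch below is one reading of the rules above, not the server's actual validation: check_ex_params is a hypothetical name, it treats any parameter beginning with tr as a sign that the query is a translation, and the trqmin value of 5 is an arbitrary example.

```python
EX_PARAMS = {
    "ex", "lv", "range", "td", "trdistance", "trex", "trlv", "trqalgo",
    "trqmin", "trtd", "trtt", "trui", "truid", "tt", "uid",
}
TRANSLATION_SOURCES = {"trex", "trtd", "trtt"}


def check_ex_params(params):
    """Raise ValueError if an /ex request body breaks the rules stated above."""
    given = EX_PARAMS & set(params)
    if not given:
        raise ValueError("provide at least one /ex parameter other than include")
    translating = any(name.startswith("tr") for name in given)
    if translating and not (given & TRANSLATION_SOURCES):
        raise ValueError("translation queries need trex, trtd, or trtt")
    return params


# A distance-2 translation into English, with quality scores and hop paths.
body = check_ex_params({
    "uid": "eng-000",
    "trex": 750865,
    "trdistance": 2,
    "trqmin": 5,  # hypothetical threshold
    "include": ["trq", "trpath"],
})
```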

/ex/count

Returns the number of matching expressions. Parameters are the same as for /ex, but there are no required parameters.

/ex/<ex>

Returns information about a single expression, as ex. The include parameter is the same as for /ex (but only uid makes sense here). There are no other parameters.

/ex/index

This query produces an alphabetically sorted index of expressions in the specified language varieties, or in all varieties in PanLex. Parameters:

  • lv: array of language variety IDs.
  • step: the number of expressions summarized in each index item. Required; minimum 250.

Expressions are first sorted by their degraded expression text, then divided into chunks of size step. The result is returned as the index array. Elements of index are arrays containing two expression objects each, representing the first and last expression from each index chunk.

Because this query can produce large responses, the indent parameter is ignored.

Expression objects

Expression objects contain the following keys:

  • ex: expression ID number.
  • lv: expression’s language variety ID number.
  • td: degraded expression text.
  • trex: ID number of expression from which the expression was translated (only if a translation parameter was specified in the query).
  • trlv: language variety ID for expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • trpath: array of translation hop objects, one for each hop in the translation, in order from source to target (only if specified in include and a translation parameter was specified in the query). A translation hop consists of a meaning with a source and target denotation. Expressions tie hops together: a hop’s target denotation has the same expression as its following hop’s source denotation. Each translation hop object has the following keys: mn, containing the meaning ID; dn1, containing the source denotation ID; dn2, containing the target denotation ID; and (unless it is the final hop) ex2, containing the ID of the expression that ties the hop to the next one.
  • trq: translation quality score (only if specified in include and a translation parameter was specified in the query). For trdistance 1, it is the sum of the uq value of all sources from distinct source groups attesting the translation. The same algorithm is used for trdistance 2 when trqalgo is “arithmetic”, combining the sources from both hops for the purpose of the score. When trqalgo is “geometric” (the default), it is the sum, rounded to the nearest integer, of the geometric mean of each distinct translation path’s two uq values. Distinctness in this context is defined by the combination of the intermediate expression linking the two hops and the source groups of the two sources.
  • trtd: degraded text of expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • trtt: text of expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • truid: language variety uniform identifier for expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • tt: expression text.
  • uid: expression’s language variety uniform identifier (only if in include).
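
The trq description above can be made concrete with a small worked sketch. Both function names are hypothetical, the inputs are assumed to be already reduced to distinct paths or distinct source groups, and the way exact halves are rounded is an assumption, since it is not stated here.

```python
import math


def trq_geometric(paths):
    """Sum the geometric means of each path's two uq values, then round.

    `paths` holds one (uq_hop1, uq_hop2) pair per distinct translation path.
    """
    total = sum(math.sqrt(uq1 * uq2) for uq1, uq2 in paths)
    return math.floor(total + 0.5)  # assumption: halves round up


def trq_arithmetic(uq_by_group):
    """Sum one uq value per distinct source group (mapping: group ID -> uq)."""
    return sum(uq_by_group.values())
```

For two distinct paths with uq pairs (4, 9) and (5, 5), the geometric means are 6 and 5, so trq_geometric returns 11. Two source groups with uq values 5 and 7 give an arithmetic score of 12.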

Denotation queries

/dn

Returns information about the specified denotations, as result array. Parameters:

  • ap: array of source IDs.
  • dn: array of denotation IDs.
  • ex: array of expression IDs.
  • include: valid values are dcs and dpp.
  • mn: array of meaning IDs.

You must provide at least one of the ap, dn, ex, or mn parameters. Results will be returned for all matching denotations.

/dn/count

Returns the number of matching denotations. Parameters are the same as for /dn, but there are no required parameters.

/dn/<dn>

Returns information about a single denotation, as dn. There are no parameters.

Denotation objects

Denotation objects contain the following keys:

  • ap: source ID number.
  • dcs: array of denotation classifications (only if in include). Each denotation classification is a two-element array consisting of the superclass expression ID and the class expression ID.
  • dn: denotation ID number.
  • dpp: array of denotation properties (only if in include). Each denotation property is a two-element array consisting of the attribute expression ID and the property string.
  • ex: expression ID number.
  • mn: meaning ID number.

Meaning queries

/mn

Returns information about a set of meanings, as result array. Parameters:

  • ap: array of source IDs. Restricts results to meanings from the specified sources.
  • ex: array of expression IDs. Restricts results to meanings containing all of the specified expressions.
  • include: valid values are df, mcs, and mpp.
  • mn: array of meaning IDs.

You must provide at least one of the ap, ex, or mn parameters. Results will be returned for all matching meanings.

/mn/count

Returns the number of matching meanings. Parameters are the same as for /mn, but there are no required parameters.

/mn/<mn>

Returns information about a single meaning, as mn. The include parameter is the same as for /mn. There are no other parameters.

Meaning objects

Meaning objects contain the following keys:

  • ap: source ID number.
  • df: array of definition objects (only if in include). Definition objects are the same as for definition queries (see below), with the mn key omitted.
  • dn: array of IDs of denotations of the meaning.
  • ex: array of IDs of expressions with the meaning.
  • mcs: array of meaning classifications (only if in include). Each meaning classification is a two-element array consisting of the superclass expression ID and the class expression ID.
  • mn: meaning ID number.
  • mpp: array of meaning properties (only if in include). Each meaning property is a two-element array consisting of the attribute expression ID and the property string.

Definition queries

/df

Returns information about a set of definitions, as result array. Parameters:

  • df: array of definition IDs.
  • ex: array of expression IDs. Restricts results to definitions of meanings of the specified expressions.
  • exlv: array of language variety IDs. Restricts results to definitions of meanings of expressions in the specified language varieties.
  • extd: array of expression texts. Restricts results to definitions of meanings of expressions with matching texts in their degraded form.
  • extt: array of expression texts. Restricts results to definitions of meanings of expressions with matching texts.
  • exuid: array of language variety uniform identifiers. Restricts results to definitions of meanings of expressions in the specified language varieties.
  • include: valid values are exlv, extd, extt, exuid, and uid.
  • lv: array of language variety IDs. Restricts results to definitions in the specified language varieties.
  • mn: array of meaning IDs. Restricts results to definitions of the specified meanings.
  • td: array of definition texts to be matched in their degraded form.
  • tt: array of definition texts.
  • uid: array of language variety uniform identifiers. Restricts results to definitions in the specified language varieties.

You must provide at least one parameter (other than include). Results will be returned for all matching definitions.

/df/count

Returns the number of matching definitions. Parameters are the same as for /df, but there are no required parameters.

/df/<df>

Returns information about a single definition, as df. There are no parameters.

Definition objects

Definition objects contain the following keys:

  • df: ID number of the definition.
  • ex: ID number of the expression with which the definition shares a meaning (only if one of the ex parameters was specified in the query).
  • exlv: language variety ID of the expression whose meaning is defined (only if in include, and one of the ex parameters was specified in the query).
  • extd: degraded text of the expression whose meaning is defined (only if in include, and one of the ex parameters was specified in the query).
  • extt: text of the expression whose meaning is defined (only if in include, and one of the ex parameters was specified in the query).
  • exuid: language variety uniform identifier of the expression whose meaning is defined (only if in include, and one of the ex parameters was specified in the query).
  • lv: ID number of the language variety in which the definition is written.
  • mn: ID number of the meaning to which the definition belongs.
  • td: degraded text of the definition.
  • tt: text of the definition.
  • uid: uniform identifier of the language variety in which the definition is written (only if in include).

Source queries

/ap

Returns information about the specified sources, as result array. Parameters:

  • ap: array of source IDs.
  • ex: array of expression IDs. Restricts results to sources containing all of the specified expressions, whether in the same meaning or not.
  • include: valid values are lv and mn.
  • lv: array of language variety IDs. Restricts results to sources with those declared language varieties.
  • mn: boolean value. Restricts results to sources with one or more meanings (if true) or no meanings (if false).
  • trex: array of expression IDs. Restricts results to sources with at least one meaning that contains all of the specified expressions.
  • tt: array of source labels.
  • ui: array of source group IDs.
  • uid: array of language variety uniform identifiers. Restricts results to sources with those declared language varieties.

Results will be returned for all matching sources. If you do not specify a search parameter, results will be returned for all sources in PanLex. You cannot specify ex and trex simultaneously.

/ap/count

Returns the number of matching sources. Parameters are the same as for /ap.

/ap/<ap>

Returns information about a single source, as ap. The include parameter is the same as for /ap. There are no other parameters.

Source objects

Source objects contain the following keys:

  • ap: source ID number.
  • au: author(s).
  • bn: ISBN number.
  • dt: date added to PanLex.
  • ip: intellectual property claim, if known.
  • li: license type; can be “copyright”, “Creative Commons”, “GNU Free Documentation License”, “GNU General Public License”, “GNU Lesser General Public License”, “MIT License”, “other”, “PanLex Use Permission”, “public domain”, “request”, or “unknown”.
  • lv: array of IDs of language varieties declared as documented in the source (only if in include).
  • mn: boolean value indicating whether the source has any meanings (only if in include).
  • ti: title.
  • tt: label.
  • ui: ID of source group to which the source belongs.
  • ul: miscellaneous notes.
  • uq: quality rating assigned by PanLex editor (0 = lowest, 9 = highest).
  • ur: URL.
  • yr: year of publication.

Normalization queries

/norm/ex/<lv>

Returns normalization scores and normalized texts for a set of expression texts in a language variety, as norm. Parameters:

  • degrade: boolean value indicating whether to compare the degraded text of each value in tt against the degraded text of existing expressions in PanLex. Defaults to false.
  • tt: array of expression texts to normalize.
  • ui: array of source group IDs. Meanings from these source groups will be ignored when calculating scores. Defaults to an empty array.

The returned norm object maps each expression text, as a key, to an object (when degrade is false) or an array of objects (when degrade is true) containing normalization information. The object or objects’ score key contains the expression text’s normalization score. This is the sum of the quality ratings (uq) of the sources of the expression’s denotations. (Multiple sources from the same source group are counted as a single attestation for this purpose.) Thus, the more sources attest the existence of an expression, the higher its score, but the score is weighted by source quality. If no expression exists with the corresponding text, the score will be zero.

When the degrade option is used, the returned array of objects contains scores for all expressions whose degraded texts (their td values) match the degraded texts of the supplied tt values. The objects’ tt key contains each expression’s text. The array is sorted by score in descending order.
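
A client using degrade might pick the best-attested spelling for each input text as sketched below. best_text is a hypothetical helper and min_score is a hypothetical cutoff. Whether an unmatched text maps to an empty array or is absent from norm is not stated here, so the code accepts both.

```python
def best_text(norm, text, min_score=1):
    """Pick the highest-scoring existing text for `text` from a degrade=true response.

    Returns None when no candidate reaches `min_score` (a hypothetical cutoff).
    """
    candidates = norm.get(text, [])
    if not candidates:
        return None
    top = candidates[0]  # the array is documented as sorted by score, descending
    return top["tt"] if top["score"] >= min_score else None
```

The norm object would come from a POST to /norm/ex/<lv> with a body such as { "tt": ["Tree"], "degrade": true }.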

/norm/df/<lv>

Returns normalization scores and normalized texts for a set of definition texts in a language variety, as norm. Parameters:

  • degrade: boolean value indicating whether to compare the degraded text of each value in tt against the degraded text of existing definitions in PanLex. Defaults to false.
  • tt: array of definition texts to normalize.
  • ui: array of source group IDs. Meanings from these source groups will be ignored when calculating scores. Defaults to an empty array.

The returned norm object maps each definition text, as a key, to an object or array of objects containing normalization information. Its format and the algorithm used are the same as for expression normalization (see above).

Text degradation queries

/td

Returns degraded texts for arbitrary input texts, as td. Parameters:

  • tt: array of texts to degrade.

The returned td object maps each input text (as a key) to its degraded text.

Clients

PanLex API clients are available for node.js, Perl (as part of the PanLex tools), and Ruby.

2 thoughts on “Developing with the PanLex API”

  1. Hi Howard,

    The extt parameter you cited is for restricting the language varieties returned, based on whether they contain a particular expression or not. If you want to translate an expression, you should use the /ex endpoint and the trex parameter. Hope this helps.

  2. documentation not clear. I want to return just the translation of a word into the target language, not “related expressions”. You mention

    extt: array of expression texts. Restricts results to language varieties containing a matching expression.

    but it is not clear how to use this extt value.

    Please explain. thanks.