API

The experimental public API to the PanLex database may be accessed at http://api.panlex.org. All API details are currently subject to change.

Read a brief overview of the PanLex API.

HTTP protocol

The API uses the JSON format for all queries and responses. All queries take the form of HTTP POST requests. The type of query is specified in the URL. Additional query parameters are specified in the HTTP request body, which must be a valid JSON object. The empty object {} should be sent if there are no parameters.
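As a rough illustration of the request shape described above, here is a minimal Python sketch that uses only the standard library. The helper name build_query is hypothetical, and the Content-Type header is an assumption (the documentation only requires that the body be a valid JSON object). The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "http://api.panlex.org"  # base URL given in this documentation


def build_query(path, params=None):
    """Build (but do not send) a POST request for an API query.

    `path` selects the query type, e.g. "/lv"; `params` becomes the JSON body.
    With no parameters, the empty object {} is sent.
    """
    body = json.dumps(params if params is not None else {}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        # Assumption: the docs do not say whether this header is required.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query("/lv")
print(req.get_method(), req.full_url, req.data)
```

Passing the request to urllib.request.urlopen(req) would send it and return the response.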

Successful API queries return HTTP status 200. Errors return HTTP status 4xx (the precise value depends on the nature of the error). A more specific description of the error may be found in the JSON response: the code field indicates the category of error, and the message field contains further information. Example code values are ResourceNotFoundError, BadMethodError, MissingParameterError, InvalidArgumentError, InvalidVersionError, and InternalError.
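One way a client might surface these errors is sketched below. The exception class and the function name are hypothetical; only the code and message fields come from the description above, and the error body in the example is made up.

```python
import json


class PanLexAPIError(Exception):
    """Hypothetical exception carrying the API's error category and message."""

    def __init__(self, status, code, message):
        super().__init__(f"HTTP {status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_response(status, body_text):
    """Return the decoded JSON object on status 200, else raise PanLexAPIError."""
    data = json.loads(body_text)
    if status == 200:
        return data
    # `code` and `message` are the fields named in the documentation.
    raise PanLexAPIError(status, data.get("code"), data.get("message"))


# Made-up error body, for illustration only.
try:
    parse_response(400, '{"code": "MissingParameterError", "message": "example"}')
except PanLexAPIError as err:
    print(err.code, "-", err.message)
```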

Limits

API users are requested not to perform more than 2 queries per second. The API server will enforce this rate if more than 100 queries are received in rapid succession from a single IP address, responding with HTTP status code 429.
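A client can stay under the requested rate by spacing its own queries. The following is a minimal sketch, not part of the API: the Throttle class is hypothetical, and the 0.5-second interval is simply 1 second divided by the 2 queries per second mentioned above.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart (0.5 s = 2 per second)."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to respect the interval; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle()
for _ in range(3):
    throttle.wait()  # call this before each API query
```

The clock and sleep arguments exist only so the behaviour can be checked without real waiting.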

Array parameters may not contain more than 10000 elements.
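If a list of IDs exceeds this limit, one option is to split it across several queries. The sketch below shows the splitting only; the helper name chunked is hypothetical.

```python
def chunked(values, size=10000):
    """Yield successive slices of `values`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]


# 25,000 expression IDs become three request bodies for a query such as /ex.
ids = list(range(1, 25001))
bodies = [{"ex": chunk} for chunk in chunked(ids)]
print([len(body["ex"]) for body in bodies])  # prints [10000, 10000, 5000]
```

Splitting is only safe where an array parameter matches any of its elements. For a parameter that requires all elements to match (such as ex in /mn queries), separate queries would not give the same results.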

The returned result array will contain a maximum of 2000 elements.

The global parameter offset may not be greater than 250000.

Global query and response parameters

The following optional query parameters are available in the HTTP request body:

  • echo: boolean value indicating whether to pass the query back in the response as request, which is an object with the keys url and body. Defaults to false.
  • include: string or array specifying extra fields to include in the response. See documentation below for possible values.
  • indent: boolean value indicating whether to pretty-print the JSON response. Defaults to false.
  • limit: numeric value indicating the maximum number of records to return. Defaults to resultMax, i.e., the maximum.
  • offset: numeric value indicating how many records to omit from the beginning of the returned records. Defaults to 0; cannot be more than 250000.
  • sort: string or array of fields to sort the result by. Sort strings take the format <field> or <field> asc for ascending order, <field> desc for descending order. You may also sort by include objects if there is only one per result by using a dot separator, e.g. lv.tt to sort by an expression’s language variety label. Defaults to sorting by ID in ascending order.

The JSON response object can contain the following keys:

  • count: the number of results found, for count queries.
  • countType: string specifying the type of objects in count.
  • request: object representing the query, if echo was on.
  • result: array of result objects, if the query was for a set of objects. Limited to resultMax per query; use offset to get more.
  • resultMax: the maximum number of result objects that will be returned in a single query (currently 2000).
  • resultNum: number of objects returned in result.
  • resultType: string specifying type of objects in result.

Other keys are present for particular query types; see below.
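The offset, resultNum, and resultMax keys together allow a client to page through a large result set. Here is a minimal sketch under stated assumptions: fetch_page is a caller-supplied function (not part of the API) that sends one query and returns the decoded JSON response, and the demonstration uses a stand-in that serves made-up records.

```python
def fetch_all(fetch_page, params, max_offset=250000):
    """Collect results page by page, advancing `offset` each time.

    `fetch_page(body)` sends one query and returns the decoded response.
    `max_offset` mirrors the documented limit on the offset parameter.
    """
    results = []
    offset = 0
    while offset <= max_offset:
        response = fetch_page(dict(params, offset=offset))
        results.extend(response["result"])
        # A page shorter than resultMax means there is nothing left to fetch.
        if response["resultNum"] < response["resultMax"]:
            break
        offset += response["resultNum"]
    return results


# Stand-in for a real HTTP call: serves 5 fake records, 2 per page.
def fake_fetch(body):
    page = list(range(5))[body["offset"]:body["offset"] + 2]
    return {"result": page, "resultNum": len(page), "resultMax": 2}


print(fetch_all(fake_fetch, {}))  # prints [0, 1, 2, 3, 4]
```

The sketch assumes no limit smaller than resultMax is passed; with such a limit, every page would look like the last one.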

URL parameters

When a query calls for a single source, denotation, expression, language variety, or meaning, its identifier is passed as a URL parameter. URL parameters are indicated in the documentation below as <ap>, <dn>, <ex>, <lv>, and <mn>, and take the following format:

  • <ap>: source ID number or label.
  • <dn>: denotation ID number.
  • <ex>: expression ID number.
  • <lv>: language variety ID number or uniform identifier (aaa-000).
  • <mn>: meaning ID number.
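A small sketch of how such URLs could be assembled in Python. The helper name single_object_url is hypothetical, and percent-encoding the identifier is a precaution based on the assumption that source labels may contain characters that are not URL-safe.

```python
from urllib.parse import quote

API_BASE = "http://api.panlex.org"  # base URL given in this documentation


def single_object_url(kind, identifier):
    """Return the URL for one object, e.g. kind="lv", identifier="rus-000".

    `kind` is one of the URL parameter names listed above.
    """
    if kind not in {"ap", "dn", "ex", "lv", "mn"}:
        raise ValueError(f"unknown object type: {kind!r}")
    # Assumption: identifiers (especially source labels) may need encoding.
    return f"{API_BASE}/{kind}/{quote(str(identifier), safe='')}"


print(single_object_url("lv", "rus-000"))
print(single_object_url("ex", 750865))
```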

Examples

To retrieve information about all language varieties in PanLex, you can send the following query from the command line using curl:

curl http://api.panlex.org/lv -d '{ "indent": true }'

This query requires no additional parameters besides the URL. We could pass an empty JSON object {} in the request body, but for convenience, we set indent to true so the response will be pretty-printed. The response is structured as follows:

{
    "result": [
        {
          "lv": 1,
          "lc": "aar",
          "vc": 0,
          "ex": 1453510,
          "uid": "aar-000",
          "tt": "Qafár af"
        },
        ...
    ],
    "resultType": "lv",
    "resultNum": 2000,
    "resultMax": 2000
}

The result array contains a set of language variety objects, as indicated by resultType. resultNum indicates that 2000 results were returned, the maximum for a single query; we could do another query to get more results. (For an explanation of the language variety object, see below.)

Now suppose that we want to retrieve some expressions from Russian. We must first determine the language variety ID or uniform identifier for (a variety of) Russian. If we already know that the language code for Russian is rus, we can look up matching language varieties as follows:

curl http://api.panlex.org/lv -d '{ "lc": ["rus"] }'

The lc parameter says to search for language varieties with matching language codes. The value of lc is an array of three-character strings representing language codes. The results contain several language varieties; we pick this one as corresponding to Russian written in Cyrillic:

{
    "lv": 620,
    "lc": "rus",
    "vc": 0,
    "ex": 43116,
    "uid": "rus-000",
    "tt": "русский"
}

(We have omitted the indent parameter above for brevity, but will continue to use pretty-printed JSON for the purpose of these examples.)

Now that we know the language variety ID and uniform identifier for Russian (either will do), we can look up some Russian expressions. The following query looks up the expression “дерево” (the Russian word for “tree”):

curl http://api.panlex.org/ex -d '{ "uid": ["rus-000"], "tt": ["дерево"] }'

You will see that the result contains a single expression object with the ID 750865. If we want to know this expression’s denotations—i.e., what PanLex sources the expression occurs in, and what translations it is linked to in those sources—we can do a denotation query as follows:

curl http://api.panlex.org/dn -d '{ "ex": [750865] }'

The results now contain an array of denotation objects, one of which is the following:

{
    "dn": 25930350,
    "mn": 9537585,
    "ex": 750865,
    "ap": 603,
    "wc": "noun"
}

If we want to get more information about the meaning to which this denotation belongs, we can look it up with the following query, specifying that definitions should be included:

curl http://api.panlex.org/mn -d '{ "mn": [9537585], "include": "df" }'

The result should contain an ex array with the expression IDs that share this meaning, and a df array containing one definition “растущий” in language variety 620 (which we have already determined is Russian).

Now suppose that we want to translate “дерево” into English. We can do this with an expression query requesting translations of expression 750865 into English:

curl http://api.panlex.org/ex -d '{ "uid": ["eng-000"], "trex": [750865] }'

You should see that the expression “tree” is one of the results, and the rest are expressions with closely related meanings.
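The two expression queries in this walk-through can be combined into one helper. The sketch below is an illustration rather than an official client: translate and post are hypothetical names, post(path, body) is a caller-supplied function that performs the HTTP POST and returns the decoded JSON, and the canned responses (including the ID 1 for “tree”) are made up.

```python
def translate(post, text, source_uid, target_uid):
    """Return target-language expression texts that translate `text`.

    `post(path, body)` performs the HTTP POST and returns the decoded JSON
    response; it is supplied by the caller and is not part of the API.
    """
    # Step 1: find the source expression(s) matching the text.
    found = post("/ex", {"uid": [source_uid], "tt": [text]})
    source_ids = [ex["ex"] for ex in found["result"]]
    if not source_ids:
        return []
    # Step 2: ask for expressions in the target variety translated from them.
    translated = post("/ex", {"uid": [target_uid], "trex": source_ids})
    return [ex["tt"] for ex in translated["result"]]


# Canned responses standing in for the server; the ID 1 for "tree" is made up.
def fake_post(path, body):
    if "trex" in body:
        return {"result": [{"ex": 1, "tt": "tree"}]}
    return {"result": [{"ex": 750865, "tt": "дерево"}]}


print(translate(fake_post, "дерево", "rus-000", "eng-000"))  # prints ['tree']
```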

Language variety queries

/lv

Returns information about a set of language varieties, as result array. Parameters:

  • extd: array of expression texts. Restricts results to language varieties containing a matching expression in degraded form.
  • extt: array of expression texts. Restricts results to language varieties containing a matching expression.
  • include: valid values are cp, cu, and lcType.
  • lc: array of language codes.
  • lv: array of language variety IDs.
  • trex: array of expression IDs. Restricts results to those language varieties containing a one-hop translation of one of the expressions.
  • tt: array of language variety labels.
  • uid: array of language variety uniform identifiers.

You can pass any combination of these parameters; results will be returned for matching language varieties. If you do not specify any search parameters, results will be returned for all language varieties in PanLex.

/lv/count

Returns the number of matching language varieties. Parameters are the same as for /lv.

/lv/<lv>

Returns information about a single language variety, as lv. The include parameter is the same as for /lv. There are no other parameters.

Language variety objects

Language variety objects contain the following keys:

  • cp: array of code point ranges (only if in include).
  • cu: array of exemplar character objects (only if in include).
  • ex: language variety label’s expression ID.
  • lc: three-letter language code.
  • lcType: lc code type (only if in include); can be “ISO 639-3 individual language”, “ISO 639-3 macro-language”, “ISO 639-2 collective language”, “ISO 639-5”, or “other”.
  • lv: language variety ID number.
  • tt: language variety label’s expression text.
  • uid: language variety’s uniform identifier.
  • vc: numeric variety code.

Code point ranges

The code point range is an array representing a range of permissible Unicode characters for a language variety. The array takes the form [first, last], where first is the numeric value of the first code point in the range and last is the value of the last code point in the range.

For example, for English (language variety eng-000), the first code point range is [32, 33]. This covers U+0020 (SPACE) through U+0021 (EXCLAMATION MARK). Note that JSON numeric values are always decimal, so code points appear as decimal numbers rather than in the hexadecimal U+ notation.
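A minimal sketch of testing a character against a list of such ranges. The function name is_permitted is hypothetical, and both ends of each range are treated as inclusive, as the example above implies.

```python
def is_permitted(char, ranges):
    """Return True if `char` falls inside any [first, last] range (inclusive)."""
    code_point = ord(char)
    return any(first <= code_point <= last for first, last in ranges)


ranges = [[32, 33]]  # the first eng-000 range quoted above
print(is_permitted("!", ranges))  # U+0021 is inside the range
print(is_permitted("#", ranges))  # U+0023 is outside it
```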

Exemplar character objects

Exemplar character objects represent the exemplar characters for a language variety, as defined by the Unicode Common Locale Data Repository. They contain the following keys:

  • category: character category, typically “pri” (primary/standard), “aux” (auxiliary), or “pun” (punctuation).
  • locale: Unicode script locale abbreviation.
  • range: a code point range (see above).

Expression queries

/ex

Returns information about the specified expressions, as result array. Parameters:

  • ex: array of expression IDs.
  • include: valid values are lv, trlv, trq, trtd, trtt, and truid.
  • lv: array of language variety IDs. Restricts results to expressions from the specified language varieties.
  • range: array of the form [field, start, end]. Restricts results to expressions whose field value is alphabetically between the start and end strings. field may be “tt” or “td”.
  • td: array of expression texts to be matched in their degraded form.
  • trex: array of expression IDs. Restricts results to expressions that are one-hop translations from the specified expressions.
  • trlv: array of language variety IDs. Restricts results to expressions that are one-hop translations from expressions in the specified language varieties.
  • trtd: array of expression texts. Restricts results to expressions that are one-hop translations from expressions with matching texts in their degraded form.
  • trtt: array of expression texts. Restricts results to expressions that are one-hop translations from expressions with matching texts.
  • truid: array of language variety uniform identifiers. Restricts results to expressions that are one-hop translations from expressions in the specified language varieties.
  • tt: array of expression texts.
  • uid: array of language variety uniform identifiers. Restricts results to expressions from the specified language varieties.

You must provide at least one of the parameters other than include. If you are translating, you must provide at least one of the trex, trtd, or trtt parameters. Results will be returned for all matching expressions.
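These rules can be checked on the client before a query is sent. The sketch below encodes one reading of them and is not the server's actual validation: check_ex_query is a hypothetical name, and a query is treated as “translating” when it uses trlv or truid.

```python
# Parameters of /ex other than include, as listed above.
SEARCH_PARAMS = {"ex", "lv", "range", "td", "trex", "trlv",
                 "trtd", "trtt", "truid", "tt", "uid"}
TRANSLATION_SOURCES = {"trex", "trtd", "trtt"}


def check_ex_query(body):
    """Raise ValueError if `body` breaks the /ex rules stated above."""
    if not body.keys() & SEARCH_PARAMS:
        raise ValueError("provide at least one parameter other than include")
    # Assumption: trlv and truid only narrow a translation; they cannot start one.
    if body.keys() & {"trlv", "truid"} and not body.keys() & TRANSLATION_SOURCES:
        raise ValueError("translation queries need trex, trtd, or trtt")
    return body


check_ex_query({"uid": ["eng-000"], "trex": [750865]})  # passes silently
```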

/ex/count

Returns the number of matching expressions. Parameters are the same as for /ex, but there are no required parameters.

/ex/<ex>

Returns information about a single expression, as ex. The include parameter is the same as for /ex. There are no other parameters.

/ex/index

This query produces an alphabetically sorted index of expressions in the specified language varieties, or in all varieties in PanLex. Parameters:

  • lv: array of language variety IDs.
  • step: the number of expressions summarized in each index item. Required; minimum 250.

Expressions are first sorted by their degraded expression text, then divided into chunks of size step. The result is returned as the index array. Elements of index are arrays containing two expression objects each, representing the first and last expression from each index chunk.
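To make the shape of the index array concrete, here is a small sketch that turns it into human-readable chunk labels. The function name describe_index is hypothetical and the sample data is made up; real responses carry full expression objects.

```python
def describe_index(index):
    """Return one "first .. last" label per chunk of an /ex/index response."""
    return [f"{first['td']} .. {last['td']}" for first, last in index]


# Made-up index with two chunks, each a [first, last] pair of expressions.
sample_index = [
    [{"ex": 1, "td": "a"}, {"ex": 2, "td": "m"}],
    [{"ex": 3, "td": "n"}, {"ex": 4, "td": "z"}],
]
print(describe_index(sample_index))  # prints ['a .. m', 'n .. z']
```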

Because this query can produce large responses, the indent parameter is ignored.

Expression objects

Expression objects contain the following keys:

  • ex: expression ID number.
  • lv: expression’s language variety ID number, or (if specified in include) the full language variety object.
  • td: degraded expression text.
  • tt: expression text.
  • trex: ID number of expression from which the expression was translated (only if a translation parameter was specified in the query).
  • trlv: language variety ID for expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • trq: translation quality, calculated by summing the uq value of all sources from distinct source groups attesting the translation (only if specified in include and a translation parameter was specified in the query).
  • trtd: degraded text of expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • trtt: text of expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).
  • truid: language variety uniform identifier for expression from which the expression was translated (only if specified in include and a translation parameter was specified in the query).

Denotation queries

/dn

Returns information about the specified denotations, as result array. Parameters:

  • ap: array of source IDs.
  • dn: array of denotation IDs.
  • ex: array of expression IDs.
  • include: valid value is md.
  • mn: array of meaning IDs.
  • wc: array of word classes. Restricts results to denotations with the specified word classes.

You must provide at least one of the ap, dn, ex, or mn parameters. Results will be returned for all matching denotations.

/dn/count

Returns the number of matching denotations. Parameters are the same as for /dn, but there are no required parameters.

/dn/<dn>

Returns information about a single denotation, as dn. There are no parameters.

Denotation objects

Denotation objects contain the following keys:

  • ap: source ID number.
  • dn: denotation ID number.
  • ex: expression ID number.
  • md: array of metadata objects (only if in include). Metadata objects contain two keys: key (the type of metadata) and value (its value).
  • mn: meaning ID number.
  • wc: word class.

Meaning queries

/mn

Returns information about a set of meanings, as result array. Parameters:

  • ap: array of source IDs. Restricts results to meanings from the specified sources.
  • ex: array of expression IDs. Restricts results to meanings containing all of the specified expressions.
  • include: valid values are mi, df, and dm.
  • mn: array of meaning IDs.

You must provide at least one of the ap, ex, or mn parameters. Results will be returned for all matching meanings.

/mn/count

Returns the number of matching meanings. Parameters are the same as for /mn, but there are no required parameters.

/mn/<mn>

Returns information about a single meaning, as mn. The include parameter is the same as for /mn. There are no other parameters.

Meaning objects

Meaning objects contain the following keys:

  • ap: source ID number.
  • df: array of definition objects (only if in include).
  • dm: array of domain expression objects (only if in include).
  • ex: array of IDs of expressions with the meaning.
  • mi: meaning identifier string (only if in include).
  • mn: meaning ID number.

Definition objects

Definition objects contain the following keys:

  • lv: ID number of the language variety in which the definition is written.
  • tt: text of the definition.

Source queries

/ap

Returns information about the specified sources, as result array. Parameters:

  • ap: array of source IDs.
  • ex: array of expression IDs. Restricts results to sources containing all of the specified expressions, whether in the same meaning or not.
  • include: valid value is lv.
  • lv: array of language variety IDs. Restricts results to sources with those declared language varieties.
  • trex: array of expression IDs. Restricts results to sources with at least one meaning that contains all of the specified expressions.
  • tt: array of source labels.
  • ui: array of source group IDs.
  • uid: array of language variety uniform identifiers. Restricts results to sources with those declared language varieties.

Results will be returned for all matching sources. If you do not specify a search parameter, results will be returned for all sources in PanLex.

/ap/count

Returns the number of matching sources. Parameters are the same as for /ap.

/ap/<ap>

Returns information about a single source, as ap. The include parameter is the same as for /ap. There are no other parameters.

Source objects

Source objects contain the following keys:

  • ap: source ID number.
  • au: author(s).
  • bn: ISBN.
  • dt: date added to PanLex.
  • lv: array of IDs of language varieties declared as documented in the source (only if in include).
  • ti: title.
  • tt: label.
  • ui: ID of source group to which the source belongs.
  • ul: miscellaneous notes.
  • uq: quality rating assigned by PanLex editor (0 = lowest, 9 = highest).
  • ur: URL.
  • yr: year of publication.

Normalization queries

/norm/<lv>

Returns normalization scores and normalized texts for a set of expression texts in a language variety, as norm. Parameters:

  • ap: array of source IDs. Denotations from these sources will be ignored during normalization. Defaults to an empty array.
  • degrade: boolean value indicating whether to compare the degraded text of each value in tt against the degraded text of existing expressions in PanLex. Defaults to false.
  • tt: array of expression texts to normalize.

The returned norm object maps each expression text (as a key) to an object containing normalization information. The score key in this object contains an expression text’s normalization score. This is the sum of the quality ratings (uq) of the sources of the expression’s denotations. (Multiple sources from the same source group are counted as a single attestation for this purpose.) Thus, the more sources attest the existence of an expression, the higher its score, but the score is weighted by source quality. If no expression exists with the corresponding text, the score will be zero.
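The scoring rule can be illustrated with a short sketch. This is not the server's implementation: normalization_score is a hypothetical name, the input sources are made up, and the sketch ASSUMES that when several sources share a group, the highest uq in that group is the one counted (the description above does not say which rating applies).

```python
def normalization_score(sources):
    """Sum uq over distinct source groups.

    `sources` is a list of source objects (with `ui` and `uq` keys) attesting
    one expression. ASSUMPTION: when a group has several sources, the highest
    uq is used; the documentation does not say which rating applies.
    """
    best_per_group = {}
    for source in sources:
        group = source["ui"]
        best_per_group[group] = max(best_per_group.get(group, 0), source["uq"])
    return sum(best_per_group.values())


# Made-up sources: two share group 10, so only one of them is counted.
print(normalization_score([
    {"ap": 1, "ui": 10, "uq": 5},
    {"ap": 2, "ui": 10, "uq": 3},
    {"ap": 3, "ui": 11, "uq": 7},
]))  # prints 12
```

An expression with no attesting sources scores zero here, matching the behaviour described above.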

When the degrade option is used, the normalization score is produced on the basis of expressions whose degraded texts (their td values) match the degraded texts of the supplied tt values. The highest matching score is returned under score, and the text of the expression with the highest matching score is returned under ttNorm.

Practice exercises

Practice accessing the database using the API by finding the results of the following queries. Test your solutions at http://www.hurl.it.

  1. Look up language variety 157.
  2. Look up expressions 377383 and 377474 and their language variety details, in one query.
  3. Look up all language varieties with language code arb.
  4. Look up the total number of expressions in PanLex.
  5. Look up all expressions in PanLex in the alphabetical range from “mason” to “meson”, case sensitive, with results in alphabetical order.

2 thoughts on “API”

  1. documentation not clear. I want to return just the translation of a word into the target language, not “related expressions”. You mention

    extt: array of expression texts. Restricts results to language varieties containing a matching expression.

    but it is not clear how to use this extt value.

    Please explain. thanks.

  2. Hi Howard,

    The extt parameter you cited is for restricting the language varieties returned, based on whether they contain a particular expression or not. If you want to translate an expression, you should use the /ex endpoint and the trex parameter. Hope this helps.
