Assimilation workflow

A prerequisite for doing any assimilation work is to install the required tools.

The assimilation workflow, whether the strategy is interpretation or analysis, contains the same basic stages:

  1. Source selection and retrieval: choosing a source to work on, and retrieving its files.
  2. Text standardization: making a source’s content conform to PanLex text standards.
  3. Tabularization: transforming source data into a table in which each row represents a PanLex meaning or a multi-meaning entry, and each column represents a detail of that meaning. The finished table must be a plain-text, tab-delimited file, although it can be kept in another format (such as a spreadsheet) while it is being built.
  4. Serialization: transforming the output of tabularization into a file that can be imported into the PanLex database. This stage also includes manipulations that improve the quality of the data to be imported.
  5. Importation: checking the serialized file, submitting it for inclusion in the PanLex database, and depositing the source directory (now containing additional files that you created) back into the PanLex resource archive.
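
Because tabularization must end with a plain-text, tab-delimited file, it can help to check that no field contains a tab or line break (which would shift columns) and that every row has the same number of columns. The following Python sketch shows one way to do this while writing the file. The function name write_tabular and the two-column French/English sample rows are hypothetical illustrations, not PanLex tools or required column layouts.

```python
import io

def write_tabular(rows, out):
    """Write rows (lists of strings) to `out` as plain tab-delimited lines.

    Hypothetical helper: validates every row first, so nothing is written
    if any row would break the tab-delimited layout.
    """
    rows = [list(row) for row in rows]
    width = len(rows[0]) if rows else 0
    for n, row in enumerate(rows, start=1):
        # Every row must have the same number of columns.
        if len(row) != width:
            raise ValueError(f"row {n} has {len(row)} columns, expected {width}")
        # A tab or line break inside a field would corrupt the table.
        for field in row:
            if any(ch in field for ch in ("\t", "\n", "\r")):
                raise ValueError(f"row {n} has a field containing a tab or line break")
    for row in rows:
        out.write("\t".join(row) + "\n")

# Usage with hypothetical sample rows (one meaning per row).
buf = io.StringIO()
write_tabular([["chien", "dog"], ["chat", "cat"]], buf)
print(buf.getvalue(), end="")
```

When writing to a real file, opening it with open(path, "w", encoding="utf-8", newline="") keeps the line endings as plain "\n" on every platform.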

Since serialization works with the output of tabularization, it always comes after tabularization. Likewise, importation always comes after serialization. Text standardization has several different aspects which can occur prior to tabularization, during tabularization, or during serialization.
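
The ordering rules above can be expressed as a small dependency check. The following Python sketch, which is an illustration rather than part of any PanLex tooling, takes a hypothetical log of stage names in the order they were performed and reports any step whose prerequisite has not yet occurred. The stage labels and the PREREQUISITE table are assumptions made for this example; standardization is allowed anywhere after source selection and before importation, matching the description above.

```python
# Hypothetical stage labels; each stage maps to the stage that must precede it.
PREREQUISITE = {
    "tabularization": "selection",
    "serialization": "tabularization",
    "importation": "serialization",
    "standardization": "selection",
}

def check_order(steps):
    """Return a list of ordering problems found in a sequence of stage names."""
    seen = set()
    problems = []
    for i, step in enumerate(steps):
        prereq = PREREQUISITE.get(step)
        if prereq is not None and prereq not in seen:
            problems.append(f"step {i}: {step} before {prereq}")
        # Standardization happens before, during, or after tabularization,
        # or during serialization, but never after importation.
        if step == "standardization" and "importation" in seen:
            problems.append(f"step {i}: standardization after importation")
        seen.add(step)
    return problems

# Standardization interleaved with other stages is fine:
print(check_order(["selection", "standardization", "tabularization",
                   "standardization", "serialization", "importation"]))
# Serializing before tabularizing is flagged:
print(check_order(["selection", "serialization", "tabularization"]))
```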