Why serialize?


Serialization produces a file that is ready for PanLem to import into the PanLex database. It also includes actions that improve the quality of the imported data.

This page explains some non-obvious aspects of serialization. For documentation on how to perform serialization, see the pages on the serialization process and serialization scripts.

Batch importation is really serial

When you assimilate data from a source, you can add your resulting data to the database one item at a time or all at once. PanLem lets you add items one at a time with the meaning—new feature. Alternatively, it lets you submit many facts at once with the file—submit feature.

In reality, however, submitting a file “all at once” gives the file’s data to PanLem one line at a time, and PanLem checks each line as it is received. What PanLem does as it processes a line can depend on the previous lines in the same file. For example, the first occurrence of a new expression causes PanLem to create that expression in the database. If the same expression appears later in the file, it is no longer new, so PanLem reuses the just-created expression for the later line.

In that sense, what appears to be a batch process is really a serial one. The serialization process produces a file that contains a series of items of information, permitting PanLem to check and import them one at a time.
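The order dependence described above can be illustrated with a minimal sketch. It is not PanLem's code: the function name import_lines and the dictionary standing in for the database are assumptions made for illustration only.

```python
def import_lines(lines):
    """Process lines one at a time; return the expression ID used for each line.

    Hypothetical stand-in for a line-by-line import: `registry` plays the
    role of the database's expression table.
    """
    registry = {}   # expression text -> ID
    used_ids = []
    for text in lines:
        if text not in registry:
            # First occurrence: the expression is new, so create it.
            registry[text] = len(registry) + 1
        # Later occurrences reuse the expression created earlier in the file.
        used_ids.append(registry[text])
    return used_ids, registry


ids, registry = import_lines(["casa", "house", "casa"])
```

Here the third line reuses the ID that the first line created, so the result for a line depends on the lines before it.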

If there is an error anywhere in a submitted file, PanLem stops and reports the error to you. If you were trying to have PanLem import the file (rather than merely check it), PanLem rolls back the importation transaction, so none of the file's items remain in the database, including those processed before the error.
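The all-or-nothing behavior can be sketched with an ordinary database transaction. The example below uses Python's built-in sqlite3 module purely as an illustration; the table name item, the function submit, and the "empty line" error rule are hypothetical and do not describe PanLem's actual schema or checks.

```python
import sqlite3


def submit(conn, lines, do_import=True):
    """Check each line in order; keep the data only if every line passes.

    Returns None on success or an error message for the first bad line.
    """
    try:
        for number, text in enumerate(lines, start=1):
            if not text.strip():
                # Hypothetical validity rule, for illustration only.
                raise ValueError(f"line {number}: empty item")
            conn.execute("INSERT INTO item (text) VALUES (?)", (text,))
    except ValueError as error:
        conn.rollback()   # discard the lines inserted before the error
        return str(error)
    if do_import:
        conn.commit()     # import: keep everything
    else:
        conn.rollback()   # check only: keep nothing
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (text TEXT)")
conn.commit()

error = submit(conn, ["casa", "", "house"])
count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

Although the first line was inserted before the error on the second line, the rollback leaves the table empty.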

Serialization does more than serialize

In the serialization stage you can do more than just serialize. You can also perform some quality improvements on the tabular data before they get serialized. Among these improvements are:

  • Performing additional text standardization
  • Reclassifying implausible “expressions” as definitions
  • Standardizing the orthography of expressions
  • Standardizing grammatical classifications
  • Classifying multiple translations of an expression as either synonyms sharing a single meaning or translations having distinct meanings
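As one illustration of such an improvement, the sketch below reclassifies overly long "expressions" as definitions. The word-count threshold, the function name classify, and the labels "ex" and "df" are assumptions chosen for the example; the actual serialization scripts may use different criteria.

```python
def classify(text, max_words=4):
    """Return 'ex' for a plausible expression, 'df' for a likely definition.

    Hypothetical heuristic: an item with more than `max_words` words is
    treated as too long to be a lexical expression.
    """
    return "ex" if len(text.split()) <= max_words else "df"


labels = [classify(t) for t in ["house", "a building in which people live"]]
```

With these assumptions, the one-word item stays an expression and the six-word item becomes a definition.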