Serialization

Introduction

Assimilation normally involves two stages: tabularization and serialization. The motivation for serialization is described separately.

If you are doing interpretation, you can produce a serial file directly rather than by serializing a tabular file. Usually, however, it is more efficient to create a tabular file and then serialize it with the process described here. If you don’t have the knowledge to do that, ask an analyst to help you serialize the file.

The serialization process makes use of PanLex tools. Typically, when you perform serialization you use existing serialization scripts with arguments that you define; you don’t need to write your own scripts. This usual process is summarized below. In rare cases, it is more efficient to use a custom process to produce the desired output.

Editorial judgment

The serialization process is not just mechanical. You must exercise judgment in choosing scripts and defining their arguments. For example, your choices determine whether words and phrases are classified as expressions or as definitions, and whether expressions are retained as-is or modified to conform to more common spellings.

Steps

In the serialization process, you use a command-line terminal to run the serialize.pl Perl script from the PanLex tools.

  1. Make the directory that contains the source files—your working directory—the current directory for any commands you will issue.
  2. Enter the command plx cp serialize.pl to copy serialize.pl to your working directory.
  3. In a text editor, open the serialize.pl file.
  4. In serialize.pl, set the basename of the source file and the version number to match the tabular file that will be the input. For example, if the file is named aaa-bbb-Author-1.txt, the basename is aaa-bbb-Author and the version number is 1.
  5. Uncomment (i.e., remove the leading # symbols on) the lines that call the serialization scripts you want to use, amend their arguments to fit your source and your purposes, and duplicate and/or reorder the lines as necessary. At a minimum, the scripts extag and out-full-0 must be called. Detailed instructions for this step are available separately.
  6. Execute serialize.pl by running the command perl serialize.pl or ./serialize.pl.
  7. Inspect the files produced by this action. They include intermediate files with different version numbers, log files, and a final source file (whose version is final rather than a number). The diff command can help you perform inspections. For example, diff -u *-7.txt *-8.txt compares versions 7 and 8.
  8. If you discover errors or decide to improve how you specified the serialization process, amend serialize.pl and execute it again.
  9. If PanLem reports errors when you submit the final source file, amend serialize.pl, execute it again, and submit the new final source file.
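
The file-naming convention in step 4 and the inspection in step 7 can be sketched in a few lines of shell. This is only an illustration, not part of the PanLex tools: the function names split_source_name and compare_versions are hypothetical, and the sketch assumes that file names have the form basename-version.txt and that the intermediate files you want to compare carry consecutive integer version numbers.

```sh
#!/bin/sh
# Hypothetical helper (not a PanLex tool): split a file name of the
# assumed form <basename>-<version>.txt into its two parts (step 4).
split_source_name() {
    stem="${1%.txt}"        # drop the .txt extension
    base="${stem%-*}"       # everything before the last hyphen
    version="${stem##*-}"   # everything after the last hyphen
}

# Hypothetical helper: run diff -u on each pair of consecutive versions,
# starting at version $2, for as long as both files exist (step 7).
# Assumes intermediate versions are numbered with consecutive integers.
compare_versions() {
    b="$1"
    n="$2"
    while [ -f "$b-$n.txt" ] && [ -f "$b-$((n + 1)).txt" ]; do
        diff -u "$b-$n.txt" "$b-$((n + 1)).txt"
        n=$((n + 1))
    done
}

split_source_name "aaa-bbb-Author-1.txt"
printf 'basename=%s version=%s\n' "$base" "$version"
# prints: basename=aaa-bbb-Author version=1
```

After serialize.pl has run, calling compare_versions aaa-bbb-Author 1 in the working directory would show what each script changed from one intermediate file to the next. The file with the version final is not covered by this loop and would need to be compared separately.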