Introduction
The serialization stage of assimilation produces a file that we call a final source file. It contains data, along with instructions telling PanLem how to import the data.
The instructions and data are located on distinct lines of the file. Each line contains only one datum or one instructional item.
Example
We showed you an example of a tabular file created from a Spanish–Zapotec dictionary source.
Serialization converts a file like that to a final source file, which looks like this:
:
0

mn
  dn
    spa-000
    astutamente
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adverbial
  dn
    zpq-000
    maños

mn
  dn
    spa-000
    astuto
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adjectival
  dn
    zpq-000
    maños

mn
  dn
    spa-000
    asustar
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    TransitiveVerb
  dn
    zpq-000
    chšeb

mn
  dn
    spa-000
    asustar
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Verbal
  dcs2
    art-303
    MorphosyntacticProperty
    art-302
    REFL
  dn
    zpq-000
    chžeb

mn
  df
    spa-000
    ataque (epiléptico)
  dn
    spa-000
    ataque
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    CommonNoun
  dcs2
    art-303
    GenderProperty
    art-303
    MasculineGender
  dn
    zpq-000
    šon

mn
  dn
    spa-000
    atar
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    TransitiveVerb
  dn
    zpq-000
    chc̱hej
  dn
    zpq-000
    chda’ yag

mn
  df
    spa-000
    (estar) atado
  dn
    spa-000
    etado
  dn
    zpq-000
    chc̱hej
  dn
    zpq-000
    chda’ yag

mn
  dn
    spa-000
    atarantado
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adjectival
  dn
    zpq-000
    tarantadw

mn
  dn
    spa-000
    atarantarse
  dn
    zpq-000
    chec̱hol chenite

mn
  dn
    spa-000
    atardecer
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    IntransitiveVerb
  dn
    zpq-000
    chex̱jw gwbiž
  dn
    zpq-000
    chex̱jwža

mn
  dn
    spa-000
    atascarse
  dn
    zpq-000
    chaga’

mn
  df
    spa-000
    atascarse (sin poder orinar o defecar)
  dn
    spa-000
    atascarse
  dn
    zpq-000
    cheyjw

mn
  dn
    spa-000
    ataúd
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    CommonNoun
  dcs2
    art-303
    GenderProperty
    art-303
    MasculineGender
  dn
    zpq-000
    yi’iṉ

mn
  df
    spa-000
    atender (tomar en serio)
  dn
    spa-000
    atender
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    TransitiveVerb
  dn
    zpq-000
    chonen c̱he
  dn
    zpq-000
    chzi’ c̱he‣chzi’ diža’
  dn
    zpq-000
    chejḻe’

mn
  dn
    spa-000
    atrás
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adverbial
  dn
    zpq-000
    trasle

mn
  dn
    spa-000
    atrasado
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adjectival
  dn
    zpq-000
    trasadw

mn
  dn
    spa-000
    atravesar
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    TransitiveVerb
  dn
    zpq-000
    chḻaga’
  dn
    zpq-000
    chde

mn
  dn
    spa-000
    atreverse
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Verbal
  dcs2
    art-303
    MorphosyntacticProperty
    art-302
    REFL
  dn
    zpq-000
    cheyaxje

mn
  dn
    spa-000
    atrevido
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adjectival
  dn
    zpq-000
    chogwlaz
If you compare the two files, you can see that they contain the same information, but the final source file states it more explicitly. For example, the final source file makes explicit that “astutamente” is an expression and that it is in Spanish (spa-000). Typically, each line of the tabular file becomes a set of lines in the final source file.
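As an illustration of how one tabular line becomes several lines, here is a minimal Python sketch that serializes a single meaning. It assumes the tabular line has already been split into a Spanish expression, an optional part-of-speech value, and a list of Zapotec expressions; that field layout and the function name serialize_row are hypothetical, and the real serialization step may work differently. The UIDs spa-000, zpq-000, and art-303 are taken from the example.

```python
from typing import Optional


def serialize_row(spa: str, pos: Optional[str], zpq: list[str]) -> list[str]:
    """Hypothetical: serialize one already-parsed tabular row as one meaning."""
    lines = ["mn"]                       # start a meaning specification
    lines += ["dn", "spa-000", spa]      # Spanish denotation: variety UID, then text
    if pos is not None:
        # Binary denotation classification of the Spanish denotation:
        # two expression specifications (attribute, then value), both in art-303.
        lines += ["dcs2", "art-303", "PartOfSpeechProperty", "art-303", pos]
    for text in zpq:
        lines += ["dn", "zpq-000", text]  # one denotation per Zapotec expression
    return lines


# Reproduces the first meaning of the example, one datum per line.
for line in serialize_row("astutamente", "Adverbial", ["maños"]):
    print(line)
```

The sketch only covers the row shape seen in most of the example; rows with definitions (df) or extra classifications would need more fields.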
Syntax
A final source file must comply with a syntax that PanLem can parse.
You can think of a final source file as a sequence of meaning specifications, each beginning with a line containing mn. Within each meaning specification, there are specifications for one or more meaning details. Meaning classifications, meaning properties, definitions, and denotations are all meaning details. Denotations, in turn, have their own details: denotation classifications and denotation properties. As the example shows, a denotation’s details follow its dn specification.
Each detail specification contains 3 or more lines. The first line gives the detail type, and the lines that follow depend on that type:

mcs1 (unary meaning classification): 1 expression specification, consisting of 1 line containing the UID of the expression’s language variety and 1 line containing the expression’s text
mcs2 (binary meaning classification): 2 expression specifications, each consisting of 1 line containing the UID of the expression’s language variety and 1 line containing the expression’s text
mpp (meaning property): 1 expression specification (as in mcs1 and mcs2) and 1 line containing a text
df (definition): 1 line containing the UID of the definition’s language variety and 1 line containing the definition’s text
dn (denotation): 1 expression specification (as in mcs1 and mcs2)
dcs1 (unary denotation classification): same as mcs1
dcs2 (binary denotation classification): same as mcs2
dpp (denotation property): same as mpp
The example file shown above contains blank lines. They are permitted but not required: you may insert them anywhere in the file except within a detail.
Leading and trailing whitespace is stripped from every line, so you can indent lines to make the logical structure clearer, as in the example above.
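The syntax rules above can be checked with a small parser. The following minimal Python sketch reads the meaning specifications of a final source file into nested dictionaries, strips whitespace from each line, allows blank lines between details, and rejects a blank line inside a detail. It is not PanLem’s parser, and the names parse_meanings, DETAIL_LINES, and the dictionary layout are hypothetical. It also assumes two things this section does not state outright: that a denotation detail (dcs1, dcs2, dpp) attaches to the most recent dn in the same meaning, as in the example, and that the opening : and 0 lines of the example file have already been removed, since this section does not describe them.

```python
# Hypothetical sketch of a final-source parser; not PanLem's implementation.

# Number of lines that follow each detail-type line.
DETAIL_LINES = {
    "mcs1": 2,  # 1 expression specification: variety UID + text
    "mcs2": 4,  # 2 expression specifications
    "mpp": 3,   # 1 expression specification + 1 text line
    "df": 2,    # variety UID + definition text
    "dn": 2,    # 1 expression specification
    "dcs1": 2,  # same as mcs1
    "dcs2": 4,  # same as mcs2
    "dpp": 3,   # same as mpp
}
DENOTATION_DETAILS = {"dcs1", "dcs2", "dpp"}


def parse_meanings(lines):
    """Parse meaning specifications (header lines already removed)."""
    meanings = []
    i = 0
    while i < len(lines):
        # str.strip() removes Unicode whitespace; that PanLem uses the same
        # definition of whitespace is an assumption.
        token = lines[i].strip()
        i += 1  # i is now the 1-based number of the line just read
        if token == "":
            continue  # blank line between details: allowed
        if token == "mn":
            meanings.append({"details": []})
            continue
        if token not in DETAIL_LINES:
            raise ValueError(f"line {i}: unknown detail type {token!r}")
        if not meanings:
            raise ValueError(f"line {i}: detail appears before any mn")
        count = DETAIL_LINES[token]
        data = [line.strip() for line in lines[i:i + count]]
        # A short slice means the file ended mid-detail; "" means a blank
        # line inside the detail, which the syntax forbids.
        if len(data) < count or "" in data:
            raise ValueError(f"line {i}: incomplete {token} detail")
        i += count
        details = meanings[-1]["details"]
        if token in DENOTATION_DETAILS:
            # Assumption: a denotation detail belongs to the preceding dn.
            if not details or details[-1]["type"] != "dn":
                raise ValueError(f"line {i - count}: {token} without a preceding dn")
            details[-1]["details"].append({"type": token, "data": data})
        elif token == "dn":
            details.append({"type": "dn", "data": data, "details": []})
        else:
            details.append({"type": token, "data": data})
    return meanings


# Usage: indentation and surrounding blank lines do not affect the result.
sample = """
mn
  dn
    spa-000
    astuto
  dcs2
    art-303
    PartOfSpeechProperty
    art-303
    Adjectival
  dn
    zpq-000
    maños
"""
# Split on LF only, since final source files use LF line breaks.
meanings = parse_meanings(sample.split("\n"))
```

Because data lines are consumed by count rather than recognized by content, an expression whose text happens to be mn or dn is still read correctly.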
Final source files are text files with UTF-8 encoding. Every line ends with the line-feed (LF) character (U+000A), which is the Unix and OS X line break.
You should configure your source-analysis environment so that your system writes final source files with LF line breaks. If you use the PanLex tools, this should be done for you automatically on all platforms.
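If you write final source files with your own code rather than the PanLex tools, the following minimal Python sketch shows one way to guarantee UTF-8 encoding and LF line breaks on any platform, and to check an existing file. The function names write_final_source and check_final_source, and the file name used in the usage lines, are hypothetical. The key detail is newline="\n" in open(), which stops Python from translating "\n" into the platform’s line separator (CR LF on Windows).

```python
import os
import tempfile


def write_final_source(path, lines):
    """Write one datum or instruction per line, UTF-8, LF-terminated."""
    # encoding="utf-8" avoids depending on the locale's default encoding;
    # newline="\n" writes "\n" as LF (U+000A) on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            if "\n" in line or "\r" in line:
                raise ValueError(f"line break inside datum: {line!r}")
            f.write(line + "\n")


def check_final_source(path):
    """Raise if the file is not UTF-8 or does not use LF-only line breaks."""
    with open(path, "rb") as f:
        raw = f.read()
    raw.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    if b"\r" in raw:
        raise ValueError("CR found; final source files use LF line breaks only")
    if raw and not raw.endswith(b"\n"):
        raise ValueError("last line does not end with LF")


# Usage: write a small file to a temporary directory and verify it.
path = os.path.join(tempfile.mkdtemp(), "final-source.txt")
write_final_source(path, ["mn", "dn", "spa-000", "ataúd"])
check_final_source(path)
```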