Final source files

Introduction

The serialization stage of assimilation produces a file that we call a final source file. It contains the data, together with instructions telling PanLem how to import the data.

The instructions and data are located on distinct lines of the file. Each line contains only one datum or one instructional item.

Example

We showed you an example of a tabular file created from a Spanish–Zapotec dictionary source.

Serialization converts a file like that to a final source file, which looks like this:

:
0

mn
  dn
    spa-000
    astutamente
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adverbial
  dn
    zpq-000
    maños

mn
  dn
    spa-000
    astuto
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    maños

mn
  dn
    spa-000
    asustar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chšeb

mn
  dn
    spa-000
    asustar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Verbal
    dcs2
      art-303
      MorphosyntacticProperty
      art-302
      REFL
  dn
    zpq-000
    chžeb

mn
  df
    spa-000
    ataque (epiléptico)
  dn
    spa-000
    ataque
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      CommonNoun
    dcs2
      art-303
      GenderProperty
      art-303
      MasculineGender
  dn
    zpq-000
    šon

mn
  dn
    spa-000
    atar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chc̱hej
  dn
    zpq-000
    chda’ yag

mn
  df
    spa-000
    (estar) atado
  dn
    spa-000
    etado
  dn
    zpq-000
    chc̱hej
  dn
    zpq-000
    chda’ yag

mn
  dn
    spa-000
    atarantado
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    tarantadw

mn
  dn
    spa-000
    atarantarse
  dn
    zpq-000
    chec̱hol chenite

mn
  dn
    spa-000
    atardecer
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      IntransitiveVerb
  dn
    zpq-000
    chex̱jw gwbiž
  dn
    zpq-000
    chex̱jwža

mn
  dn
    spa-000
    atascarse
  dn
    zpq-000
    chaga’

mn
  df
    spa-000
    atascarse (sin poder orinar o defecar)
  dn
    spa-000
    atascarse
  dn
    zpq-000
    cheyjw

mn
  dn
    spa-000
    ataúd
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      CommonNoun
    dcs2
      art-303
      GenderProperty
      art-303
      MasculineGender
  dn
    zpq-000
    yi’iṉ

mn
  df
    spa-000
    atender (tomar en serio)
  dn
    spa-000
    atender
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chonen c̱he
  dn
    zpq-000
    chzi’ c̱he‣chzi’ diža’
  dn
    zpq-000
    chejḻe’

mn
  dn
    spa-000
    atrás
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adverbial
  dn
    zpq-000
    trasle

mn
  dn
    spa-000
    atrasado
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    trasadw

mn
  dn
    spa-000
    atravesar
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      TransitiveVerb
  dn
    zpq-000
    chḻaga’
  dn
    zpq-000
    chde

mn
  dn
    spa-000
    atreverse
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Verbal
    dcs2
      art-303
      MorphosyntacticProperty
      art-302
      REFL
  dn
    zpq-000
    cheyaxje

mn
  dn
    spa-000
    atrevido
    dcs2
      art-303
      PartOfSpeechProperty
      art-303
      Adjectival
  dn
    zpq-000
    chogwlaz

If you compare the two files, you can see that they contain the same information, but the final source file states it more explicitly. For example, the final source file makes explicit that “astutamente” is an expression and that it is in Spanish. Typically, each line of the tabular file is converted into a set of lines in the final source file.
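To make the row-to-lines conversion concrete, here is a minimal Python sketch. It assumes a hypothetical, already-parsed tabular row consisting of a Spanish headword, an optional part-of-speech label, and a list of Zapotec translations; the function name `serialize_row` and that input shape are illustrative only, not part of PanLex. The UIDs (`spa-000`, `zpq-000`, `art-303`) and the property names are copied from the example above.

```python
from typing import List, Optional

INDENT = "  "  # cosmetic only: PanLem strips leading and trailing whitespace

def serialize_row(spa: str, pos: Optional[str], zpq: List[str]) -> List[str]:
    """Turn one hypothetical Spanish-Zapotec row into final-source-file lines."""
    lines = ["mn"]  # start a new meaning
    # Spanish denotation: language variety UID line, then expression text line.
    lines += [INDENT + "dn", INDENT * 2 + "spa-000", INDENT * 2 + spa]
    if pos is not None:
        # Binary denotation classification: two expression specifications,
        # (art-303, PartOfSpeechProperty) and (art-303, <pos>).
        lines.append(INDENT * 2 + "dcs2")
        for item in ("art-303", "PartOfSpeechProperty", "art-303", pos):
            lines.append(INDENT * 3 + item)
    for text in zpq:
        # One Zapotec denotation per translation, all sharing this meaning.
        lines += [INDENT + "dn", INDENT * 2 + "zpq-000", INDENT * 2 + text]
    return lines

print("\n".join(serialize_row("atar", "TransitiveVerb", ["chc̱hej", "chda’ yag"])))
```

The printed lines reproduce the `atar` meaning from the example. Properties that need a different second UID, such as `REFL` under `art-302`, would need more input fields than this sketch accepts.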

Syntax

A final source file must comply with a syntax that PanLem can parse.

You can think of a final source file as containing a set of specifications for meanings (“mn”). Within each meaning specification, there are specifications for one or more meaning details. Meaning classifications, meaning properties, definitions, and denotations are all meaning details. In turn, denotations have their own denotation details. These are denotation classifications and denotation properties.
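The nesting of meanings, meaning details, denotations, and denotation details can be modeled with a few Python dataclasses. This is an illustrative sketch of our own; the class names are hypothetical and are not taken from PanLem.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Expression:
    variety_uid: str  # language variety UID, e.g. "spa-000"
    text: str         # the expression's text

@dataclass
class Classification:  # mcs1/mcs2 at meaning level, dcs1/dcs2 at denotation level
    expressions: List[Expression]  # 1 for unary, 2 for binary

@dataclass
class Property:  # mpp at meaning level, dpp at denotation level
    attribute: Expression
    text: str

@dataclass
class Definition:  # df: a language variety UID plus the definition's text
    variety_uid: str
    text: str

@dataclass
class Denotation:  # dn, carrying its own denotation details
    expression: Expression
    details: List[Union[Classification, Property]] = field(default_factory=list)

@dataclass
class Meaning:  # mn, carrying its meaning details
    details: List[Union[Classification, Property, Definition, Denotation]] = field(
        default_factory=list)

# The "astutamente" meaning from the example above.
astutamente = Meaning([
    Denotation(Expression("spa-000", "astutamente"), [
        Classification([Expression("art-303", "PartOfSpeechProperty"),
                        Expression("art-303", "Adverbial")]),
    ]),
    Denotation(Expression("zpq-000", "maños")),
])
```

The same `Classification` and `Property` classes serve both levels; whether one is a meaning detail or a denotation detail is decided by where it sits, not by its class.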

Each detail specification occupies 3 or more lines. The first line names the detail type, and the lines that follow depend on that type:

  • mcs1 (unary meaning classification): 1 expression specification, consisting of 1 line containing an expression’s language variety’s UID and 1 line containing the expression’s text
  • mcs2 (binary meaning classification): 2 expression specifications, each consisting of 1 line containing an expression’s language variety’s UID and 1 line containing the expression’s text
  • mpp (meaning property): 1 expression specification (as in mcs1 and mcs2) and 1 line containing a text
  • df (definition): 1 line containing the UID of the language variety of the definition, and 1 line containing the definition’s text
  • dn (denotation): 1 expression specification (as in mcs1 and mcs2)
  • dcs1 (unary denotation classification): same as mcs1
  • dcs2 (binary denotation classification): same as mcs2
  • dpp (denotation property): same as mpp
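Because every detail type has a fixed number of lines after its type line, a reader can group lines by counting rather than by inspecting their content. The following Python sketch does that. It assumes the input has already been stripped of whitespace and blank lines, and that it starts at the first `mn` (the `:` and `0` lines that open the example file are not covered here). It also assumes that a denotation detail belongs to the `dn` immediately preceding it. The names `parse_body` and `FIELD_LINES` are hypothetical.

```python
from typing import List, Tuple

# Lines that follow each detail-type line, per the list above.
FIELD_LINES = {"mcs1": 2, "mcs2": 4, "mpp": 3, "df": 2,
               "dn": 2, "dcs1": 2, "dcs2": 4, "dpp": 3}
DENOTATION_DETAILS = {"dcs1", "dcs2", "dpp"}

Detail = Tuple[str, List[str], list]  # (type, field lines, denotation details)

def parse_body(lines: List[str]) -> List[List[Detail]]:
    """Group stripped, non-blank lines into meanings and their details."""
    meanings: List[List[Detail]] = []
    i = 0
    while i < len(lines):
        tag = lines[i]
        if tag == "mn":
            meanings.append([])  # a new meaning; its details follow
            i += 1
            continue
        if tag not in FIELD_LINES:
            raise ValueError(f"index {i}: unexpected line {tag!r}")
        if not meanings:
            raise ValueError(f"index {i}: {tag} appears before any mn")
        n = FIELD_LINES[tag]
        fields = lines[i + 1:i + 1 + n]  # consumed by count, not by content
        if len(fields) < n:
            raise ValueError(f"index {i}: {tag} needs {n} lines, got {len(fields)}")
        detail: Detail = (tag, fields, [])
        current = meanings[-1]
        if tag in DENOTATION_DETAILS:
            # Assumption: attach to the dn that directly precedes this detail.
            if not current or current[-1][0] != "dn":
                raise ValueError(f"index {i}: {tag} does not follow a dn")
            current[-1][2].append(detail)
        else:
            current.append(detail)
        i += 1 + n
    return meanings
```

Consuming data lines by count means that an expression whose text happens to be `dn` or `mn` is still read as data, not as an instruction.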

The example file shown above contains blank lines. Blank lines are permitted but not required; you may insert them anywhere in the file except within a detail.

Leading and trailing whitespace is stripped from every line, so you can indent lines to make the logical structure clearer, as in the example above.
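The whitespace and blank-line rules can be applied in a single pass, as in the Python sketch below. It strips each line, drops blank lines, and rejects a blank line that falls inside a detail, using the per-type line counts from the syntax list. The function name `clean_lines` is hypothetical, and Python's `str.strip` is assumed to match PanLem's notion of whitespace, which this page does not define precisely. Note that `str.strip` also removes a stray carriage return, so this sketch does not enforce the LF-only rule.

```python
from typing import List

# Lines that follow each detail-type line ("mn" is not a detail).
FIELD_LINES = {"mcs1": 2, "mcs2": 4, "mpp": 3, "df": 2,
               "dn": 2, "dcs1": 2, "dcs2": 4, "dpp": 3}

def clean_lines(text: str) -> List[str]:
    """Strip every line and drop blank lines, rejecting blanks inside a detail."""
    result: List[str] = []
    remaining = 0  # field lines still owed by the detail being read
    for number, raw in enumerate(text.split("\n"), start=1):
        line = raw.strip()
        if not line:
            if remaining:
                raise ValueError(f"line {number}: blank line inside a detail")
            continue  # blank lines between details are allowed
        if remaining:
            remaining -= 1  # a field line of the current detail
        elif line in FIELD_LINES:
            remaining = FIELD_LINES[line]  # a detail starts here
        result.append(line)
    return result
```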

Final source files are text files with UTF-8 encoding. The lines all end with the line-feed (LF) character (a Unix or OS X line break, U+000A).

You should configure your source-analysis environment so that your system writes final source files with LF line breaks. If you use the PanLex tools, this should be done for you automatically on all platforms.
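If you write final source files with your own Python code instead of the PanLex tools, one way to get UTF-8 and LF line breaks on every platform is to disable newline translation when opening the file. The function name `write_final_source` below is hypothetical; `open` with the `encoding` and `newline` parameters is the standard built-in.

```python
def write_final_source(path: str, lines: list) -> None:
    """Write lines as UTF-8 text, ending every line (including the last) with LF."""
    # newline="\n" turns off translation, so "\n" is written as LF even on
    # Windows, where the default would convert it to CR LF.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")
```

Passing `newline=""` would also disable the translation; what matters is not relying on the platform default.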