There are two aspirational PanLex text standards.
Language-variety-specific standard scripts
The first standard requires that text in any language variety be written in the standard script of that variety.
You may encounter sources that contain text written in nonstandard scripts. If you can find a tool that automatically converts that text to the equivalent text in the standard script, you should use it. Transliteration tools can perform some such conversions. They include:
- Unicode Transliteration Charts
- Unicode CLDR transforms
- Varamozhi (Malayalam)
- ITRANS (Hindi: Devanagari)
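Before choosing a conversion tool, it can help to check which scripts a source's text actually uses. The following is a minimal sketch of such a check, not part of PanLex's tooling. The function names `scripts_used` and `is_in_script` are hypothetical. It assumes that the first word of a character's Unicode name approximates its script, which holds for names such as LATIN, CYRILLIC, and DEVANAGARI but not for every character.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Return the approximate set of scripts of the letters in text.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, such as 'LATIN' or 'DEVANAGARI', is
    treated as its script. Non-letters (digits, spaces, punctuation,
    combining marks) are ignored.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_in_script(text: str, expected: str) -> bool:
    """Return True if every letter in text appears to be in the expected script."""
    return scripts_used(text) <= {expected}
```

For instance, `is_in_script("namaste", "DEVANAGARI")` returns False, flagging text that may need conversion to the standard script. A production check would rely on the Unicode Script property rather than on character names.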
Error-free conversion to the standard script is often impossible, however. In such cases we recognize distinct, script-specific varieties of the language. If you are assimilating data from a source that documents what should be a script-specific language variety, but the source’s registration declares that variety to be identical to one using a different script, correct the error in the registration, creating a new language variety in PanLex if necessary.
Most of the world’s languages are not normally written and thus have no standard scripts. For those, you should usually use whatever scripts your sources use.
Language-variety-specific standard orthographies
The second aspirational PanLex text standard requires that text in any language variety be written in the standard orthography of that variety.
Some tools convert text from a nonstandard orthography to the standard one.
If a language has significantly distinct orthographic standards, PanLex recognizes each standard as a basis for a distinct variety of the language. Correct the source registration if it does not reflect this.