Text standardization


Resources are published in various formats on various media. Digital resources containing lexical data range from photographs, video recordings, and sound recordings, through images of handwritten or printed pages, to digital text. Only digital text can be assimilated into the PanLex database. But even when a source consists of files containing digital text, those files can come in many different formats.

Text standardization is the stage of assimilation in which you convert the content into digital text, if it is not already in that form, and make it conform to PanLex standards.

The following pages cover different aspects of text standardization: