Unicode | PanLex development

This page provides additional resources on Unicode and encoding issues you may encounter when editing data for PanLex.

First, read our guide to Unicode in PanLex, which explains why standardized encoding within the database is crucial and gives guiding principles for normalizing text.
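As a small illustration of why normalization matters, the sketch below shows two strings that look identical on screen but consist of different code points. It uses Python's standard `unicodedata` module; the helper name `code_points` is made up for this example, and the choice of NFC and NFD here is only for demonstration, not a statement of which form PanLex requires (see the guide for that).

```python
import unicodedata


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Two ways to write "é": one precomposed character, or "e" plus a combining accent.
composed = "\u00e9"
decomposed = "e\u0301"

print(code_points(composed))
print(code_points(decomposed))

# The strings render the same but do not compare equal until normalized.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Comparing or deduplicating text without normalizing first would treat these two spellings as different expressions.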

To find the code point for a particular character, consult the Unicode encoding table or use a Unicode character search tool such as codepoints.net or fileformat.info.
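If you would rather look up code points locally, a minimal sketch using Python's standard `unicodedata` module might look like the following. The function name `describe_code_points` is hypothetical, and the character names come from the Unicode data bundled with your Python version, which may lag behind the latest Unicode release.

```python
import unicodedata


def describe_code_points(text):
    """Return a (code point label, character name) pair for each character in text."""
    rows = []
    for ch in text:
        label = f"U+{ord(ch):04X}"
        # Characters without a name in the bundled database get a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((label, name))
    return rows


if __name__ == "__main__":
    for label, name in describe_code_points("\u00e9\u00f1"):
        print(label, name)
```

Pasting a suspicious string into this function is a quick way to spot look-alike characters or stray combining marks.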

Alternatively, you may find this UTF-8 decoder helpful.
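The same kind of decoding can be sketched in a few lines of Python, assuming you have the bytes as a hex string (for example, copied from a hex editor). The function name `decode_utf8_hex` is made up for this example; it is not part of any PanLex tool.

```python
def decode_utf8_hex(hex_string):
    """Decode a hex string of UTF-8 bytes and return the resulting code points.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    raw = bytes.fromhex(hex_string)  # spaces between byte pairs are allowed
    text = raw.decode("utf-8")
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    print(decode_utf8_hex("C3 A9"))        # two bytes, one character
    print(decode_utf8_hex("E2 82 AC 41"))  # three-byte character followed by ASCII
```

A `UnicodeDecodeError` here usually means the data is in a different encoding or has been truncated mid-character.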

Questions, error reports, and other comments about the Unicode standard can be submitted to the Unicode Consortium.
————
Additional tools from the training site:

Hex editor
- OS X: brew install hexedit (requires Homebrew)
- Cygwin: run setup.exe and install hexedit
TECkit
- OS X installer
- Windows installer
  - to make TECkit available from Cygwin, cd to the TECkit directory, then run: cp txtconv.exe TECkit_x86.dll /usr/local/bin
- SIL Converters