The standard software that you need for assimilation depends on the strategy.
If you are doing interpretation, you need a web browser and a text editor, both of which must correctly display, and permit you to enter, arbitrary Unicode characters, including mixtures of left-to-right and right-to-left characters. For that purpose, you also need fonts that cover the range of Unicode characters.
If you are doing analysis, you need the same software, and more.
In any kind of assimilation, you sometimes need access to the PanLex database via its expert web interface, PanLem. Most current web browsers have the necessary features. If you find that a browser has trouble with PanLem, trying another browser is likely to resolve the problem.
You may be able to use your preferred text editor for assimilation, if it supports Unicode and bidirectional text.
Some editors that we have found mostly compliant with these requirements are (“$” = non-free):
- For OS X: TextEdit, Atom, TextWrangler, Sublime Text ($)
- For Windows: Notepad++, Atom, SciTE, jEdit, Sublime Text ($)
Both web browsers and text editors should support bidirectional text if they are used for assimilation. You can manage without such support only if the source you are assimilating contains no right-to-left characters. Sources written in the Arabic, Hebrew, and several other scripts do contain them.
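One way to find out whether a source contains right-to-left characters at all is to scan it with a short script. The following is a minimal sketch in Python, not a PanLex tool. The function names `has_rtl` and `rtl_lines` are hypothetical, and the check covers only the strongly right-to-left bidirectional classes (`R` and `AL`).

```python
import unicodedata

# Bidirectional classes that Unicode treats as strongly right-to-left:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def has_rtl(text):
    """Return True if any character in text is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def rtl_lines(lines):
    """Yield (line_number, line) for each line containing such a character."""
    for number, line in enumerate(lines, start=1):
        if has_rtl(line):
            yield number, line
```

If `rtl_lines` yields nothing for every line of a source file, bidirectional support is probably not critical for that source.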
Bidirectional support is far from universal. Consider, for example, the following line from a source, displayed in 6 different text applications on an OS X host (1 = Safari, 2 = LibreOffice Writer, 3 = TextWrangler, 4 = TextEdit, 5 = Terminal, 6 = Bluefish). Of these, only TextEdit seems correct. The line begins with Arabic letters, so it should begin on the right. The Arabic letters should not appear separated by spaces. The braces should be balanced.
Complex script support
Web browsers and text editors should, for our purposes, also support complex scripts. Script complexity takes various forms, but common manifestations include:
- Letters that appear in a different order from their logical order.
- Letters with various shapes that depend on context.
- Diacritical marks that appear in different locations that depend on what letters they are attached to.
If your sources are written in complex scripts, you may find that some popular browsers and editors fail to support them properly.
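When a browser or editor appears to misrender a complex script, it can help to inspect the characters in their stored (logical) order, to tell a display problem from a data problem. The sketch below is one possible approach in Python. The function name `describe` is hypothetical.

```python
import unicodedata


def describe(text):
    """List each character of text in logical (stored) order.

    Each item is (code point, Unicode name, general category).
    Categories starting with "M" are marks attached to a base character.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# Devanagari KA followed by VOWEL SIGN I: the vowel sign is stored second
# but is normally displayed to the left of the consonant.
for item in describe("\u0915\u093f"):
    print(item)
```

If the stored order is what the script requires and the display still looks wrong, the application is the likely cause rather than the source data.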
PanLex data can contain most Unicode characters, but some computer operating systems, as delivered, do not display some characters because of the limited repertoires of the fonts installed on them. For assimilation, you may need to install additional fonts. The most useful of these is the Noto font suite.
When you assimilate a source that has a font-based pre-Unicode encoding, it can be useful to find a copy of the (non-Unicode) font on which its encoding is based and install that font. That can help you see the characters as they should appear and determine how to map each codepoint in the source file to its proper Unicode codepoint, if there isn’t an encoding converter capable of doing that for you.
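Once you have worked out such a mapping by hand, applying it can be scripted. The following Python sketch assumes a simple character-by-character mapping. The table `LEGACY_TO_UNICODE` is invented for illustration and does not describe any real font, and the function names are hypothetical. Encodings that also require reordering of characters would need more than this.

```python
# Hypothetical mapping for an imaginary legacy font. Each key is a character
# as stored in the source file; each value is the intended Unicode text.
# A real table must be built by inspecting the actual font.
LEGACY_TO_UNICODE = {
    "A": "\u0905",  # DEVANAGARI LETTER A (illustrative only)
    "k": "\u0915",  # DEVANAGARI LETTER KA (illustrative only)
    "i": "\u093f",  # DEVANAGARI VOWEL SIGN I (illustrative only)
}


def convert(text, mapping=LEGACY_TO_UNICODE):
    """Replace every mapped character; leave all other characters unchanged."""
    return text.translate(str.maketrans(mapping))


def unmapped(text, mapping=LEGACY_TO_UNICODE):
    """Return the non-space characters in text that the mapping does not cover."""
    return sorted({ch for ch in text if ch not in mapping and not ch.isspace()})
```

Running `unmapped` on the whole source first shows which characters still need a decision before you convert anything.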
Analysis requires more than the minimal standard software. The additional software depends on the source and on you.
Analysts use programming languages to design rules and apply them to source data, mainly during tabularization.
You may find that a programming language that you want to use is installed as a standard component of your computer’s operating system. Even then, the installed version may be obsolete and may fail to support some Unicode characters that a later version supports, so you may need to update the language.
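How to check this depends on the language. As one example, assuming the language is Python, the standard `unicodedata` module reports the Unicode version that the installed interpreter supports. The function names below are hypothetical.

```python
import sys
import unicodedata


def unicode_support_report():
    """Report the interpreter version and the Unicode version it supports."""
    return {
        "python_version": sys.version.split()[0],
        "unicode_version": unicodedata.unidata_version,
    }


def is_assigned(ch):
    """Return True if this interpreter's Unicode database knows the character.

    Category "Cn" means the code point is unassigned in the supported version.
    """
    return unicodedata.category(ch) != "Cn"


print(unicode_support_report())
```

If `is_assigned` returns False for a character that appears in your source, the installed version probably predates the Unicode version that added it.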
If you are an advanced editor and wish to get more information from the database than PanLem allows, you may want to get staff authorization to interrogate the database using SQL queries. The PanLex database is maintained in PostgreSQL, and the most common PostgreSQL client is psql.
- On OS X, the psql command-line client should already be installed.
- On Windows, run Cygwin’s setup.exe. When prompted for which packages to install, select postgresql-client from the Database category. Once the packages have finished downloading and installing, you can run psql from the Cygwin Terminal.
- You can also access the PanLex database with pgAdmin and DBVisualizer, graphical applications available for both OS X and Windows.
When querying the database, you may find it useful to consult summary reference documents, including: