Resource archive | PanLex development

Introduction

The PanLex resource archive contains one directory for each procured resource (approximately 5,000 in number), and each directory holds the files of that resource. The resource archive is located at /resources/.

Permissions

The resource archive contains complete copies of most procured resources. Many of these are partly or wholly protected by copyright, and publishing copies of them would exceed our understanding of our fair-use rights. We have not (yet) segregated the copyright-protected resources from open-source and public-domain resources, nor have we applied distinct permissions to resource directories according to each resource’s copyright or licensing status. Therefore, we simply restrict access to the resource archive as a whole. If you are a PanLex developer who needs access to the resource archive for your work within the PanLex project, you will receive credentials permitting that access. Your access is conditioned on your not redistributing copies of copyright-protected works in the resource archive to members of the general public.

Structure

The high-level structure of the resource archive consists of two principal directories:

main contains all resource directories not in transit.
incoming contains resource directories that have been recently deposited into the resource archive. They remain in incoming until reviewed and approved. Then they are moved to main.

Within each of these are directories, one per resource, with their own structure and naming rules.

Searching

You can search the resource archive by file name, both for resources and for particular files within resources. By default, the search covers the whole resource archive (main and incoming) and is case-sensitive. You can restrict the search to a single directory, and the “ignore case” option turns off case sensitivity. Query strings can contain full or partial file and directory names.
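The search semantics described above can be illustrated with a minimal local sketch. It is not the archive’s actual search implementation; the function name search_archive and its parameters are hypothetical, and it assumes you have a local directory tree laid out like the archive (main and incoming under one root).

```python
import os
from pathlib import Path


def search_archive(root, query, subdir=None, ignore_case=False):
    """Return paths under root whose file or directory name contains query.

    Hypothetical helper: mirrors the documented defaults (whole tree,
    case-sensitive, partial names allowed), not the server's real code.
    """
    base = Path(root) / subdir if subdir else Path(root)
    needle = query.lower() if ignore_case else query
    matches = []
    for dirpath, dirnames, filenames in os.walk(base):
        # Both directory names and file names are candidates for a match.
        for name in dirnames + filenames:
            haystack = name.lower() if ignore_case else name
            if needle in haystack:
                matches.append(Path(dirpath) / name)
    return sorted(matches)
```

For example, search_archive(root, "dict") would match a file named dict.pdf but not Dict.txt, while passing ignore_case=True would match both, and subdir="incoming" would limit the search to that directory.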

Deposit

You may need to deposit directories into the resource archive if you work on acquisition, or if you work on assimilation. In the acquisition phase, you deposit procured resources. In the assimilation phase, you retrieve directories from the resource archive, work on their contents, create new files (such as scripts and versions), keep those new files in the directories, and finally deposit the analyzed source directories back into the archive.

The depositing process, if you perform it with a web browser, includes these steps:

Compress the directory into a zip-format file.
Visit the resource archive with a web browser.
Click “Upload zip file”.
Choose the zip file as the file to be uploaded.
Fill in your name and email address as the “depositor”.
Indicate whether the directory contains an unanalyzed resource or analyzed source(s).
Enter some information in the “Note” field, as follows:
- If you have procured the resource, registered its sources, and organized its files into the directory, enter “new acquisition” into the “Note” field.
- If you have retrieved the directory from the resource archive and made changes in it:
  - If you have modified and/or added files, enter “replacement” into the “Note” field.
  - If your changes include changing the directory name, enter “replacement for …” into the “Note” field, giving the old directory name.
  - If you have also assimilated data from it and imported a final source file into the database, enter “assimilated” into the “Note” field. The reason for this note is that a quality-control reviewer will inspect the final source file that you produced (and perhaps other files) and let you know of any issues that appear to require more work.
- You may also add in the “Note” field any other information you consider pertinent.
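The first step and the “Note” rules above can be sketched as follows. This is only an illustration under stated assumptions: the function names zip_directory and deposit_note are hypothetical, and joining several note parts with “; ” is an assumption, since the rules do not say how to combine them.

```python
import shutil
from pathlib import Path


def zip_directory(directory):
    """Compress a resource directory into <name>.zip beside it; return the zip path."""
    directory = Path(directory).resolve()
    archive = shutil.make_archive(
        base_name=str(directory),        # output path, ".zip" is appended
        format="zip",
        root_dir=str(directory.parent),
        base_dir=directory.name,         # keep the top-level directory in the zip
    )
    return Path(archive)


def deposit_note(retrieved_from_archive, old_name=None, assimilated=False, extra=""):
    """Build the text for the "Note" field from the documented cases.

    Hypothetical helper; the "; " separator is an assumption.
    """
    if not retrieved_from_archive:
        parts = ["new acquisition"]
    else:
        # A renamed directory names the directory it replaces.
        parts = [f"replacement for {old_name}" if old_name else "replacement"]
        if assimilated:
            parts.append("assimilated")
    if extra:
        parts.append(extra)              # any other pertinent information
    return "; ".join(parts)
```

Keeping the top-level directory inside the zip file means that decompressing it yields a single directory with the original name.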

An alternative is to deposit the directory from the command line with plx upload.

The server decompresses each uploaded zip file into a directory, which it stores in the incoming directory for staff review.

Retrieval

You may need to retrieve a directory from the resource archive if you work in acquisition, in order to correct or supplement the files in it. Or you may need to retrieve one if you work in assimilation, because you are about to assimilate some of the data in it into the database.

You can retrieve a directory from the resource archive as a zip file and then decompress it. If you are accessing the resource archive with a web browser, use the “download zip file” link that appears to the right of the directory name (if viewing main or incoming), or at the top of the page (if viewing an individual source directory). Alternatively, if you are issuing requests from the command line, you can use plx fetch.
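Once the zip file has been downloaded, decompressing it might look like the sketch below. The function name unpack_retrieved and the choice of working directory are hypothetical; the sketch uses only Python’s standard zipfile module and does not call plx.

```python
import zipfile
from pathlib import Path


def unpack_retrieved(zip_path, workdir):
    """Extract a downloaded zip into workdir; return the top-level paths it contained."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Member names use "/" as separator; the first component is the top level.
        top_level = sorted({name.split("/", 1)[0] for name in zf.namelist() if name})
        zf.extractall(workdir)
    return [workdir / name for name in top_level]
```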

If you are retrieving a resource for assimilation, check that its source record is correct and that you have the latest version of the resource. Look at the source registration record in PanLem, visit the resource’s URL (if any), and compare the date and content of the resource there with the version stored in the PanLex resource archive. If the resource’s current version is more recent than ours, retrieve it and compare it with the stored version. If the more recent version is better (e.g., more complete or accurate), replace the stored version with the current one in your local directory. If you found the resource but its URL has changed, edit the registration to correct the URL.
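Comparing the stored version with a freshly retrieved one can be partly mechanized. The sketch below is one possible approach, not an established PanLex tool: the function names are hypothetical, and it assumes both versions have been saved as local directories. It reports only which files differ; judging whether the newer version is better remains up to you.

```python
import hashlib
from pathlib import Path


def file_digests(directory):
    """Map each file's path (relative to directory) to the SHA-256 of its bytes."""
    directory = Path(directory)
    return {
        path.relative_to(directory).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def compare_versions(stored_dir, current_dir):
    """Report files added, removed, or changed in current_dir relative to stored_dir."""
    stored = file_digests(stored_dir)
    current = file_digests(current_dir)
    return {
        "added": sorted(current.keys() - stored.keys()),
        "removed": sorted(stored.keys() - current.keys()),
        "changed": sorted(
            name for name in stored.keys() & current.keys()
            if stored[name] != current[name]
        ),
    }
```

If all three lists come back empty, the two versions have identical contents and no replacement is needed.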