Importing data into the database

Introduction

The importation process involves checking your serialized file, which we call the final source file, for errors; submitting it (once any errors have been corrected) with PanLem for inclusion in the PanLex database; and depositing the revised source directory into the PanLex resource archive.

Checking the final source file

  1. Browse to the PanLem interface.
  2. Choose your interface language (we assume below that it’s English).
  3. Select begin.
  4. Select the edit button.
  5. Enter your username and password.
  6. Select the file — send button.
  7. Locate the source and click its button.
  8. Select the button that says choose file or similar and open your final source file.
  9. Select check, await the email status message, and verify that it reports that the file is good. If not, the message will tell you which line the first error appears on and what kind of error it is. If the source file does not pass the checking stage, you will need to go back (at least to the serialization stage) and fix any errors. If your file checks OK, you can continue by submitting it for importation.

Checks can also be performed from the command line with plx submit check.

Submitting the final source file

  1. Select back, and choose the file again.
  2. Select approve — more if you would like this file to add to (but not replace) data for this source that are already in PanLex; select delete — replace — whole if you would like to replace all of the source’s existing data. If your submission is successful, you should see a confirmation message similar to the one described in step 9 of the checking procedure.

Submissions can also be performed from the command line with plx submit.

Depositing the revised source directory

The directory that you retrieved from the resource archive has now grown. It contains the final source file and, usually, some intermediate files and the scripts that you produced them with.

Your next action is to compress that revised directory in zip format and upload the compressed file to the resource archive. Steps:

  1. Check that all files in the directory are named in accord with the PanLex file-naming conventions.
  2. Check that the directory has the conventional PanLex directory structure.
  3. Compress the directory into a zip file. Mac users can do this by Control-clicking on the directory and choosing Compress. Windows users can do this by right-clicking on the directory, navigating to Send To, and choosing Compressed (zipped) folder.
  4. Upload the zip file to the resource archive, following these instructions.
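
If you prefer to script the compression in step 3, the following is a minimal sketch using only the Python standard library. The function name zip_source_directory is hypothetical and is not part of any PanLex tooling. The sketch assumes that the archive should contain the directory itself as its top-level entry, which is what the Mac and Windows menu commands above produce.

```python
import shutil
from pathlib import Path


def zip_source_directory(source_dir: str) -> Path:
    """Compress source_dir into <name>.zip beside it and return the zip path.

    Hypothetical helper for illustration; not part of PanLex tooling.
    """
    src = Path(source_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    # root_dir is the parent and base_dir is the directory name, so the
    # archive keeps the directory itself as its top-level entry.
    archive = shutil.make_archive(
        base_name=str(src.parent / src.name),
        format="zip",
        root_dir=str(src.parent),
        base_dir=src.name,
    )
    return Path(archive)
```

Calling zip_source_directory on your revised source directory writes a zip file with the same name next to it. The checks in steps 1 and 2 and the upload in step 4 still need to be done separately.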

Deposits can also be performed from the command line with plx upload.

Following review, the PanLex supervisory staff will move the uploaded directory from incoming to the main directory.

What if you made a mistake?

Suppose you discover that you submitted a seriously erroneous final source file, but it was syntactically valid, and so it was imported into the database. What can you do to correct the errors?

The solution that is usually appropriate is to correct the errors in your scripts, generate a new final source file, and submit that, replacing the original data. If you use the normalize serialization script when you do this, however, you should add a parameter instructing the API to ignore the data of the same source. Otherwise, the erroneous data that you previously submitted would inflate the scores of the unchanged expressions in your new submission.

But occasionally you may find that it will take several days to correct the errors or you need help from others, and you want to take the erroneous data out of the database in the meantime. For this purpose you can use PanLem to edit the source and choose the option to delete all of its translations. PanLem will show you the counts of the data to be deleted and ask you to confirm that you really want to do this.

A submission mistake can, under some conditions, be more difficult to undo. You may then need to ask a member of the project staff for help, which may require performing a special rewind operation or even restoring an archive of the database saved at an earlier time. Such a situation can arise if you have:

  • Deleted good data and replaced them with new, erroneous data.
  • Submitted a good final source file but specified the wrong source, thereby deleting that other source’s existing data.

Can the checking be skipped?

If you submit a final source file for importation, PanLem checks the file before actually importing it. If it finds any error, it sends you a message describing the error and does not begin the importation. Therefore, it is safe to submit a file for importation without first submitting it for checking. However, checking alone is faster, because it does not require PanLem to store the importation instructions as it checks.

Should you throw away your local copy?

After you submit the final source file and get confirmation that the submission succeeded, and after you deposit the revised directory to the resource archive and get confirmation that it arrived, is it safe to delete your local copy of the directory?

In principle, it should be safe, but mistakes happen. For example, a staff member who is checking your directory might misplace or accidentally delete it, or you might forget to check that both your submission and your deposit were confirmed as successful. So it is safest to keep your own copy of the directory for at least a few weeks, in case it is needed for recovery.
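
Before you eventually delete a local copy, one extra precaution is to confirm that the zip file you deposited really contains every file in the directory. The following is a minimal sketch of that check, assuming you still have the zip file locally and that it holds the directory as its top-level entry. The function name missing_from_archive is hypothetical. It compares file names and sizes only, and it says nothing about whether the copy in the resource archive arrived intact.

```python
import zipfile
from pathlib import Path


def missing_from_archive(source_dir: str, zip_path: str) -> list[str]:
    """Return local files that are absent from the zip or differ in size.

    Hypothetical helper for illustration; not part of PanLex tooling.
    """
    src = Path(source_dir).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        # Map each archived name to its uncompressed size.
        archived = {info.filename: info.file_size for info in zf.infolist()}
    problems = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        # Archive names use forward slashes and start with the directory name.
        name = path.relative_to(src.parent).as_posix()
        if archived.get(name) != path.stat().st_size:
            problems.append(name)
    return problems
```

An empty result means that every local file has a same-sized counterpart in the zip file. Any names returned are files that were added or changed after the zip file was made.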