PanLex internships

Intern with PanLex in Berkeley!

For the record …

This page describes an internship program that is no longer open for applications. The page remains here for reference.

PanLex is making translations among all words in all human languages (and dialects) publicly available. Search Google for “kvg ituake” and you’ll see one slice of PanLex. PanLex was designed at the University of Washington and is now based, together with The Rosetta Project, at The Long Now Foundation. Our offices are in Berkeley, California, a block from the UC campus.

We need help pursuing our mission. Over a billion translated word pairs (30 billion if you count translations through another language) can now be retrieved from PanLex, but that’s only a start. We invite you to volunteer in this effort.

As a PanLex 02016 summer intern, you can contribute to our work. You will be supporting research on language and meaning, while helping equip thousands of languages for machine translation, information retrieval, and global communication.

What’s in it for you? Rare experience doing panlingual (more than merely multilingual) documentation and engineering. Training relevant to careers in language, software, internationalization, and documentation, both academic and industrial. Skills and tools shared by our team of experts, under whose guidance you’ll learn how to enrich a gargantuan open-source database of lexical translations. Your name will be on the data that you add. And while you’re with us, The Long Now Foundation will invite you to attend its San Francisco Seminars.

PanLex intern alumni have gone on to careers at Facebook, Google, Microsoft, Evernote, Trulia, Base CRM, Market News International, Park IP Translations, Quorate Technology, Smart Information Flow Technologies, Stanford University, and other organizations.

We are not offering stipends, salaries, or financial support for travel or housing, but we will cooperate if you seek financial aid or academic credit for this internship.

  • When: June 20 to August 12 (8 weeks, three-quarter to full time)
  • Where: Berkeley (if spending this time in Berkeley isn’t practical for you, consider volunteering with PanLex outside the internship program instead)
  • How: Training plus supervised individual and team practice, consistent with the NACE (National Association of Colleges and Employers) definition of an internship
  • What: You’ll focus on one track (possibly more), chosen from:
    1. Source acquisition: discovering, obtaining, and cataloguing copies of printed and digital lexical translation resources (e.g., bilingual dictionaries) from anywhere in the world, in partnership with the Internet Archive. Relevant skills: multilingual familiarity, Internet search, crowdsourcing, library science, task management.
    2. Source consultation: interpreting complex entries from bilingual and multilingual dictionaries, selecting appropriate data, and entering standardized lexical translations into a file or app. Relevant skills: multilingual familiarity, Unicode scripts, input methods, lexicography, morphology, semantics.
    3. Source analysis: using, adapting, and developing computational tools to extract lexical data from files and insert bulk data into the PanLex database. Relevant skills: same as for source consultation, plus programming in interpreted languages (Perl, Python, JavaScript, etc.), regex design, character encoding, HTML/XML parsing, PDF-to-text conversion, OCR training.
    4. Interface development: designing and implementing apps to let people and programs interact with PanLex data. Relevant skills: UI design, user testing, localization, API design, SEO, mobile OSs, Web development, Web–database integration, SQL, PostgreSQL, SQLite, database tuning.
    5. Database research: investigating PanLex sources and data statistically and structurally for quality control and knowledge extraction and discovery. Relevant skills: probability, statistics, pattern recognition, machine learning, graph theory, linguistic typology, SQL, PostgreSQL.
  • Who: Fellow interns will be a mix of graduates and undergraduates, mostly in linguistics, computer science, or information science. But no field of study disqualifies you. You’ll be particularly suited if you’re any of these:
    • Polyglot
    • Linguist
    • Internet search addict
    • Bibliographer and cataloguer par excellence
    • Techie for whom /^[$£]\d+(?:,\d{3})?(?:\.\d{2})?$/ isn’t an obscenity
    • Design and usability whiz
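
For anyone who does find that regular expression obscene, here is a small Python sketch that tests it as written. It assumes the pattern is meant to recognize a dollar or pound amount in its entirety; the helper name is_amount and the sample strings are illustrative only.

```python
import re

# The pattern from the "Techie" bullet, copied verbatim: a "$" or "£" sign,
# one or more digits, at most one ",ddd" group, and an optional ".dd" part.
AMOUNT = re.compile(r"^[$£]\d+(?:,\d{3})?(?:\.\d{2})?$")


def is_amount(text: str) -> bool:
    """Return True if the whole string matches the currency pattern."""
    return AMOUNT.fullmatch(text) is not None


# Hypothetical samples: print each one alongside the match result.
samples = ["$5", "£1,250.00", "$1234", "$1,234,567", "$12.5", "€10"]
for s in samples:
    print(f"{s!r:14} -> {is_amount(s)}")
```

Here "$5", "£1,250.00", and "$1234" match, while "$12.5" (only one decimal digit) and "€10" (currency sign not in the class) do not. Note that the pattern allows only one thousands separator, so "$1,234,567" is rejected too.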

Apply now for a PanLex internship