Team meetings

Introduction

The PanLex staff that participates in source consultation meets periodically to discuss progress and issues. The issues are listed below, beginning with the latest scheduled meeting.

3 May 02017

  • Happy hour scheduling
  • New website plans
  • Problems and solutions

25 April 02017

  • API updates
  • Database internals updates
  • New website plans
  • IP questions
  • Problems, solutions, and announcements

18 April 02017

  • Server updates
  • Translation query improvements
  • Gary’s translation count page
  • Language variety groups
  • Problems and solutions

13 April 02017

  • PanLem improvements
  • New API for language variety and expression suggestions, fallback, etc.?
  • API rewrite
  • Language variety groups
  • Server upgrade
  • Problems and solutions

4 April 02017

  • PanLem improvements
  • CLDR
  • Aramaic cleanup
  • Parentheses parser
  • Problems and solutions

28 March 02017

  • Steering Committee report and meeting
  • Office space update
  • IP/licensing discussion
  • PanLem upcoming changes
  • Problems and solutions

21 March 02017

  • Steering Committee report and meeting
  • Office space update
  • Langvar table changes
  • PanLem changes
  • Database collation order (C vs. C.UTF-8)
  • Problems and solutions

10 March 02017

21 February 02017

  • Upcoming Kumu meeting
  • Problems and solutions

14 February 02017

  • Database object renaming
    • API updates
    • Other consequences
  • Upcoming Kumu meetings
  • Problems and solutions

7 February 02017

3 February 02017

17 January 02017

  • License category for personal sources
  • Dave’s trip
  • Preparing for Manuel Maqueda meetings
  • Problems and solutions

11 January 02017

  • csppmap enhancements: degrade parameter
  • Ethnologue subscription
  • Call tomorrow with Manuel Maqueda
  • Problems and solutions

4 January 02017

  • API enhancements: after parameter
  • csppmap enhancements: degrade parameter
  • Volunteer queries about summer internship
  • Remote source consultation
  • Problems and solutions

30 December 02016

  • lookup_lang_by_name.pl script
  • Proposed API enhancements
    • Distance-2 translations: restrict intermediate expressions by variety, source, lv mutability (?), other things (?)
    • Proper cursor/paging support
  • Problems and solutions

21 December 02016

  • Steering Committee meeting December 19
    • Interim project director (until March 31)
    • Proposal on vision, mission, and business plan
  • Transition
    • Jonathan’s role
    • Staff roles
    • Meetings with staff
  • Orphan expressions in immutable language varieties
  • Problems and solutions

13 December 02016

  • Documentation of PanLex research
  • Volunteer program
  • Vision and mission
  • Steering Committee meetings
  • Problems and solutions

7 December 02016

  • Volunteer program
  • Translation of ISO 639 language codes
  • Visualization of PanLex graph (3 examples)
  • Steering Committee meeting
    • Inputs
    • Results
  • Problems and solutions

29 November 02016

  • Volunteer program
  • Mission and strategy
  • Problems and solutions

22 November 02016

  • Volunteer program
    • Summary of all active volunteers in each track.
    • Best staff approach to consolidating supports and work sessions.
    • Future approaches to volunteers: training dates and minimum time commitment (e.g., 3 hrs/wk × 4 mo plus twice-monthly progress check-ins) to be defined in advance of recruitment. Some “grab-and-go” volunteer jobs identified for ongoing singleton sign-ups.
  • Mission and strategy
    • Survey responses
    • Formulations
    • Consultations with Steering Committee
  • Problems and solutions

15 November 02016

  • Productivity
    • Experimental measures
      • expprod
        • plx=# select ex.lv, lv.lc, lv.vc, sum(net) as dnnet from (select item, net from util.logsnets('2016-10-28', '2016-11-12', 'dn', 3)) as tbl, ex, lv where ex.ex = tbl.item and lv.lv = ex.lv group by ex.lv, lv.lc, lv.vc order by dnnet desc;
            lv   | lc  | vc  | dnnet  
          -------+-----+-----+--------
             524 | xno |   0 | 382354
             187 | eng |   0 | 193218
            6899 | hak |   2 | 120840
            1835 | cmn |   3 | 119630
           10470 | yue |   5 | 112173
            6712 | art | 254 |  65094
            1628 | cmn |   1 |  63460
             820 | yue |   0 |  62208
             263 | hak |   0 |  43269
            1627 | cmn |   0 |  41893
           10136 | yue |   4 |  41080
           10140 | hak |   6 |  28412
             131 | cor |   0 |  17188
           11063 | oco |   0 |  15699
           11188 | cnx |   0 |  14856
             298 | ind |   0 |  10195
             431 | mic |   0 |   6226
    • Alternative measures
      • lvgrowth
      • Gini coefficient
        • Today: 0.966
          • with lvexct as (select lv, count(ex) as exct from ex group by lv) select sum(abs(t1.exct - t0.exct)) / (2 * (select count(lv) as lvct from lvexct) * (select sum(exct) as excts from lvexct)) as gini from lvexct as t0, lvexct as t1;
        • If we added 100 expressions to every lv: 0.925
        • If we added 2,000 expressions to every lv: 0.51
        • If we added 6,000 expressions to every lv: 0.26
    • Improvement strategy
  • Quality control
    • On ingestion
      • Backlog size: 86
    • Within database
  • Acquisition strategies
  • Steering Committee
    • Meeting on 14 November
    • Meeting on 7 December
  • Strategic planning
    • Questionnaires for stakeholders
      • PanLex staff
      • Long Now staff
      • Steering Committee
      • Advisory Committee
      • Volunteers and former interns
    • Needs analysis
      • SWOT (strengths, weaknesses, opportunities, threats)
      • Impact/control of opportunities and threats
    • Mission statement
    • Promotion: targets, methods, universality/coverage
    • Developing a plan
      1. Determine which staff members want to participate
      2. Collect information via questionnaires, interviews, brainstorming sessions, consultation with Steering Committee and Long Now, etc.
      3. Formulate a written plan, including mission statement, promotion strategies, and job tasks that will need to be done. Include budget and role of partnership with Long Now.
      4. Propose how job tasks can best be delegated: to current personnel (making use of available skills and interest), contractors, etc. Include organizational structure.
    • Immediate next steps and deadlines
  • Problems and solutions
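
The Gini coefficient query listed under Productivity above can be mirrored outside the database. The following is a minimal sketch, not the team's tooling: the function names and the sample counts are hypothetical, and only the formula (sum of absolute pairwise differences divided by 2 × number of varieties × total expressions) is taken from the query. It also illustrates why adding the same number of expressions to every language variety lowers the coefficient.

```python
from itertools import product


def gini(counts):
    """Gini coefficient of per-variety expression counts.

    Same formula as the SQL query: the sum of |x_i - x_j| over all
    ordered pairs, divided by 2 * n * sum(x).
    """
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a, b in product(counts, repeat=2))
    return abs_diff_sum / (2 * n * total)


def gini_after_uniform_addition(counts, added):
    """Gini coefficient if `added` expressions were added to every variety."""
    return gini([c + added for c in counts])


# Made-up counts for four language varieties (not PanLex data).
sample = [0, 0, 0, 10]
print(gini(sample))
print(gini_after_uniform_addition(sample, 10))
```

The pairwise loop is quadratic in the number of varieties, which matches the self-join in the query; for large inputs a sorted-rank formula would be cheaper.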

8 November 02016

  • Skype discussion with Computational Linguistics Club
  • Interfaces
    • Dynamic statistics of database on website
  • Class-heterogeneous translations
  • Production
  • Problems and solutions

1 November 02016

  • Volunteer training (only whole-team aspects)
    • Schedule
    • Staffing
    • Communication
    • Local volunteers
    • Remote volunteers
  • Denotation estimation
  • Productivity
    • Data
    • Criteria
    • Periodization
  • Problems and solutions

25 October 02016

  • Volunteer training
    • Notification and instructions to trainees
  • Productivity
  • Problems and solutions

18 October 02016

  • Volunteer training
    • Remote training platform
    • Track 1
    • Track 2
    • Track 3
    • Track 4
      • Communication channel(s)
      • Subproject selection and team formation
    • Track 5
    • Dyen lists
  • Acquisition
    • Language-variety identification
      • Name–UID translations
      • Registration of source-specified language-variety names
        • Automatic inference of their language varieties
    • Boilerplate letter for requesting data from resource holders
  • Extension of classifications and properties to sources
  • Productivity
  • Problems and solutions

13 October 02016

  • Intern relations
    • PanLex Skype meeting with intern (JS) during Computational Linguistics Club meeting
  • Volunteer training
    • Track-specific training plans
    • Remote-volunteer introductory training plans (26 October)
  • Classification and property extension
  • Number list (art-269) as separate source
  • Productivity
  • Office security
  • Problems and solutions

7 October 02016 (acquisition)

  • Source difficulty estimation

6 October 02016 (assimilation)

28 September 02016

  • Volunteer program planning
    • Minimum expected skills, intensities, and durations
    • Actual skills of participants
    • Training schedule, methods, content, and staffing
  • Long Now Member Summit
    • Table
  • Source size estimation
    • Strategy
    • Development and testing
  • Source difficulty estimation
  • Productivity
    • Revised report
    • Responses from staff
  • Problems and solutions

20 September 02016

  • Volunteer program planning
    • Minimum expected skills, intensities, and durations
    • Actual skills of participants
    • Training schedule, methods, content, and staffing
  • Long Now Member Summit
    • Unconference (no projection)
    • Table
    • Attendance
  • Productivity
    • What are we measuring, how are we measuring it, and why?
    • Potential effect of unknown/uncommunicated metrics
    • Collection of additional data
      • Wrike task timelogs vs. PanLem vs. measuring production less granularly (e.g., by week/month/quarter)
    • Further analysis
      • Comparability of already interpreted/analyzed sources and how to project future costs by style on that basis
      • Future influx rate of easily analyzable sources (assuming ongoing acquisition)
      • Percentage of sources that can reasonably/cost-effectively be assimilated in either style
    • Actions
      • Auto-generate denotation count estimates for some sources
  • Org chart: how should decisions be implemented and communicated?
  • Problems and solutions

13 September 02016

  • Volunteer program planning
    • Training rehearsal dates
    • Training dates and venues
    • Pre-training prep & reading for vols, track 2/3 software to install?
    • Mentorship
    • Commitments from volunteers
  • Incomplete intern work: progress
  • Concepticons
    • Possible conversion of eng:Miller art-301:Identifier meaning properties to language variety “Expanded PWN3 synset_offset”, making p. 36 of Costa (02016) a source
  • Productivity
    • Staff commenting and participation in management discussion
  • Problems and solutions

7 September 02016

  • Internship program conclusions
  • Intern evaluations
    • Retention of copies of transmitted evaluations
  • In-progress intern work
    • Acquisition and assimilation submissions
    • Sources claimed for assimilation
    • Interface and research projects
      • Files
      • Continuation
  • Completion of incomplete source registrations, such as:
    • bdb-ind:KBB (no language varieties)
    • dws-eng:Dutton (no file formats)
  • Volunteer planning
    • Training
      • Possible venues
      • Possible dates and times
    • Training invitations
      • Recipients so far
      • Possible other recipients
    • Training content and duration
    • Training staffing
  • Productivity metrics
    • Acquisition
    • Assimilation: Consider sources M (multilingual) and B (bilingual). M assigns each of 10 meanings to 5,000 expressions, 1 in each of 5,000 language varieties. B assigns each of 25,000 meanings to 2 expressions, 1 in each of 2 language varieties. M and B each contain 50,000 denotations, but M contains almost 5,000 times as many translation pairs as B (124,975,000 versus 25,000). How should PanLex value M and B?
  • Workflow, productivity, and satisfaction
  • Problems and solutions
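
The arithmetic behind the M-versus-B question under Productivity metrics above can be checked with a short sketch. It assumes that a translation pair is an unordered pair of expressions sharing a meaning and that every expression of a meaning is in a different language variety, as in the two example sources; the function name is hypothetical. Under that assumption M has 4,999 times as many translation pairs as B.

```python
from math import comb


def source_stats(meanings, expressions_per_meaning):
    """Return (denotations, translation_pairs) for an idealized source.

    Each meaning is assigned to `expressions_per_meaning` expressions,
    one per language variety. Translation pairs are counted as
    unordered pairs of expressions that share a meaning (an assumption).
    """
    denotations = meanings * expressions_per_meaning
    translation_pairs = meanings * comb(expressions_per_meaning, 2)
    return denotations, translation_pairs


m_denotations, m_pairs = source_stats(10, 5000)    # source M (multilingual)
b_denotations, b_pairs = source_stats(25000, 2)    # source B (bilingual)

print(m_denotations, b_denotations)  # equal denotation counts
print(m_pairs, b_pairs)
print(m_pairs / b_pairs)             # ratio of translation pairs
```

Counting ordered pairs instead doubles both pair counts and leaves the ratio unchanged.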

30 August 02016

  • Internship program
    • Reference requests
    • Other aspects: debriefing meeting to be scheduled
  • Volunteer program
    • Communications and planning
    • Training
    • Intern mentors
  • Productivity metrics
    • Discount rates
      • 3 possible discounts
    • Individual and aggregate data on expressions and translations (shared by Pool under Assimilation/Production)
    • To be computed: acquisitions
    • Unmeasurables?
  • Problems and solutions

25 August 02016

  • Internship program
    • General results
    • Reference requests
  • Volunteer program
    • Roster
    • Space
    • Communications and planning
    • Training
  • Task/workflow management after the internship program
  • Productivity metrics
  • Problems and solutions

14 June 02016

  • Internship program
    • Pre-start instructions to interns
      • What to bring
      • Special instructions for late arrivers
    • Preparations
      • Welcome party
      • Web feed
      • On-site equipment
      • First day
      • First week
    • Staffing assignments
    • Intern office visits
  • Task/workflow management
    • Acquisition
      • Source template to be reviewed (including obligatoriness)
      • New prioritization rule for language selection
    • Assimilation
  • Source classifications and properties
    • Representation in the database
      • Cf. language varieties
    • I/O
  • Documentation
    • Retirement of duplicate/obsolete pages
      • Mark in page name
    • Repair of broken links
    • Revisions
    • Permissions to edit pages
  • Problems and solutions

7 June 02016

  • Internship program
    • Space
    • Planning
    • Task management
  • Volunteer planning
  • Documentation
    • Revisions
    • Terminology
    • Menus
  • Denotation-count estimation
  • Problems and solutions

31 May 02016

  • Internships
    • Space
    • Planning
      • Calendar implementation
  • Volunteer planning
  • Source consultation
    • Source sizes in selection interface
  • Mail management for panlex.org addresses
  • Office software
  • Problems and solutions

24 May 02016

  • Internship space
  • Internship planning
    • Attendance
    • Documentation
    • Curriculum
  • Economics of source consultation
  • Third Wrike update
    • workflow design
    • language names (short and long lists)
  • Problems and solutions

19 May 02016

  • Internship planning
    • staffing
      • track 1: JP, JA, SC
      • track 2: JP, DK, SC, source analysts
      • track 3: DK, source analysts, (JP)
      • track 4: JP, DK, SC
      • track 5: JP, DK, SC
    • schedule
      • daily schedule: 10am-4pm core hours; figure out remaining hours to get to full time
      • all interns present first two weeks, more flexibility thereafter
      • track 1: weekly recap to show progress
      • track 2: make sure it doesn’t get too monotonous
      • tracks 4 and 5: present preliminary results to everyone about week 7, get feedback
    • mentoring
      • should we assign each intern to an individual mentor?
      • what will mentoring mean?
      • how will the assignments happen?
    • training
      • first day
        • give full overview similar to SF Globalization talk
        • summarize each track
      • curricula development
        • track 1: JA
        • tracks 2 and 3: DK, source analysts
        • tracks 4 and 5: ?
  • Problems and solutions

11 May 02016

  • Internship statistics
  • Internship planning
    • space options and optimal space use
    • task management system: Wrike
    • training schedule: send comments/revisions by 5/17
  • Economics of source consultation
  • Pilot project: estimate number of expressions in 50 lower-difficulty sources, then prioritize those with the highest payoff
  • Documentation revision
  • Problems and solutions

4 May 02016

  • Internship statistics
  • Internship planning (space, schedule, etc.)
  • Translationese such as “get burned”, “get dizzy”: to normalize or not? Consensus: treat as expressions, plus inchoative meaning classification.
  • Heterogeneous “name of X”: just leave as definitions or try to extract more?
  • mul:Imboden: how much taxonomic information to include?
  • Source consultation productivity
  • Problems and solutions

26 April 02016

  • Internship statistics
  • Internship planning schedule
  • Source consultation productivity
  • -er (one who) as meaning classification superclass expression
  • Problems and solutions

20 April 02016

  • Slack configuration
  • Sick days
    • California SDI: administered by EDD
  • Internship statistics
  • Internship space
  • Internship planning
  • Economics of source consultation
  • Problems and solutions

12 April 02016

  • Internship applications
  • Internship planning
  • Volunteer applications
  • Volunteer planning
  • Language-expert panel
  • Problems and solutions

6 April 02016

  • Denotation quality estimates
  • Problems and solutions

30 March 02016

  • Internship applications
  • Multiple translations
    • Tolerances for synonymy
    • normalize with delim
  • Emojis as a language variety
  • IETF Language Codes (BCP-47) as a language variety
  • cmn-002 retirement
  • Problems and solutions

22 March 02016

  • Internship applications
  • Track 4, apps, and bots
  • Orthographies and language varieties
  • Tsimshian verbs: singular and plural
  • Problems and solutions

15 March 02016

  • Server availability
  • Internship applications
  • Downloading data from script-based websites
  • MS Office licenses
  • Problems and solutions

9 March 02016

  • Large Graph Layout (e.g., Walrus)
  • PanLex article in March Long Now Quarterly News
  • Server health
  • Intern application processing
  • Problems and solutions

1 March 02016

  • UCB I School career fair tomorrow
  • Internship and volunteer application processing
  • Server repairs and possible upgrade
  • Occasional other-language-variety expressions
  • Problems and solutions

25 February 02016

  • Capstone projects for CSE students
    • PanLex exploration interface
    • Database browser
    • Mobile interface
    • Language picker
    • Translation inference
    • Reimplementation of PanImages
  • Bird species names
  • Internships
    • Recruiting
    • Evaluating
  • Problems and solutions

16 February 02016

  • Intern recruiting at UCB (18 February and 2 March career fairs)
  • Intern recruiting generally
  • Long Now lunch
  • Reddit thread on PanLex
  • PanLex and AMA (Ask Me Anything)
  • Archaic expressions: classify or revarietize?
  • Normalization tuning for vie-000
  • Problems and solutions

9 February 02016

  • Intern recruiting at UCB (18 February and 2 March career fairs)
  • Intern recruiting generally
  • Inchoatives, causatives, and statives
  • “Indefinite article” and similar translations: expressions or definitions?
  • Problems and solutions

2 February 02016

  • Intern recruiting
  • Revised function (sff0ad) processing final source files
  • Hebrew varieties
  • Source-file hierarchy
  • Quality investment criteria
  • Problems and solutions

29 January 02016

  • Unicode Technical Committee meeting
  • Using CLDR to record valid language variety characters and use them in normalization
  • Modified PanLem home page
  • Compositionality
  • Preliminary source storage
  • Internship recruitment
  • Working from home policy/suggestions
  • Problems and solutions

19 January 02016

  • Volunteers
  • Possible internships
  • Problems and solutions

13 January 02016

  • Left-to-right and Right-to-left control characters: prohibited or not?
  • Size limits on contents of text-type cells (PanLem editing limits superclass and attribute expression texts to 100, other expression texts to 200, definition texts to 200, and property values to 100 characters)
  • Unicode problems?
  • Problems and solutions
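
The size limits quoted in the agenda item above could be expressed as a simple lookup. This is only an illustrative sketch: the field-type keys and the function name are hypothetical, it is not PanLem's actual code, and it assumes that "characters" means Unicode code points, which is what Python's len() counts.

```python
# Limits as quoted in the agenda item; the keys are hypothetical labels.
TEXT_LIMITS = {
    "superclass_expression": 100,
    "attribute_expression": 100,
    "other_expression": 200,
    "definition": 200,
    "property_value": 100,
}


def exceeds_limit(field_type, text):
    """Return True if `text` is longer than the limit for `field_type`.

    Length is measured in Unicode code points (an assumption).
    """
    return len(text) > TEXT_LIMITS[field_type]


print(exceeds_limit("definition", "a" * 200))      # at the limit
print(exceeds_limit("property_value", "a" * 101))  # one over the limit
```

A check based on bytes or grapheme clusters would give different results for non-ASCII text, so the unit of measurement is worth settling before enforcing any limit.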

5 January 02016

  • Language varieties with infinitely many expressions (e.g., art-269)
  • Problems and solutions

29 December 02015

  • cspp/doc.txt file in panlex-tools
  • extag update: ‘tagged’ parameter removed and ‘ex’ tag not preposed to leading ‘ex’, ‘df’, ‘mcs’, or ‘mpp’ tag
  • Source-analysis quality-control routine
  • Problems and solutions
  • Language-editor status management

22 December 02015

  • Redundancy of sources: bidirectional inversion
  • Content curation
  • “spp.” in lat-003 expressions
  • PanLem Unicode compatibility implementation changes
  • Degradation of ‘ñ’ in spa-000 (versus other language varieties)
  • Punctuation in source labels
  • Long Now newsletter articles on PanLex
  • Problems and solutions

17 December 02015

  • Crowdsourcing source acquisition (see CrowdFlower)
  • Updating PanLem to use Unicode 8.0.0
  • Untranslatable words
  • Reliably elucidating language varieties
  • Differentiating same-script language varieties (e.g., pointed and unpointed Hebrew)
  • Identifying lat-003: taxonfinder
  • How deep to dig into source data: etymologies, usage examples, etc.
  • Coping with lexical asymmetry: “(want to) eat” etc.
  • Problems and solutions

9 December 02015

  • Plans for source-acquisition volunteer orientation
  • Change in PanLem user classification
  • Ingestion of huge sources
  • Creation of dialect varieties (Alex’s new Japanese source)
  • Problems and solutions

3 December 02015

  • Indentation of lines in final source files
  • Source-analyst statistics: central tendency of language-variety size
                 Count    Sqrt     ln
    A           400000   632.5   12.9
    B             2000    44.7    7.6
    Mean        201000   338.6   10.3

    A           100000   316.2   11.5
    B           100000   316.2   11.5
    Mean        100000   316.2   11.5
  • Problems and solutions
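
The figures in the central-tendency table above can be reproduced with a short sketch. It assumes that each "Mean" row is the arithmetic mean of the values in its column (raw count, square root, natural logarithm); the function name is hypothetical.

```python
from math import log, sqrt
from statistics import mean


def central_tendencies(counts):
    """Mean of the raw counts, of their square roots, and of their logs."""
    return {
        "count": mean(counts),
        "sqrt": mean([sqrt(c) for c in counts]),
        "ln": mean([log(c) for c in counts]),
    }


unequal = central_tendencies([400000, 2000])    # first block of the table
equal = central_tendencies([100000, 100000])    # second block of the table

for label, row in (("unequal", unequal), ("equal", equal)):
    print(label, row["count"], round(row["sqrt"], 1), round(row["ln"], 1))
```

The raw mean ranks the unequal pair far above the equal pair, while the square-root and logarithmic means rank it close to or below the equal pair, which is the contrast the table draws.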

24 November 02015

  • Problems and solutions
  • Unsupported recently added Unicode codepoints
  • Treatment of nonlemmatic translations (e.g., discriminatory = 差別の)
  • Source classifications and properties
  • Regularization of reingestion of data from foreseeably revised sources
  • Estimates of expression counts for 4K unconsulted sources for use in source acquisition
  • Recording outcomes of searches for sources on undocumented languages
  • In his IMUG talk of 2015.02.19 on technology for endangered languages, Craig Cornelius (Google) said that, for more than about 100 languages, we should not expect (1) language detection (he referenced Compact Language Detector 2, which covers 83 languages), (2) spelling correction, etc.

17 November 02015

  • Distribution of submission and upload reports
  • One-on-one meetings
  • Problem reports
  • Documentation and other support for consistent selection of classification and property expressions
  • New error checking in out-full-0
  • Source-wide art-300:HasContext classifications
  • Code check-in (on GitHub)
  • Analytic license (limits on inference of unexpressed data)

10 November 02015

  • Activity reports
  • Text degradation changes
  • Fake word generation (cf. Duolingo)
  • Language-variety inference in source analysis
  • External tools: see tweet on tool for testing morphological tools
  • Meaning versus denotation classifications (e.g., place names)
  • Choice of superclass expressions (e.g., art-303:LivingVariety)
  • Is-has distinction in classifications
  • Classification normalization and mapping enrichment

4 November 02015

  • Activity reports
  • SF Globalization presentation comments
  • LLOD lexicon datasets, thesaurus datasets, and terminology datasets: acquisition strategy
  • Appropriate treatments of type-of information
  • Apostrophes
  • Ellipses
  • Choosing lemmas (e.g., inalienably possessed nouns)
  • Geographic properties of languages and varieties
  • Japanese normalization

27 October 02015

  • Activity reports
  • Effective use of normalizedf
  • Manual/irregular sources
  • Tools (textblob, tabula)
  • Language-variety mapping
  • Rehearsal for SF Globalization presentation

21 October 02015

  • Activity reports
  • Lemmatization of polysynthetic languages (Alex’s Ojibwe example)
  • DBnary word class mapping
  • NestedParensToBrackets function in PanLex::Util
  • Valid PanLex expressions: checking at the levels of out-full-0, PanLem, and PostgreSQL
  • Prohibited codepoints in text values in PanLex tables
  • SF Globalization presentation preparation
  • Productivity statistics
  • Browser timeouts in PanLem

14 October 02015

  • Activity reports
  • Source analysis questions
    • Treatment of dialect expressions for major languages (e.g., Spanish) when most of a source is not marked for dialect
    • Meaning property attribute expression to use for unanalyzed portions of meanings
    • Editing language-variety descriptions via their meanings (to add to PanLem)
  • Preparation for 2 November presentation
  • Source consultation productivity
  • Source acquisition planning

6 October 02015

  • Activity reports
  • Related events
  • Tool improvements: exdftag, normalize, normalizedf
  • Spelling correction: aspell, hunspell
  • SF Globalization presentation by PanLex at Adobe (410 Townsend St., SF) on 2 November

29 September 02015

  • Activity reports
  • Related events
  • Backlog liquidation: (1) progress; (2) strategy
  • Difficult sources
  • PDF files
  • Makefile tool
  • Adding basic error-checking to out-full-0
  • Classifications and properties: (1) review of tools; (2) normalization
  • Part-of-speech tagging: (1) internal tools; (2) external tools; (3) whether to do it; (4) when to do it
  • Language varieties: treatment options
  • Possible additions to serialize/data/mcsmap.txt
  • Little-known internal tools
  • Recoding of pre-Unicode text

22 September 02015

  • PDF files
  • Spelling-correction serialization tool
  • Normalization during ingestion and later (example: “surnager, flotter, s’amuser dans l’eau”, “swim, float, play in the water”)
  • Backlog liquidation strategy

16 September 02015

  • Dialect tags on expressions in sources.
  • Parenthesized miscellany attached to expression candidates.
  • External spelling-correction tools in normalization.
  • Finding concepticon expressions for superclasses, classes, and attributes.
  • art-303:Class versus art-253:declension(icl>inflection>thing) for declensions to which expressions belong, and likewise with conjugations.
  • Does art-303:IntransitiveVerb imply art-303:Verbal?
  • Registers etc. (humble, polite, formal (vs familiar, vs informal), colloquial, archaic, slang, vulgar)
  • Preservation of original entries as meaning properties.
  • Language variety issues (identification, what names signify, etc.)