Concepticon research | PanLex development

Concepticons

Most of the language varieties that we document are varieties of natural languages.

Invented languages, and controlled varieties of natural languages, also exist, and we document those, too.

Among those, some are designed to be free of ambiguity and free of synonymy. At the lexical level, a source that defines one of those avoids ever assigning more than one meaning to an expression, and avoids ever assigning a meaning to more than one expression. A language variety of that type is a concepticon. A concepticon can be understood as the language of an ontology.
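The two constraints together require a one-to-one correspondence between expressions and meanings. Below is a minimal Python sketch of how one might check a source's lexical data against them. It assumes the data have been reduced to (expression, meaning) pairs. The function names are hypothetical and are not part of any PanLex API.

```python
from collections import defaultdict


def concepticon_violations(pairs):
    """Return (ambiguous expressions, synonymous meanings) found in (expression, meaning) pairs."""
    meanings_of = defaultdict(set)     # expression -> meanings assigned to it
    expressions_of = defaultdict(set)  # meaning -> expressions that carry it
    for expression, meaning in pairs:
        meanings_of[expression].add(meaning)
        expressions_of[meaning].add(expression)
    # Ambiguity: one expression with more than one meaning.
    ambiguous = {e for e, ms in meanings_of.items() if len(ms) > 1}
    # Synonymy: one meaning assigned to more than one expression.
    synonymous = {m for m, es in expressions_of.items() if len(es) > 1}
    return ambiguous, synonymous


def is_concepticon(pairs):
    """True when the pairs are free of both ambiguity and synonymy."""
    ambiguous, synonymous = concepticon_violations(pairs)
    return not ambiguous and not synonymous
```

For example, `is_concepticon([("1", "m1"), ("2", "m2")])` is true. Adding `("1", "m2")` would make expression `"1"` ambiguous and meaning `"m2"` synonymous.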

We protect concepticons from degradation by sources other than those that brought them into existence. At present, we do this by giving each concepticon the attribute “immutable”. This allows any source to assign its own meanings to expressions in a concepticon, but does not allow any source to create, delete, or modify expressions in a concepticon. It may be more appropriate to operationalize, instead, the idea that a language variety is owned by a source, something we have not yet done.
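The current rule can be pictured as a check applied before each write. The sketch below is hypothetical. The class, field, and function names are invented for illustration and do not describe PanLex's actual schema or code. It encodes only the behaviors stated above. Any source may attach meanings to existing expressions of an immutable variety, but expressions there cannot be created or deleted.

```python
from dataclasses import dataclass, field


class ImmutableVarietyError(Exception):
    """Raised when a write would create or delete an expression of an immutable variety."""


@dataclass
class LanguageVariety:
    code: str                                      # e.g. "art-245"
    immutable: bool = False
    expressions: set = field(default_factory=set)
    # (source, expression) -> meanings that source has assigned to the expression
    meanings: dict = field(default_factory=dict)


def add_expression(variety, text):
    if variety.immutable:
        raise ImmutableVarietyError(f"{variety.code} is immutable; cannot create {text!r}")
    variety.expressions.add(text)


def delete_expression(variety, text):
    if variety.immutable:
        raise ImmutableVarietyError(f"{variety.code} is immutable; cannot delete {text!r}")
    variety.expressions.discard(text)


def assign_meaning(variety, source, text, meaning):
    # Any source may attach its own meaning to an existing expression, even in an
    # immutable variety. A missing expression would first have to be created,
    # which add_expression refuses for immutable varieties.
    if text not in variety.expressions:
        add_expression(variety, text)
    variety.meanings.setdefault((source, text), set()).add(meaning)
```

Consider `v = LanguageVariety("art-245", immutable=True, expressions={"1"})`, where the expression `"1"` is a placeholder. With this variety, `assign_meaning(v, "source-A", "1", "m1")` succeeds, while `add_expression(v, "2")` raises `ImmutableVarietyError`. An ownership model would replace the boolean with an owning source and let only that source pass the check.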

People have been inventing concepticons since at least the 17th century. Morris Swadesh, a 20th-century linguist, was one of them, and some concepticons derived from his are called “Swadesh lists”. Two of those are the basis for the PanLex Swadesh corpora.

The expressions of concepticons are, in some cases, numbers, either integers or decimal numbers. In other cases, expressions are composed of letters, or of letters and punctuation.

Reconciliation

Some concepticons aim to represent basic concepts of world-wide importance. There is, however, no agreement on what those concepts are, nor on how semantic space should properly be partitioned. Thus, there are multiple concepticons with partly overlapping concepts. If we identify equivalences among meanings of distinct concepticons, we enrich the database.

Suppose source A translates between concepticon L and 5 natural language varieties, and source B translates between concepticon M and 18 natural language varieties. If some expressions of L and M are equivalent, we can document that by translating them into each other. This establishes indirect translatability between the 5 varieties and the 18 varieties. Each such path (natural variety → L → M → natural variety) is a distance-3 translation. However, if we have satisfied ourselves that the equivalences between the concepticons are definitive, we can treat translations through L and M as if they were distance-2 translations.
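The distance arithmetic can be illustrated with a simplified model in which each documented translation is an undirected edge between two expressions. Distance is then the number of edges on the shortest path. This is an assumption for illustration; PanLex's own definition of distance may differ in detail. "Treating the equivalences as definitive" is modeled by merging each equivalent pair of concepticon expressions into one node before a breadth-first search. The expression labels in the usage are hypothetical.

```python
from collections import deque


def translation_distance(edges, start, goal, identified=()):
    """Shortest number of translation steps from start to goal, or None if unreachable.

    edges: iterable of (a, b) translation links between expressions.
    identified: iterable of (a, b) equivalences treated as definitive; their
    endpoints are merged into a single node before searching.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; unseen nodes become their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in identified:
        parent[find(a)] = find(b)

    graph = {}
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # this link is itself a merged equivalence, so it costs no step
        graph.setdefault(ra, set()).add(rb)
        graph.setdefault(rb, set()).add(ra)

    source, target = find(start), find(goal)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None
```

With `edges = [("eng:water", "L:7"), ("L:7", "M:w12"), ("M:w12", "fra:eau")]`, the distance from `"eng:water"` to `"fra:eau"` is 3. Passing `identified=[("L:7", "M:w12")]` merges the two concepticon expressions and the distance becomes 2.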

Our concepticon research consists of inspection of concepticons and their translations and determinations of equivalence. When we discover an equivalence, we document it in the database.

PanLex Union Concepticon

The device that we use for documenting inter-concepticon equivalences is the PanLex Union Concepticon (PUC). This is a concepticon that we have created for this purpose. It is a union, i.e. a minimal superset, of the other concepticons that we have chosen to reconcile.
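Read this way, a union concepticon needs one expression per class of mutually equivalent component expressions. Every component expression then belongs to exactly one union expression, which makes the union a superset. No two union expressions are equivalent, which makes it minimal. The sketch below computes those classes with a union-find structure. It is an illustrative assumption about how such a union could be derived, not a description of how PUC was actually built. It handles only equivalences, not broader or narrower relations. The codes art-245 and art-260 come from the list below, but the expression values in the usage are placeholders.

```python
def union_concepticon(components, equivalences):
    """Group component expressions into union-concepticon entries.

    components: dict mapping concepticon code -> iterable of its expressions.
    equivalences: iterable of ((code, expr), (code, expr)) pairs judged equivalent;
    every node named here must appear in components.
    Returns a list of frozensets, one per union expression.
    """
    parent = {}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for code, expressions in components.items():
        for expr in expressions:
            parent[(code, expr)] = (code, expr)

    for a, b in equivalences:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    classes = {}
    for node in parent:
        classes.setdefault(find(node), set()).add(node)
    return [frozenset(members) for members in classes.values()]


def conflicting_classes(classes):
    """Classes holding two expressions of one concepticon contradict its lack of synonymy."""
    return [c for c in classes if len({code for code, _ in c}) < len(c)]
```

With `components = {"art-245": ["1", "2"], "art-260": ["10", "20", "30"]}` and equivalences linking `1`–`10` and `2`–`20`, the union has 3 expressions. Because each component is itself free of synonymy, any class reported by `conflicting_classes` would signal an inconsistent set of equivalence judgments.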

The PUC is (currently) based on the following other concepticons:

art-000 PanLem
art-012 Swadesh 207
art-245 Swadesh 100
art-257 LWT Code
art-260 Swadesh 200
art-261 SILCAWL
art-266 Swadesh-Gudschinsky 200
art-267 ALCAM 120
art-268 ABVD 210
art-269 ℤ
art-270 LEGO Concepticon
art-277 Swadesh-Yakhontov 110

The source that performs this reconciliation is art:Colowick, “A Union of Concepticons”, by Susan Colowick and Jonathan Pool.

In order to accommodate the 12 component concepticons, PUC has 3,545 expressions.

Future work

We may incorporate additional concepticons into PUC. Among those under consideration are Semantic Domains (art-292; 1,792 expressions) and DCMI Metadata Terms (art-301; 112 expressions).

Related work

In late 02016 another research group published the CLLD Concepticon (art-mul:CC), a major effort to map concepticons to each other. We have not yet examined that work to assess its potential applications in PanLex.