ISO 639 and other language identifiers

Caoimhin O Donnaile
Tue, 7 May 2002 11:09:30 +0100 (BST)

> Perhaps it might not be too controversial if only the lower family-tree
> nodes (i.e., those corresponding to more recent points in historical
> reconstruction) were included, and not the higher-level (more
> chronologically distant) nodes.

Under the scheme I had in mind it would be necessary to include all
nodes, even speculative ones, so that you would have an unbroken chain
from the language back to the language family.  That way you would
know from the database that Hawaiian, for example, is a Polynesian
and thence Austronesian language, even if some of the intermediate
nodes in the chain are speculative.

So someone would be able to search the web for pages in Polynesian
languages at sites, for example, and the browser would find
pages in Hawaiian. The pages themselves would only be labelled as "HAWI"
or whatever, but the browser or the search engine would make the
connection using a cached copy of the hierarchy from the database, or
from a list of Polynesian language codes obtained by querying the
database. The only hierarchic information in the database record for
Hawaiian would be that its parent node is Marquesic (node 634 in the
current Ethnologue). The fact that it is Polynesian would have to be
worked out by following nodes in the chain, even though the nodes
themselves might be speculative.
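
As a rough sketch of the lookup described above, here is how a browser or
search engine might walk a cached copy of the hierarchy. Only the
Hawaiian -> Marquesic -> ... -> Polynesian -> Austronesian relationship comes
from the discussion; the table layout, the function names, the codes "MAOR"
and "MALG", and the collapsing of intermediate nodes are assumptions made
purely for illustration.

```python
# Toy parent-pointer table: each record stores only its parent node,
# as proposed above. The real chain has more intermediate nodes; they
# are collapsed here, and all codes except "HAWI" are hypothetical.
PARENT = {
    "HAWI": "Marquesic",
    "Marquesic": "Polynesian",
    "MAOR": "Polynesian",
    "Polynesian": "Austronesian",
    "MALG": "Austronesian",
    "Austronesian": None,  # top of the family, no parent
}


def ancestors(code):
    """Return the chain of nodes from the parent of `code` up to the root."""
    chain = []
    node = PARENT[code]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def codes_under(group):
    """Return the sorted leaf codes whose chain passes through `group`."""
    parents = {p for p in PARENT.values() if p is not None}
    leaves = [c for c in PARENT if c not in parents]
    return sorted(c for c in leaves if group in ancestors(c))


if __name__ == "__main__":
    print(ancestors("HAWI"))
    print(codes_under("Polynesian"))
```

A search for "Polynesian" would then expand to the list returned by
codes_under("Polynesian") and match pages labelled with any of those codes.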

Speculative nodes could be labelled as such in the database:

 "Warning. Speculative node. Liable to change.
  Use code with care or not at all."

But even if someone did use the code, sufficient archival information
would remain in the database to determine the nearest current node
which embraced it.
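
One way this might work, sketched under assumptions: a withdrawn node is
never deleted, only marked as no longer current, and it keeps the parent
pointer it had when it was withdrawn. The field names, the node names and
the function nearest_current are all hypothetical, not part of any existing
database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional[str]
    speculative: bool = False  # would carry the warning text in the database
    current: bool = True       # False once withdrawn; record kept as archive


# Hypothetical records: "SubgroupX" was speculative and has been withdrawn.
NODES = {
    "Family": Node("Family", None),
    "GroupA": Node("GroupA", "Family"),
    "SubgroupX": Node("SubgroupX", "GroupA", speculative=True, current=False),
}


def nearest_current(name):
    """Follow archived parent pointers up to the first node still current."""
    node = NODES[name]
    while not node.current:
        if node.parent is None:
            raise LookupError("no current ancestor for " + name)
        node = NODES[node.parent]
    return node.name


if __name__ == "__main__":
    # A page labelled with the withdrawn code still resolves to GroupA.
    print(nearest_current("SubgroupX"))
```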

My proposed system breaks down a bit when the genetic relationships
between languages don't fit into a hierarchic tree structure, but I
think that it is better to have a hierarchic approximation than no
system at all.

The political processes associated with changing the database might need
a bit of thought too.