ID for language-invariant strings

Fri Mar 14 17:06:25 CET 2008

Peter Constable scripsit:

> For instance, suppose I need to apply language tags to each of the data
> elements in the main ISO 639-3 code table. For data in columns like
> the 639-3 ID, clearly "zxx" applies: the alpha-3 identifiers have no
> linguistic content.

Except when they do:  the tag "yue" is simply the Mandarin name for
Cantonese.  Nevertheless, you are right; it is used not *because* it is
Mandarin, but simply by fiat, the fiat of the RA.

> But what about the reference names? "zxx" would be a decidedly bad
> choice for that column, IMO, since every single data element is
> definitely linguistic in nature.

Linguistic in origin, but not in purpose: the names mostly look like
English, and many of them are in fact English in origin; but they are
not there because they are English, but (once again) by fiat of the RA.
So these names too are, according to my argument, non-linguistic: they are
in essence arbitrary tokens that happen to be mnemonic for anglophones.

To re-use an example I have given elsewhere:  "if time-of-day is equal to
1200, then move money to account" is English, if somewhat stilted English.
In its context of use, though, it is Cobol, and should be tagged "zxx",
not "en".
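
As a rough illustration of tagging by purpose rather than by appearance,
here is a minimal Python sketch.  The column names, the sample row, and
the helper tag_for_column are all hypothetical, invented for this
example; they are not taken from the actual 639-3 table layout.

```python
# Hypothetical column roles; these names are illustrative assumptions only.
# Columns whose values are assigned by fiat are treated as tokens and get
# "zxx", whatever language the values happen to resemble.
TOKEN_COLUMNS = {"id", "ref_name"}


def tag_for_column(column: str, prose_tag: str = "en") -> str:
    """Pick a language tag from the column's purpose, not its contents."""
    return "zxx" if column in TOKEN_COLUMNS else prose_tag


# A made-up sample row; the comment text is invented for the example.
row = {
    "id": "yue",
    "ref_name": "Yue Chinese",
    "comment": "Also known as Cantonese.",
}

# Pair every value with the tag chosen for its column.
tagged = {col: (value, tag_for_column(col)) for col, value in row.items()}
```

Under this sketch the identifier and the reference name both come out as
"zxx", and only the free-text comment is tagged "en".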

> I don't know why people are so averse to new special-purpose code
> elements when there is a reasonable need. It's not like there are a lot
> of different special-case semantics that are needed in language-tagging
> application scenarios; I think the set is very small, perhaps even
> that this is the only important gap. I am *far* more concerned about
> overloading tags with distinct, orthogonal semantics for particular
> application scenarios ("und" means X in this application but Y in that
> application): *that* can lead to serious trouble.

The answer, in a word, is creeping featurism.  Saying "the set is very
small" is sheer unfounded speculation: every time someone tries to do
something new, another wannabe tag appears.

>          Note: for application scenarios in which an identifier
>          string is unambiguously non-linguistic in nature, "zxx"
>          should be used rather than "zxn".

I think the examples above would tend to shake the notion that
"unambiguously non-linguistic" is a meaningful expression.

-- 
That you can cover for the plentiful            John Cowan
and often gaping errors, misconstruals,         http://www.ccil.org/~cowan
and disinformation in your posts                cowan at ccil.org
through sheer volume -- that is another
misconception.  --Mike to Peter