ID for language-invariant strings
cowan at ccil.org
Fri Mar 14 17:06:25 CET 2008
Peter Constable scripsit:
> For instance, suppose I need to apply language tags to each of the data
> elements in the main ISO 639-3 code table. For data in columns like
> the 639-3 ID, clearly "zxx" applies: the alpha-3 identifiers have no
> linguistic content.
Except when they do: the tag "yue" is simply the Mandarin name for
Cantonese. Nevertheless, you are right; it is used not *because* it is
Mandarin, but simply by fiat, the fiat of the RA.
> But what about the reference names? "zxx" would be a decidedly bad
> choice for that column, IMO, since every single data element is
> definitely linguistic in nature.
Linguistic in origin, but not in purpose: the names mostly look like
English, and many of them are in fact English in origin; but they are
not there because they are English, but (once again) by fiat of the RA.
So these names too are, according to my argument, non-linguistic: they are
in essence arbitrary tokens that happen to be mnemonic for anglophones.
To re-use an example I have given elsewhere: "if time-of-day is equal to
1200, then move money to account" is English, if somewhat stilted English.
In its context of use, though, it is Cobol, and should be tagged "zxx".
> I don't know why people are so averse to new special-purpose code
> elements when there is a reasonable need. It's not like there are a lot
> of different special-case semantics that are needed in language-tagging
> application scenarios; I think the set is very small, perhaps even
> that this is the only important gap. I am *far* more concerned about
> overloading tags with distinct, orthogonal semantics for particular
> application scenarios ("und" means X in this application but Y in that
> application): *that* can lead to serious trouble.
The answer, in a word, is creeping featurism. Saying "the set is very
small" is sheer unfounded speculation: every time someone tries to do
something new, another wannabe tag appears.
> Note: for application scenarios in which an identifier
> string is unambiguously non-linguistic in nature, "zxx"
> should be used rather than "zxn".
I think the examples above would tend to shake the notion that
"unambiguously non-linguistic" is a meaningful expression.
John Cowan    http://www.ccil.org/~cowan    cowan at ccil.org
That you can cover for the plentiful and often gaping errors, misconstruals,
and disinformation in your posts through sheer volume -- that is another
misconception. --Mike to Peter