ID for language-invariant strings
randy_presuhn at mindspring.com
Mon Mar 17 20:33:21 CET 2008
> From: "Peter Constable" <petercon at microsoft.com>
> To: "Randy Presuhn" <randy_presuhn at mindspring.com>; <ietf-languages at iana.org>
> Sent: Sunday, March 16, 2008 1:53 AM
> Subject: RE: ID for language-invariant strings
> Randy, you haven't said anything since this initial comment.
> He's heard opinions for or against "zxx", "und" or a new tag.
> I'm curious to know what your thoughts are at this point.
Based on my understanding of the discussion so far, here's what I
think at the moment. (But remember, I'm not a developer in this
field - I'm an old linguist who does network management.)
"und" seems wrong to me - it's not that we aren't able to figure out
what language this stuff is "in".
Private-use doesn't seem appropriate, given the ease with which folks
on this list have been able to produce use cases.
If we restrict ourselves to the current tags, "zxx" seems the best fit.
However, for most of the examples it seems disingenuous to claim the
data is not linguistic in nature. These are cases where we have stuff
that clearly *is* language in that it conveys meaning, but it doesn't entirely
"play by the rules" that apply to material that is *in* a particular
language. I think the examples of what does (or doesn't) happen to these
things when they are used in the context of highly inflected languages
are telling. They're clearly communicating *something*, and are thus
linguistic. However, the extent to which they are truly invariant
depends on the language context in which they appear. The Chinese
examples presented earlier point in one direction - the strings don't
even use the same script. However, according to a native-speaker
Russian friend, things like font names *would* get appropriate
endings in that language, so "zxx" would really not be right.
But we probably shouldn't let that example be the tail that wags the dog.
So, we've got this stuff. We know that it behaves like linguistic content.
We know it requires special treatment in translation processes, and that
that treatment often consists of it not really being "translated". In
some important set of cases it's truly invariant. In other cases it
is subject to some of the rules of its linguistic environment, but even
there it is handled more like names are handled. In short, this stuff
exists and is clearly different from the other sorts of things we tag.
Consequently, I think there is some merit to the idea of a new tag.