ID for language-invariant strings

Peter Constable petercon at microsoft.com
Tue Mar 18 17:03:28 CET 2008


> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no]
> On Behalf Of Doug Ewell


> > "und" means "undetermined". Not "cannot figure out what language this
> > stuff is in", not "cannot be determined", just "undetermined". That
> > is about as neutral as you can be.

...

> Here is the description of "special" code elements from the
> official ISO 639-3 site (http://www.sil.org/iso639-3/scope.asp#S):
>
> <quote>
> Special situations
>
> ... The identifier [und] (undetermined) is
> provided for those situations in which a language or languages must be
> indicated but the language cannot be identified...
> </quote>
>
> The relevant passage for our purposes is "the language cannot be
> identified."

The intent of the description on the ISO 639-3 site could be interpreted in three ways:

(i) the author cannot identify the language
(ii) the language cannot be identified by any body of experts
(iii) the reader is not able to assume any language identity

Now, the third reading appears to me to be the one Mark is assuming, and the one you seem to consider outside the intent described on the ISO 639-3 site. Even so, that still leaves the other two readings.

Just because an author could not identify the language of the content doesn't imply that it is unidentifiable in principle. And who knows what was in the author's mind when they selected the tag: maybe they really considered the question and couldn't decide (which covers a range of scenarios, from not having any clue to having a strong hunch but not being certain); maybe they quickly decided the language of the content wasn't of particular interest for their purpose (e.g. "it's not Latin, Cyrillic or Greek and so obviously not in an EU language") and didn't consider the question of language identity any further. From there, it's only a small step to "for this purpose, no language identity is of particular interest, and so all these records will get tagged und", which leads to the third reading.

In other words, I don't think Mark's interpretation of "und" can be ruled out.


Peter

