ID for language-invariant strings

Fri Mar 14 19:41:35 CET 2008

> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of John Cowan

> > I don't know why 'und' wouldn't work for you. It signifies an
> > undetermined / indeterminate language,
>
> I don't see what's undefined or indeterminate about the language of
> reference names.  The reference name "English" for the language tagged
> 'en' or 'eng' is English by origin and non-linguistic by use.

I'm going to go a little wild for a moment with an example to illustrate what I see as a problem with "zxx".

Consider for a moment the large number of Unicode character names. Unicode treats these as English-language character names, but just suppose they were considered language-neutral reference names. There are scenarios in which these get presented to users, and as a result various kinds of linguistic processing may be applicable (stemming, hyphenation...). If these were tagged with an ID such as "zxn" or "und", then there's no particular obstacle to developing an application that throws them at linguistic processes, perhaps primed with some language-detection processing. But if they are tagged as "zxx", then you have to go out of your way to make sure that "zxx" gets ignored when these are thrown at those processes -- which may be in a linked library or in a service off in the cloud -- lest they simply return N/A.
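
To make that concrete, here is a rough Python sketch of the scenario. Everything in it is hypothetical: process() stands in for some linguistic library or service that declines to handle "zxx", detect_language() and hyphenate() are trivial placeholders rather than real APIs, and "zxn" is just the made-up ID from the paragraph above.

```python
from typing import Optional

# Assumption: the library treats these tags as "no linguistic content".
NON_LINGUISTIC_TAGS = {"zxx"}
# Assumption: these tags carry no language information, so detection runs.
# "zxn" is the hypothetical ID from the message, not a registered code.
UNKNOWN_TAGS = {"und", "zxn"}


def detect_language(text: str) -> str:
    """Placeholder for a real language detector; always guesses English."""
    return "en"


def hyphenate(text: str, lang: str) -> str:
    """Placeholder for real hyphenation; only marks that processing ran."""
    return f"[{lang}] {text.lower()}"


def process(text: str, tag: str) -> Optional[str]:
    """Hypothetical library entry point (could equally be a remote service)."""
    if tag in NON_LINGUISTIC_TAGS:
        return None  # the "N/A" case: the library refuses to process
    lang = detect_language(text) if tag in UNKNOWN_TAGS else tag
    return hyphenate(text, lang)


def process_reference_name(text: str, tag: str) -> Optional[str]:
    """Caller-side workaround: hide "zxx" from the library by remapping it."""
    return process(text, "und" if tag == "zxx" else tag)
```

With "und" or "zxn" the name goes straight through process(); with "zxx" the caller needs something like process_reference_name() in front of every such library or service, which is the extra work I'm describing.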

Peter