lang ID for "*" (any language)

Mark Davis ☕ mark at macchiato.com
Wed Jun 13 19:47:58 CEST 2012


* doesn't work, because it isn't a valid language tag.
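
To make that concrete, here is a rough sketch of a syntax check. The pattern is a deliberately simplified approximation of language-tag well-formedness (hyphen-separated subtags of 1-8 characters, the first one alphabetic), not the full BCP 47 grammar, and the name is_well_formed is made up for illustration. Note that "*" is allowed in RFC 4647 language ranges, which are a separate thing from language tags.

```python
import re

# Simplified approximation of tag syntax: hyphen-separated subtags,
# 1-8 characters each, the first purely alphabetic. This is an
# assumption for illustration, not the complete BCP 47 ABNF.
_TAG_PATTERN = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` passes the simplified syntax check above."""
    return _TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("*", "und", "und-Cyrl", "zzz"):
        print(candidate, is_well_formed(candidate))
```

Passing this check says nothing about whether a subtag is registered; it only shows that "*" fails on syntax alone, while 'und' and the proposed 'zzz' do not.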

> The proposal here is for a subtag to encode the "elsewhere" condition:
> "if there is no more specific language code, use 'zzz'".

We use 'und' in a query to signal that the language is unspecified and
should be filled in. In a query, 'und' is not needed for any other
purpose. Adding 'zzz' would just mean that we'd map it to 'und' in all
processing, so it would not be a useful addition.
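
A minimal sketch of what that mapping would amount to, under stated assumptions: the names ANY_SUBTAG, LIKELY_SUBTAGS, normalize_query and fill_in are hypothetical, and the single table entry is just the "und-Cyrl" example from the CLDR discussion quoted below, not real CLDR data loading.

```python
# Hypothetical "any language" subtag from the proposal (not a registered subtag).
ANY_SUBTAG = "zzz"

# Toy stand-in for a likely-subtags table; the one entry is the
# example given in this thread, not an extract of actual CLDR data.
LIKELY_SUBTAGS = {"und-Cyrl": "ru-Cyrl-RU"}


def normalize_query(tag: str) -> str:
    """Map the hypothetical 'zzz' primary subtag to 'und'; leave other tags alone."""
    subtags = tag.split("-")
    if subtags[0].lower() == ANY_SUBTAG:
        subtags[0] = "und"
    return "-".join(subtags)


def fill_in(tag: str) -> str:
    """Fill in an unspecified language from the toy table, if there is an entry."""
    normalized = normalize_query(tag)
    return LIKELY_SUBTAGS.get(normalized, normalized)


if __name__ == "__main__":
    print(fill_in("und-Cyrl"))
    print(fill_in("zzz-Cyrl"))
```

Both queries go through the same lookup once 'zzz' is normalized, which is why the extra subtag would not buy anything in this kind of processing.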


------------------------------
Mark <https://plus.google.com/114199149796022210033>
*— Il meglio è l’inimico del bene —*



On Wed, Jun 13, 2012 at 8:03 AM, Gordon P. Hemsley <gphemsley at gmail.com> wrote:

> FWIW, I think it is important to be able to make the distinction
> between "we don't *know* what the language is" ('und' for
> "Undetermined") and "we don't *care* what the language is" (the
> proposed 'zzz' for "Any").
>
> In my understanding of the matter, both CLDR and Java (as you've
> described them) are using 'und' for the appropriate purpose (though
> I'm not sure I necessarily agree with the fallback choice—an
> orthogonal issue, in any case).
>
> As for Google, it seems to me like *querying* for "any language" would
> be better off using the asterisk rather than a particular subtag. The
> proposal here is for a subtag to encode the "elsewhere" condition: "if
> there is no more specific language code, use 'zzz'".
>
> So I support Peter's proposal. I think the use case he mentions could
> actually be a common one in localized software development.
>
> Gordon
>
> On Wed, Jun 13, 2012 at 10:52 AM, Mark Davis ☕ <mark at macchiato.com> wrote:
> > We use 'und' in CLDR when doing lookups, for example. The best match for
> > "und-Cyrl" in the absence of other information is "ru-Cyrl-RU".
> >
> > Java also uses 'und' in the BCP47 way, but also as a "replace bad input"
> > (like FFFD for Unicode).
> >
> >
> > http://download.java.net/jdk7/archive/b123/docs/api/java/util/Locale.html
> >
> > At Google, we used to try to distinguish between these different senses
> > of "unknown" vs "any", but found that people too often just mixed them
> > up, so we ended up just settling on a single subtag. It just has slightly
> > different nuances when used as a query vs used as a result (or content
> > tag). But that's the case anyway for locale/language matching.
> >
> > ________________________________
> > Mark
> >
> > — Il meglio è l’inimico del bene —
> >
> >
> >
> > On Wed, Jun 13, 2012 at 7:27 AM, Peter Constable <petercon at microsoft.com>
> > wrote:
> >>
> >> Thanks, Doug, for the reminder of that text, which is interesting.
> >>
> >> Root, which is totally unqualified (i.e., 'neutral'), is different. In a
> >> matching mechanism that seeks the best match against a preference list,
> >> a neutral resource might be chosen in the absence of any other matching
> >> resource. This could be used to qualify a resource as a positive match
> >> for any entry in the preference list if there isn't a stronger match for
> >> that entry.
> >>
> >> Mark, you mentioned using 'und' for some time. Has that been in private
> >> or public contexts? (We're looking at something that would be part of
> >> the Windows SDK.) And would you say the use was comparable to "root"
> >> (which I think is different)?
> >>
> >> Peter
> >>
> >> Sent from my Windows Phone
> >> ________________________________
> >> From: Doug Ewell
> >> Sent: 6/12/2012 5:15 PM
> >> To: ietf-languages at iana.org
> >> Subject: Re: lang ID for "*" (any language)
> >>
> >> I tend to agree with Mark that 'und' is the best choice for this.
> >>
> >> The passage in Section 4.1 seems to start off otherwise:
> >>
> >> "The 'und' (Undetermined) primary language subtag identifies linguistic
> >> content whose language is not determined.  This subtag SHOULD NOT be
> >> used unless a language tag is required and language information is not
> >> available or cannot be determined.  Omitting the language tag (where
> >> permitted) is preferred."
> >>
> >> but then goes on to give reasonable use cases:
> >>
> >> "The 'und' subtag might be useful for protocols that require a language
> >> tag to be provided or where a primary language subtag is required (such
> >> as in "und-Latn").  The 'und' subtag MAY also be useful when matching
> >> language tags in certain situations."
> >>
> >> On the list we've often talked about, for example, "und-Cyrl" to
> >> indicate text in the Cyrillic script. In a case like this, it might not
> >> be that the language cannot be determined, but that it doesn't matter.
> >>
> >> I think CLDR uses 'root' for a purpose similar to this.
> >>
> >> --
> >> Doug Ewell | Thornton, Colorado, USA
> >> http://www.ewellic.org | @DougEwell
> >>
> >> _______________________________________________
> >> Ietf-languages mailing list
> >> Ietf-languages at alvestrand.no
> >> http://www.alvestrand.no/mailman/listinfo/ietf-languages
> >>
> >
>
>
>
> --
> Gordon P. Hemsley
> me at gphemsley.org
> http://gphemsley.org/ | http://gphemsley.org/blog/
>