Language Registrations needed for i-unknown and i-mixed

Addison Phillips [wM] aphillips at webmethods.com
Wed Jan 21 06:39:07 CET 2004


You should note that RFC3066 actually says the following about the lovely
'mul' and 'und' codes:

   5. You SHOULD NOT use the UND (Undetermined) code unless the protocol
      in use forces you to give a value for the language tag, even if
      the language is unknown.  Omitting the tag is preferred.

   6. You SHOULD NOT use the MUL (Multiple) tag if the protocol allows
      you to use multiple languages, as is the case for the Content-
      Language:  header.

In other words, the UND tag really should not be used (since it adds no real
information anyway), as François suggests.

The MUL tag is probably better avoided too if possible, although I'm not
sure exactly how in your specific situation. The xml:lang attribute can be
applied to many elements, and multilingual texts are generally better served
when the tagging conveys the language at the closest level of
relationship. This doesn't help RSS feed management, though, which must be
handled at a level above that of individual elements.

Providing a <language> tag of 'mul' will solve your "broken feed" problem,
but you may experience other problems instead. Some systems may still
handle an unrecognized tag (like 'mul' or 'und') as if it were 'en'
(English). Or worse, a receiving system may effectively filter out your
content, because it treats the tag as a "language I don't recognize and
don't know what to do with".
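
The filtering risk can be illustrated with a small Python sketch. Both
functions and the matching policy are hypothetical: naive_match stands in for
a receiver that only accepts tags on its list, and tolerant_match shows one
way a receiver could choose to let 'mul', 'und', and untagged items through.
Nothing in RFC 3066 requires either behavior.

```python
def naive_match(item_tag, wanted):
    """Exact-match filter: anything not in the subscriber's list is dropped."""
    return item_tag is not None and item_tag.lower() in wanted


def tolerant_match(item_tag, wanted):
    """Treat a missing tag, 'und', and 'mul' as 'cannot rule this item out'."""
    if item_tag is None or item_tag.lower() in ("und", "mul"):
        return True
    # Compare on the primary subtag so 'en-US' satisfies a request for 'en'.
    primary = item_tag.lower().split("-")[0]
    return primary in wanted


wanted = {"en", "fr"}  # languages a hypothetical subscriber asked for
for tag in ("en-US", "de", "mul", "und", None):
    print(tag, naive_match(tag, wanted), tolerant_match(tag, wanted))
```

With the naive filter, items tagged 'mul' or 'und' never reach the
subscriber, and neither does 'en-US', because only exact matches pass. The
tolerant version delivers all of those and still drops 'de'.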

I hope the above references help, even though I'm pretty sure they muddle
what was a clear response :-).

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Francois
> Yergeau
> Sent: mardi 20 janvier 2004 20:11
> To: 'Bob Wyman '; 'ietf-languages at alvestrand.no '
> Subject: RE: Language Registrations needed for i-unknown and i-mixed
>
>
> Bob Wyman wrote:
> >Our dilemma is that RSS appears to have been defined with the
> >assumption that all items in a feed would share a common language.
>
> The spec even says so. Bad.
>
> >This is a usually good assumption when RSS is being used
> >to syndicate the content of a blog being maintained by a
> >single person,
>
> Not even.  I know a blog in 4 languages.
>
> >Unfortunately, RSS V2.0 -- like many other protocols --
> >doesn't define item-level <language> tags...
>
> The XML spec does.  Put an xml:lang attribute on the <item> element.
> xml:lang is defined by the XML spec itself, pretty standard, no?
>
> >Now, clearly, we could define some new namespace
> >and create an item-level <language> tag of our own like
> >"<ps:language>". The difficulty with doing so is that
> >this private tag wouldn't achieve much more than wasting
> >bandwidth since no known news aggregator knows what to do
> >with it.
>
> Your problem will not really be solved without tagging the stuff, so you'd
> better get started and get aggregators to pick it up.  Use the standard
> xml:lang, though.
>
> >Our interface allows people to create subscriptions that
> >restrict the content that is scanned for them to only those
> >that are marked as being in some specific language.
>
> So you need tagging.  Go ahead.
>
> >In order to address the issue of "any language" subscriptions,
> >etc., I'm requesting that we be able to use "i-unknown" and/or
> >"i-mixed" when appropriate.
>
> "i-mixed" already exists, under another name: "mul".  That's specified as
> multiple languages by ISO 639-2, and therefore allowed by RFC 3066, and
> therefore usable in xml:lang.
>
> For "i-unknown" you have two choices:
>
> - remain silent (don't emit <language>).  There doesn't seem to be much
> practical difference between not saying anything and saying you
> don't know.
>
> - use "und", which means undetermined.
>
> >Alternative solutions would be welcomed.
>
> There you are :-)
>
> --
> François Yergeau
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages


