Question on ISO-639:1988

Jon Hanna jon at hackcraft.net
Fri May 21 16:22:05 CEST 2004


> ISO 639 (both parts) specify lower case language identifiers. This choice
> was made to avoid confusion with ISO 3166 (country code). From the point of
> view of ISO 639 an implementation of lower case only would be preferred.

Specify? I thought it specified case-insensitive language identifiers and
*recommended* that they be lower-case, hence when parsing you would have to be
prepared to receive them in upper case.

> I'm working with EAN.UCC, a standards community for the global Retail &
> Consumer Manufacturing industry.  In our XML messaging standards, we have
> specified ISO-639:1988 as the official code list for specifying language.  I
> see that your publication of the language codes specify 2 lowercase alpha
> characters.

The XML world almost exclusively uses RFC 3066 (XML itself, and just about every
XML application that needs to contain language information). In particular if
you have human-readable sections in the XML it can be confusing to have the
language of that stated using RFC 3066 through the xml:lang attribute, but
language information elsewhere transmitted through ISO 639. It would be
understandable for someone to see xml:lang="en-US" and to assume that they
could use that same value where only "en" belongs, in an ISO 639-specific
element or attribute.

As such I'd recommend you go with RFC 3066 unless you've a good reason not to.
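To illustrate how the two relate: the ISO 639 code is the primary (first) subtag of an RFC 3066 tag, so "en" can be recovered from "en-US". Below is a minimal Python sketch, assuming only the RFC 3066 tag grammar. The function name primary_iso639_code is hypothetical, and the sketch does not check codes against the ISO 639 lists.

```python
import re
from typing import Optional

# RFC 3066 tag grammar (US-ASCII only):
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def primary_iso639_code(tag: str) -> Optional[str]:
    """Return the ISO 639 code carried by an RFC 3066 tag, or None.

    A two-letter primary subtag is an ISO 639 code and a three-letter one
    is an ISO 639-2 code. Other primary subtags, such as "i" (IANA
    registrations) and "x" (private use), carry no ISO 639 code.
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a well-formed RFC 3066 tag: {tag!r}")
    # The regex guarantees ASCII letters here, so lower() is safe; tags are
    # case-insensitive and lower case is the preferred form.
    primary = tag.split("-", 1)[0].lower()
    return primary if len(primary) in (2, 3) else None
```

For example, primary_iso639_code("en-US") returns "en", while primary_iso639_code("i-klingon") returns None because the tag carries no ISO 639 code.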

> As a B2B trading community, should we be enforcing that only lower case
> characters be used?  Have you faced issues where a community implements both
> lowercase and uppercase characters?  In the case that both lowercase and
> uppercase are used, is there a preferred best practice between the two?

As Håvard says, transmitting in lower case is preferred, but unless I'm mistaken
you should be prepared to receive upper case as well. Since the only characters
used are from the US-ASCII range, case-folding for comparisons is trivial.
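Here is a minimal Python sketch of such a comparison. It assumes the codes being compared are plain ISO 639 codes, and the helper names ascii_lower and same_language_code are hypothetical.

```python
import string

# Translation table that maps only A-Z to a-z. ISO 639 codes use US-ASCII
# letters only, so this is all the case folding they need.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    """Lower-case the US-ASCII letters A-Z; leave every other character alone."""
    return s.translate(_ASCII_LOWER)


def same_language_code(a: str, b: str) -> bool:
    """Compare two ISO 639 codes, ignoring US-ASCII letter case."""
    return ascii_lower(a) == ascii_lower(b)
```

With this, same_language_code("EN", "en") is True. Folding only A-Z is a deliberate choice. Python's str.lower() applies full Unicode case mapping, under which the Kelvin sign (U+212A) becomes an ASCII "k", so a non-ASCII string could otherwise compare equal to a genuine code.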

Even if 639 didn't allow upper-case letters, I'd recommend accepting upper-case
letters as an application of the robustness principle.
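Applied to two-letter codes, the robustness principle could look like the Python sketch below. Input in any case is accepted and normalised, and output is always the preferred lower case. The name normalize_iso639_alpha2 is hypothetical. The sketch only checks the shape of the code (two US-ASCII letters), not whether the code is actually assigned in ISO 639.

```python
import string

# Map only A-Z to a-z; ISO 639 codes are US-ASCII letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_LETTERS = frozenset(string.ascii_letters)


def normalize_iso639_alpha2(code: str) -> str:
    """Accept a two-letter code in any case; return it in lower case.

    Liberal on input (upper, lower or mixed case are all accepted),
    conservative on output (always the preferred lower-case form).
    Anything that is not exactly two US-ASCII letters is rejected.
    """
    if len(code) != 2 or not set(code) <= _LETTERS:
        raise ValueError(f"not a two-letter ISO 639 code: {code!r}")
    return code.translate(_ASCII_LOWER)
```

A receiver would call this on every incoming code and store or retransmit only the returned value, so upper-case input never propagates further.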

-- 
Jon Hanna
<http://www.hackcraft.net/>
"
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt

