Language tags in IPP (was: Re: [Suppress-Script] Initial list
of 300 languages)
Mark Davis
mark.davis at icu-project.org
Mon Mar 13 01:55:13 CET 2006
I'll go a bit further: the notion that a specification should
communicate charset mapping by using language tags instead of charset
tags is bizarre and doomed to fail. Encountering a script tag is the
least of the problems to expect.
Let me be clear: I think making the Suppress-Script data as accurate as
possible is all to the good. But the sky won't fall if Suppress-Script
is not set; we already clearly state that script and country information
shouldn't be included unless it is necessary. I frankly don't see that
people will add scripts to tags unless they are really necessary, as in
the small handful of cases like zh-Hant or az-Latn.
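
To make the Suppress-Script idea concrete, here is a minimal sketch in
Python of how a tag producer might drop a script subtag that the
registry marks as redundant. The function name and the three-entry
table are illustrative assumptions, not part of any specification; the
authoritative values live in the IANA Language Subtag Registry. The
sketch also assumes the script subtag, when present, directly follows
the language subtag.

```python
# Illustrative subset only (assumption): the real Suppress-Script values
# come from the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn", "fr": "Latn", "ru": "Cyrl"}


def drop_redundant_script(tag: str) -> str:
    """Remove the script subtag if it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # Assumption: a script subtag is a four-letter subtag in second position.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(language, "")
        if suppressed.lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)
```

With this table, "en-Latn-US" would be shortened to "en-US", while
"zh-Hant" and "az-Latn" are left alone because the table has no entry
for those languages.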
Doug Ewell wrote:
> Ira McDonald <imcdonald at sharplabs dot com> wrote:
>
>> For six years, almost all network printers have supported
>> IETF IPP/1.1 (RFC 2911), which DOES externally tag the
>> language of print data streams - the CUPS spooler that is
>> now ubiquitous in Linux distributions, standard in MacOS,
>> and common in commercial UNIX distributions uses IPP for
>> the print protocol.
>
> I took a look at RFC 2911. Wow, 224 pages. And to think people
> criticized the initial-registry draft because the list was 106 pages
> long and we wanted it to be an RFC.
>
> From a cursory reading (I didn't have time to slurp in the whole thing),
> it looks like IPP uses the language tag to determine the language in
> which internal attributes are expressed and status messages are to be
> issued.
>
> It also seems to have some interrelationship with the character set of
> the print job, which seems wrong to me; figuring out which character
> repertoires are necessary for which natural languages is a decidedly
> non-trivial effort (ask Michael, who has done this work for the
> European languages).
>
>> If you send your print job in Unicode (UTF-8 or UTF-16) to
>> your laser printer _and_ the printer has sufficient fonts
>> installed (for the necessary scripts), bad things won't
>> happen. But if your print data is in a legacy charset
>> (like almost all existing documents in the world), then
>> bad things will begin to happen when unsupported 'script'
>> subtags are infixed in language tags.
>
> Again, I'm not a fan of the idea of determining character repertoires
> on the basis of natural language. And I'm disappointed if, today in
> 2006, it is still safe to assume that "almost all existing documents
> in the world" are not in UTF-8 or another Unicode character encoding.
>
> Section 4.1.2.3, item 2.b of RFC 2911 says:
>
> "the Associated Natural-Language parts match if the shorter of the two
> meets the syntactic requirements of RFC 1766 [RFC1766] and matches
> byte for byte with the longer. For example, 'en' matches 'en',
> 'en-us' and 'en-gb', but matches neither 'fr' nor 'e'."
>
> In other words, "en" will match "en-Latn-US", but "en-US" will not.
> So the script subtag will not break matching for all language tags
> after all, only in cases where both tags contain a region.
>
> This strongly suggests to me that when we are considering adding
> Suppress-Script values for up to 300 languages, we should focus
> primarily on those languages that are most likely to be used with a
> region subtag, and spend much less time worrying about the rest.
>
> For example, it seems improbable to me that Hawaiian ("haw") exhibits
> substantially different usage in different regions, such that tags of
> the form "haw-US" or "haw-UM" would be likely to occur. That means
> -- as easy and uncontroversial as it would be -- we need not spend
> time worrying about adding a Suppress-Script of "Latn" for Hawaiian.
> It would be more productive to focus our attention on a language like
> Santali, which is spoken in multiple regions, and for which a
> "default" script assignment is not obvious.
>
> --
> Doug Ewell
> Fullerton, California, USA
> http://users.adelphia.net/~dewell/
>
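
For readers who want to see the matching rule Doug quotes from RFC 2911
(section 4.1.2.3, item 2.b) in action, the following Python sketch
implements his reading of it. Several things here are assumptions
rather than text from the RFC: the function name, the simplified
RFC 1766 syntax check (letters-only subtags of 1 to 8 characters), the
case-insensitive comparison, and the requirement that the shorter tag
end on a subtag boundary, which is one way to get the RFC's own result
that 'en' does not match 'e'.

```python
import re

# Simplified RFC 1766 shape (assumption): subtags of 1-8 letters joined by "-".
_RFC1766_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def natural_language_parts_match(a: str, b: str) -> bool:
    """Return True if the shorter tag is a subtag-aligned prefix of the longer."""
    # Assumption: compare case-insensitively by lowercasing both tags first.
    shorter, longer = sorted((a.lower(), b.lower()), key=len)
    if _RFC1766_SHAPE.fullmatch(shorter) is None:
        return False
    # Assumption: the prefix must end at a "-" boundary (or the tags are equal).
    return longer == shorter or longer.startswith(shorter + "-")
```

Under these assumptions, "en" matches "en-Latn-US" but "en-US" does not,
because "en-us" is not a prefix of "en-latn-us". That is the case the
thread is concerned about: a script subtag inserted between language and
region defeats a prefix match against a language-region tag.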