Language tags in IPP (was: Re: [Suppress-Script] Initial list of 300 languages)

Mark Davis mark.davis at icu-project.org
Mon Mar 13 01:55:13 CET 2006


I'll go a bit further: the notion that a specification should 
communicate charset mapping by using language tags instead of charset 
tags is bizarre and doomed to fail. Encountering a script tag is the 
least of the problems to expect.

Let me be clear: I think making the Suppress-Script data as accurate as 
possible is all to the good. But the sky won't fall if Suppress-Script 
is not set; we already clearly state that script and country information 
shouldn't be included unless it is necessary. I frankly don't see that 
people will add scripts to tags unless they are really necessary, as in 
the small handful of cases like zh-Hant or az-Latn.

Doug Ewell wrote:
> Ira McDonald <imcdonald at sharplabs dot com> wrote:
>
>> For six years, almost all network printers have supported
>> IETF IPP/1.1 (RFC 2911), which DOES externally tag the
>> language of print data streams - the CUPS spooler that is
>> now ubiquitous in Linux distributions, standard in MacOS,
>> and common in commercial UNIX distributions uses IPP for
>> the print protocol.
>
> I took a look at RFC 2911.  Wow, 224 pages.  And to think people 
> criticized the initial-registry draft because the list was 106 pages 
> long and we wanted it to be an RFC.
>
> From a cursory reading (I didn't have time to slurp in the whole thing), 
> it looks like IPP uses the language tag to determine the language in 
> which internal attributes are expressed and status messages are to be 
> issued.
>
> It also seems to have some interrelationship with the character set of 
> the print job, which seems wrong to me; figuring out which character 
> repertoires are necessary for which natural languages is a decidedly 
> non-trivial effort (ask Michael, who has done this work for the 
> European languages).
>
>> If you send your print job in Unicode (UTF-8 or UTF-16) to
>> your laser printer _and_ the printer has sufficient fonts
>> installed (for the necessary scripts), bad things won't
>> happen.  But if your print data is in a legacy charset
>> (like almost all existing documents in the world), then
>> bad things will begin to happen when unsupported 'script'
>> subtags are infixed in language tags.
>
> Again, I'm not a fan of the idea of determining character repertoires 
> on the basis of natural language.  And I'm disappointed if, today in 
> 2006, it is still safe to assume that "almost all existing documents 
> in the world" are not in UTF-8 or another Unicode character encoding.
>
> Section 4.1.2.3, item 2.b of RFC 2911 says:
>
> "the Associated Natural-Language parts match if the shorter of the two 
> meets the syntactic requirements of RFC 1766 [RFC1766] and matches 
> byte for byte with the longer.  For example, 'en' matches 'en', 
> 'en-us' and 'en-gb', but matches neither 'fr' nor 'e'."
>
> In other words, "en" will match "en-Latn-US", but "en-US" will not.  
> So the script subtag will not cause all language tags to break after 
> all, only in cases where both contain a region.
>
> This strongly suggests to me that when we are considering adding 
> Suppress-Script values for up to 300 languages, we should focus 
> primarily on those languages that are most likely to be used with a 
> region subtag, and spend much less time worrying about the rest.
>
> For example, it seems improbable to me that Hawaiian ("haw") exhibits 
> substantially different usage in different regions, such that tags of 
> the form "haw-US" or "haw-UM" would be likely to occur.  That means 
> --  as easy and uncontroversial as it would be -- we need not spend 
> time worrying about adding a Suppress-Script of "Latn" for Hawaiian.  
> It would be more productive to focus our attention on a language like 
> Santali, which is spoken in multiple regions, and for which a 
> "default" script assignment is not obvious.
>
> -- 
> Doug Ewell
> Fullerton, California, USA
> http://users.adelphia.net/~dewell/
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>
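
To make the matching rule quoted above from RFC 2911, section 4.1.2.3, 
item 2.b concrete, here is a minimal Python sketch of one reading of it. 
The function name, the simplified RFC 1766 pattern, the lowercasing, and 
the requirement that the match end on a subtag boundary are all 
assumptions made for illustration (the last one is inferred from the 
RFC's example that 'e' does not match 'en'); this is not taken from any 
IPP implementation.

```python
import re

# Assumption: a simplified RFC 1766 shape, i.e. subtags of 1 to 8 ASCII
# letters joined by hyphens. The real grammar has further conventions.
_RFC1766_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def natural_language_matches(a: str, b: str) -> bool:
    """Hypothetical helper: prefix-style match of two language tags."""
    # Assumption: compare case-insensitively by lowercasing both tags.
    a, b = a.lower(), b.lower()
    shorter, longer = sorted((a, b), key=len)
    if not _RFC1766_SHAPE.fullmatch(shorter):
        return False
    # Assumption: the shorter tag must end on a subtag boundary of the
    # longer one, so 'e' does not match 'en'.
    return longer == shorter or longer.startswith(shorter + "-")


print(natural_language_matches("en", "en-gb"))          # True
print(natural_language_matches("en", "en-Latn-US"))     # True
print(natural_language_matches("en-US", "en-Latn-US"))  # False
print(natural_language_matches("en", "e"))              # False
```

Under this reading, an infixed script subtag only defeats the match when 
the other tag also carries a region, which is the case Doug describes.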

