[Suppress-Script] Initial list of 300 languages

Wed Mar 15 18:56:15 CET 2006

Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: imcdonald at sharplabs.com

Peter Constable wrote:
> Again, I don't know that nothing will be harmed, but I think we can
> distinguish between scenarios. If "en-US" was supported before, it
> should continue to be supported, and we don't want lots of content to
> be broken in printing because people changed tagging to "en-Latn-US".
> But if someone is tagging content as "ga-Latg" and "ga-Latn" rather
> than "ga", they're probably doing so for a particular reason, and if
> that fails to print on certain printers, then we live with that. Those
> same printers - or, at least, those printing scenarios - are probably
> going to trash any attempt to print Arabic, Hindi or Thai no matter
> how it's tagged, so those are printing scenarios that I would be
> avoiding if I needed to work with such text, whether that means
> changing apps, OS, service provider or changing printers; if I was
> having problems printing my Irish data, I'd probably do likewise.

OK - I want to reply in defense of printer manufacturers.

There have been network printers with bidirectional (Bidi) support
for Arabic and Hebrew plain text for years.

There have been Asian models of network printers from most of the
manufacturers with support for Thai and Hindi for years.

Tagging ANY datastream externally (in protocol) or internally
with an unnecessary script tag will always be a bad idea.

Despite all of our hot air on this mailing list, I confidently
expect that lots of fools will start generating 'en-Latn-US'
tags on web content by emulating some example they've seen
somewhere else on the web, because they won't know any better.

Printers tend to be developed by those computer programmers
who annoy Michael Everson. IPP as a protocol (and many others)
REQUIRES that all received protocol parameters be validated
FIRST for syntax and SECOND for content. When IPP/1.1 was
defined, a four-character script subtag in the second position
was a syntax error.
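The syntax-then-content validation order described above can be sketched roughly as follows. This is a minimal illustration, not code from any IPP implementation: the pattern `STRICT_TAG`, the `SUPPORTED` set and the function `validate_natural_language` are hypothetical. The pattern is deliberately narrower than the full RFC 1766 grammar, mirroring the strict "language plus optional country" checks described in this post, so a tag like 'en-latn-us' is rejected in the syntax phase before the supported-language list is ever consulted.

```python
import re

# Hypothetical strict syntax: a 2-letter lowercase language subtag plus an
# optional 2-letter lowercase country subtag. Narrower than RFC 1766 on
# purpose, to mirror the strict printer-side checks described in the post.
STRICT_TAG = re.compile(r"[a-z]{2}(-[a-z]{2})?")

# Hypothetical set of languages this printer can actually render.
SUPPORTED = {"en", "en-us", "fr", "de"}


def validate_natural_language(value: str) -> str:
    """Validate a tag FIRST for syntax, SECOND for content.

    Returns "syntax-error", "unsupported", or "ok".
    """
    # Phase 1: syntax. Any extra subtag (e.g. a script subtag) or
    # uppercase letter fails here.
    if STRICT_TAG.fullmatch(value) is None:
        return "syntax-error"
    # Phase 2: content. The tag is well-formed but may not be supported.
    if value not in SUPPORTED:
        return "unsupported"
    return "ok"


for tag in ["en-us", "en-latn-us", "en-Latn-US", "ga"]:
    print(tag, validate_natural_language(tag))
```

The distinction matters for error reporting: a client sending 'ga' gets an "unsupported" result it might reasonably handle, while 'en-latn-us' is rejected as malformed even though the printer supports English.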

Below is a verbatim quote from page 87 of IPP/1.1 (RFC 2911):

4.1.8 'naturalLanguage'

   The 'naturalLanguage' attribute syntax is a standard identifier for a
   natural language and optionally a country.  The values for this
   syntax type are defined by RFC 1766 [RFC1766].  Though RFC 1766
   requires that the values be case-insensitive US-ASCII [ASCII], IPP
   requires all lower case to simplify comparing by IPP clients and
   Printer objects.  Examples include:

      'en':  for English
      'en-us': for US English
      'fr': for French
      'de':  for German

   The maximum length of 'naturalLanguage' values used to represent IPP
   attribute values is 63 octets.
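The two concrete constraints in that quote (all lower case, at most 63 octets) could be applied on the client side roughly like this. It is a minimal sketch: the function name `to_ipp_natural_language` and its error handling are illustrative assumptions, not part of RFC 2911, and it deliberately does not check the tag's subtag structure.

```python
MAX_NATURAL_LANGUAGE_OCTETS = 63  # limit stated in RFC 2911, section 4.1.8


def to_ipp_natural_language(tag: str) -> str:
    """Prepare a language tag for use as an IPP 'naturalLanguage' value.

    Hypothetical helper: rejects non-ASCII input, enforces the 63-octet
    maximum, and lowercases the tag. It does not validate subtag structure.
    """
    try:
        encoded = tag.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError(f"non-ASCII language tag: {tag!r}") from None
    if len(encoded) > MAX_NATURAL_LANGUAGE_OCTETS:
        raise ValueError(f"language tag longer than 63 octets: {tag!r}")
    return tag.lower()


print(to_ipp_natural_language("en-US"))       # en-us
print(to_ipp_natural_language("en-Latn-US"))  # en-latn-us
```

Note that lowercasing alone does not help 'en-Latn-US': it becomes 'en-latn-us', which a strict "language plus optional country" parser will still reject.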

I have personally seen code at Sharp and Xerox that specifically
checks for a language subtag and an optional country subtag, and
treats any other language tag as a syntax error.

Cheers,
- Ira