[Suppress-Script] Initial list of 300 languages
imcdonald at sharplabs.com
Wed Mar 15 18:56:15 CET 2006
Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221 Grand Marais, MI 49839
email: imcdonald at sharplabs.com
Peter Constable wrote:
> Again, I don't know that nothing will be harmed, but I think we can
> distinguish between scenarios. If "en-US" was supported before, it
> should continue to be supported, and we don't want lots of content to be
> broken in printing because people changed tagging to "en-Latn-US". But
> if someone is tagging content as "ga-Latg" and "ga-Latn" rather than
> "ga", they're probably doing so for a particular reason, and if that
> fails to print on certain printers, then we live with that. Those same
> printers - or, at least, those printing scenarios - are probably going
> to trash any attempt to print Arabic, Hindi or Thai no matter how it's
> tagged, so those are printing scenarios that I would be avoiding if I
> needed to work with such text, whether that means changing apps, OS,
> service provider or changing printers; if I was having problems printing
> my Irish data, I'd probably do likewise.
OK - I want to reply in defense of printer manufacturers.
There have been network printers with Bidi support for Arabic and
Hebrew in plaintext for years.
There have been Asian models of network printers from most of the
manufacturers with support for Thai and Hindi for years.
Tagging ANY datastream externally (in protocol) or internally
with an unnecessary script tag will always be a bad idea.
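To make "unnecessary script tag" concrete, here is a minimal sketch of dropping a script subtag that only repeats a language's default script, the job the Suppress-Script field of the language subtag registry is meant to support. The table below is a small hypothetical excerpt (en, fr, de and ga default to Latn, ru to Cyrl, ar to Arab), and `strip_suppressed_script` is an illustrative name, not part of any real library. Note that "ga-Latg" survives untouched, which is the deliberate-tagging case Peter describes.

```python
# Hypothetical excerpt of Suppress-Script values: the script a language is
# written in so overwhelmingly that tagging it explicitly adds nothing.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "fr": "Latn",
    "de": "Latn",
    "ga": "Latn",
    "ru": "Cyrl",
    "ar": "Arab",
}


def strip_suppressed_script(tag: str) -> str:
    """Drop a script subtag that merely repeats the language's default script.

    Only the second subtag is examined, and only when it has the shape of a
    four-letter script code. Every other tag is returned unchanged.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        language, script = subtags[0].lower(), subtags[1].lower()
        if SUPPRESS_SCRIPT.get(language, "").lower() == script:
            del subtags[1]  # redundant script subtag: remove it
    return "-".join(subtags)


if __name__ == "__main__":
    for tag in ["en-Latn-US", "en-US", "ga-Latg", "ga-Latn", "ar-Arab-EG"]:
        print(tag, "->", strip_suppressed_script(tag))
```

Applied before a tag leaves the sender, a step like this would turn "en-Latn-US" back into "en-US", the form older receivers already understand, while leaving meaningful script choices alone.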
Despite all of our hot air on this mailing list, I confidently
expect that lots of fools will start generating 'en-Latn-US'
tags on web content by emulating some example they've seen
somewhere else on the web, because they won't know any better.
Printers tend to be developed by those computer programmers
who annoy Michael Everson. IPP as a protocol (like many others)
REQUIRES that all received protocol parameters be validated
FIRST for syntax and SECOND for content, and when IPP/1.1 was
defined, a four-character script subtag in the second position
was a syntax error.
Below is a verbatim quote from page 87 of IPP/1.1 (RFC2911):
   The 'naturalLanguage' attribute syntax is a standard identifier for a
   natural language and optionally a country. The values for this
   syntax type are defined by RFC 1766 [RFC1766]. Though RFC 1766
   requires that the values be case-insensitive US-ASCII [ASCII], IPP
   requires all lower case to simplify comparing by IPP clients and
   Printer objects. Examples include:

      'en': for English
      'en-us': for US English
      'fr': for French
      'de': for German

   The maximum length of 'naturalLanguage' values used to represent IPP
   attribute values is 63 octets.
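Read literally, the quoted text gives an IPP client two mechanical obligations beyond the RFC 1766 syntax: send the value in lower case, and keep it within 63 octets of US-ASCII. A minimal sketch covering only those two rules; `to_ipp_natural_language` is a hypothetical helper name, and it deliberately does not attempt the RFC 1766 syntax check.

```python
MAX_NATURAL_LANGUAGE_OCTETS = 63  # limit stated in RFC 2911


def to_ipp_natural_language(tag: str) -> str:
    """Prepare a language tag for use as an IPP 'naturalLanguage' value.

    Lower-cases the tag (IPP requires all lower case) and raises ValueError
    for values that are not US-ASCII or that exceed 63 octets.
    """
    try:
        encoded = tag.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(f"not US-ASCII: {tag!r}") from exc
    if len(encoded) > MAX_NATURAL_LANGUAGE_OCTETS:
        raise ValueError(
            f"longer than {MAX_NATURAL_LANGUAGE_OCTETS} octets: {tag!r}"
        )
    return tag.lower()


if __name__ == "__main__":
    print(to_ipp_natural_language("en-US"))
    print(to_ipp_natural_language("en-Latn-US"))
```

Lower-casing turns "en-US" into "en-us", but it equally turns "en-Latn-US" into "en-latn-us", so this step on its own does not make a script-tagged value acceptable to a receiver that only expects language and country subtags.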
I personally have seen code at Sharp and Xerox that specifically
checks for language and optional country subtags and treats any
other language-tag as a syntax error.
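The strict receiver behaviour described above can be sketched as follows. This is a hypothetical reconstruction, not the Sharp or Xerox code: it assumes a receiver that accepts only a 2- or 3-letter lower-case language subtag with an optional 2-letter country subtag, checks that syntax FIRST, and only SECOND checks content against a made-up set of supported languages. The pattern, the `SUPPORTED` set, the function name and the verdict strings are all illustrative.

```python
import re

# Language subtag, optionally followed by a two-letter country subtag,
# all lower case as IPP requires. Nothing else is accepted.
STRICT_TAG = re.compile(r"[a-z]{2,3}(-[a-z]{2})?")

# Hypothetical set of languages this receiver can actually render.
SUPPORTED = {"en", "en-us", "fr", "de", "ga"}


def check_natural_language(value: str) -> str:
    """Validate syntax first, then content, and return a short verdict."""
    # Phase 1: syntax (length limit plus the strict language[-country] shape).
    if len(value.encode("utf-8")) > 63 or not STRICT_TAG.fullmatch(value):
        return "syntax-error"
    # Phase 2: content (is this a language the receiver supports?).
    if value not in SUPPORTED:
        return "unsupported"
    return "ok"


if __name__ == "__main__":
    for value in ["en-us", "en", "ja", "en-latn-us", "ga-latg", "EN-US"]:
        print(f"{value:12} {check_natural_language(value)}")
```

Under these assumptions "en-latn-us" never reaches the content check: it is rejected as malformed even though the same receiver handles "en-us" without complaint, which is the breakage the post predicts for needlessly script-tagged content.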