Luc Pardon lucp at
Mon Dec 19 09:32:10 CET 2016

On 19-12-16 02:17, Doug Ewell wrote:
> Luc Pardon wrote:
>> BCP47 violates that rule big time, by packing all kind of things
>> (script, orthography, ...) in a field that was originally intended (in
>> HTML) to contain only the language.
> John has already addressed the historical context -- that is, the world
> did not begin with HTML.

  Yes, I stand corrected on that. I always assumed that the language tag
designers had to make do with the existing "lang" attribute in HTML.
Wrongly so.

  However, that also means that there was an opportunity to add separate
fields for separate things, like script and orthography, and that
opportunity was missed. Too bad, says I with hindsight.

> Additionally, John alluded to the fact that there has always been a
> sense among language tag users, going back to at least the 1980s, that
> some critical language-tagging distinctions go beyond language alone.
> "Simplified Chinese" and "Traditional Chinese" have always needed
> different resource sets, different spell-checkers, different parameters
> for searching and sorting. Canadian French, Swiss French, Belgian
> French, and Hexagonal French have their differences. Twenty years ago
> our translators in Quebec and Paris waged a mighty war over the use of
> "taux" versus "niveau" to translate a "level" of laboratory control
> material.

   I'd say that all these examples have to do with "language alone", and
I am fine with things like "lang=fr-BE". The wars that are waged over
language issues are as unavoidable as taxes, death, etc.

> Of course, once we replaced the one-off registration of language-region
> and language-script pairs with a generative mechanism in BCP 47, we did
> open the door for arbitrary combinations. But this is not a simple,
> easily dismissed matter of putting unrelated items into a single field,
> the way that (as David said) putting multiple language tags into a
> single language-tag field would be.

   While I have no (technical) issue with region tags, I do have an
issue with script. Script is, at least in my view, unrelated to
language. It is a way of rendering it, much like font type and font size
are, or printed versus handwritten, or audio versus text.

   If I have a text in the Greek language, I can write that down with
Greek characters, or I can transliterate it into Latin, or Cyrillic, or
whatever. The language remains exactly the same Greek, written by the
same writer.
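
   For what it is worth, the script already sits in its own subtag inside
the one field, so the separate fields can at least be pulled back out
mechanically. Below is a minimal Python sketch of that, assuming a
simplified subset of the RFC 5646 grammar (language, optional script,
optional region, optional variants; no extended language subtags,
extensions, private use or grandfathered tags). The names `parse` and
`LangTag` are made up for illustration; real code should use a proper
BCP 47 library and the IANA subtag registry.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Simplified subset of the RFC 5646 "langtag" production. Extended language
# subtags, extensions, private use and grandfathered tags are NOT handled.
_LANGTAG = re.compile(
    r"""(?P<language>[A-Za-z]{2,3}|[A-Za-z]{5,8})
        (?:-(?P<script>[A-Za-z]{4}))?
        (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
        (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)""",
    re.VERBOSE,
)


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse(tag: str) -> LangTag:
    """Split a (simple) language tag into separate fields, using the
    conventional casing: language lower, script title, region upper."""
    m = _LANGTAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not handled by this simplified parser: {tag!r}")
    script, region = m.group("script"), m.group("region")
    return LangTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=tuple(v.lower() for v in m.group("variants").split("-") if v),
    )


for t in ["el-Grek", "el-Latn", "fr-BE", "zh-Hant-TW", "de-CH-1996"]:
    print(t, "->", parse(t))
```

   Both Greek tags come out with the same language subtag, "el", and
differ only in the script field, which is exactly the distinction I am
talking about.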

>> But you can tag the document as a whole with "lang=ru" and then
>> proceed to tag the notes or citations separately as "lang=fr". Problem
>> solved. Your search will bring up the annotated copy along with all
>> the other Russian-language copies/editions of the book.
> Spanglish isn't like this, though. Neither is Franglais or Tagalish or
> Hinglish or the other combinations. One of the identifying features of
> these hybrids is that vocabulary from each language is mixed so freely
> that declaring one of the contributing languages to be the "base" and
> the others to be exceptions, like "caramba" or "oy vey" in the middle of
> an otherwise all-English text, both misses the point and requires
> ridiculous amounts of markup.

   If that is indeed so (and I don't doubt it), then Spanglish should
probably get a separate language tag, or be tagged "en es" as far as I
am concerned.

   But in the latter case it would be impossible to distinguish the
Spanglish, as used in Michael's edition of Alice, from other mixes of
Spanish and English, if there are any. Unless you start throwing in
region tags as well.

  That latter solution would also open the door to abuse. A publisher of
a book with basic Spanish expressions for use by English-speaking
travellers would then also be tempted to tag it as "en es". A
text-to-speech engine would not know which is which, making the book
effectively useless. If the sentences are tagged separately, the
traveller can listen to them and get the pronunciation right.
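
   To make the text-to-speech point concrete, here is a minimal Python
sketch, assuming a toy document tree in which a child without its own
lang attribute inherits its parent's value (roughly as HTML does), and a
hypothetical table mapping primary language subtags to voices. `Element`,
`spans`, `pick_voice` and `VOICES` are invented names; no real TTS engine
or API is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Element:
    """Toy document node: an optional lang attribute plus child nodes/text."""
    lang: Optional[str]  # None means "no lang attribute on this element"
    children: List[Union["Element", str]] = field(default_factory=list)


# Hypothetical voice table keyed by primary language subtag.
VOICES = {"en": "english-voice", "es": "spanish-voice"}


def spans(node: Element, inherited: str = "") -> List[Tuple[str, str]]:
    """Return (effective language, text) pairs in document order.

    An element without its own lang inherits the nearest ancestor's value.
    """
    lang = node.lang if node.lang is not None else inherited
    result: List[Tuple[str, str]] = []
    for child in node.children:
        if isinstance(child, str):
            result.append((lang, child))
        else:
            result.extend(spans(child, lang))
    return result


def pick_voice(lang: str) -> Optional[str]:
    """Pick a voice from the primary subtag, or None if nothing matches."""
    return VOICES.get(lang.split("-")[0].lower())


# Sentence-level tagging: the Spanish phrase carries its own lang.
phrasebook = Element("en", ["To ask for the bill, say: ",
                            Element("es", ["La cuenta, por favor."])])
# Whole-book "en es" (not a valid single language tag in the first place).
lumped = Element("en es", ["To ask for the bill, say: La cuenta, por favor."])

print([(lang, pick_voice(lang)) for lang, _ in spans(phrasebook)])
# [('en', 'english-voice'), ('es', 'spanish-voice')]
print([(lang, pick_voice(lang)) for lang, _ in spans(lumped)])
# [('en es', None)]
```

   With per-sentence tags every span gets a voice; with one "en es" value
on the whole book the engine has nothing to choose by.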

>> Or would it be enough if the Queen keeps saying "Caramba, off with
>> their heads" to make it Spanglish, even if that is the only Spanish
>> word in the entire book? Or would that then be "Englanish"?
> See above. This would be English with a single Spanish word, which
> normally wouldn't even be tagged. 

   It normally would _have_ to be tagged if you want your text to meet
the WCAG accessibility criteria, which are mandatory by law in a growing
number of countries.

   I read that most mom & pop pizza restaurants never bothered to
provide access ramps for wheelchair users until they got sued.

   The "ridiculous amount of markup" argument is equivalent to the
"ridiculous amount of money" that some of them would have to spend to
build an access ramp or elevator. Valid, but irrelevant. If the law says
you must do it, you either comply or close the place.

> That's not at all what Spanglish and friends are.

   OK, no argument with that.

   Even so, the markup can be automated, so it shouldn't require a
ridiculous amount of work to produce that ridiculous amount of markup.
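
   As an illustration of the automation, a minimal Python sketch, assuming
a hypothetical list of Spanish words to look for. A plain word-list lookup
is a naive stand-in for real language identification: a word spelled the
same in both languages would be mis-tagged, so in practice the result
would still need a human check. `tag_foreign_words` and `SPANISH_WORDS`
are invented for the example.

```python
import html
import re

# Hypothetical word list; in practice it might come from a dictionary,
# a language-identification tool, or an editor's review.
SPANISH_WORDS = {"caramba", "gracias", "hola"}


def tag_foreign_words(text: str, words: set, lang: str) -> str:
    """Return `text` as escaped HTML, with every whole word found in `words`
    (compared case-insensitively) wrapped in <span lang="...">."""
    out = []
    # The capturing group makes re.split keep the words as separate items.
    for piece in re.split(r"(\w+)", text):
        if piece.lower() in words:
            out.append(f'<span lang="{html.escape(lang)}">{html.escape(piece)}</span>')
        else:
            out.append(html.escape(piece))
    return "".join(out)


print(tag_foreign_words("Caramba, off with their heads!", SPANISH_WORDS, "es"))
# <span lang="es">Caramba</span>, off with their heads!
```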

   And now for the good news: I'll be off-line for a few days <g>.


> -- 
> Doug Ewell | Thornton, CO, US |
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at
