doug at ewellic.org
Mon Dec 19 02:17:28 CET 2016
Luc Pardon wrote:
> BCP47 violates that rule big time, by packing all kind of things
> (script, orthography, ...) in a field that was originally intended (in
> HTML) to contain only the language.
John has already addressed the historical context -- that is, the world
did not begin with HTML.
Additionally, John alluded to the fact that there has always been a
sense among language tag users, going back to at least the 1980s, that
some critical language-tagging distinctions go beyond language alone.
"Simplified Chinese" and "Traditional Chinese" have always needed
different resource sets, different spell-checkers, different parameters
for searching and sorting. Canadian French, Swiss French, Belgian
French, and Hexagonal French have their differences. Twenty years ago
our translators in Quebec and Paris waged a mighty war over the use of
"taux" versus "niveau" to translate a "level" of laboratory control
Of course, once we replaced the one-off registration of language-region
and language-script pairs with a generative mechanism in BCP 47, we did
open the door for arbitrary combinations. But this is not a simple,
easily dismissed matter of putting unrelated items into a single field,
the way that (as David said) putting multiple language tags into a
single language-tag field would be.
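To illustrate the generative mechanism described above, here is a minimal sketch in Python. It assumes a deliberately simplified tag shape (language, then optional script, then optional region); real BCP 47 tags also allow extlang, variant, extension, and private-use subtags, none of which are handled here. The name `split_tag` is hypothetical, not part of any library.

```python
import re

# Simplified shape only: language[-script][-region].
# This is an illustrative assumption, not the full BCP 47 grammar.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a simple language tag into its language, script, and region subtags."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("not a simple language[-script][-region] tag: %r" % tag)
    parts = match.groupdict()
    # Apply the conventional casing: lowercase language, titlecase script,
    # uppercase region. Tags themselves are case-insensitive.
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


if __name__ == "__main__":
    for example in ("zh-Hant", "zh-Hans", "fr-CA", "fr-CH", "zh-hans-cn"):
        print(example, split_tag(example))
```

Each subtag occupies its own position and answers its own question, so "zh-Hant" and "fr-CA" are still single tags built from related parts. A string such as "en es", by contrast, does not fit the pattern at all, which is the distinction drawn above between one structured tag and several tags crammed into one field.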
> But you can tag the document as a whole with "lang=ru" and then
> proceed to tag the notes or citations separately as "lang=fr". Problem
> solved. Your search will bring up the annotated copy along with all
> the other Russian-language copies/editions of the book.
Spanglish isn't like this, though. Neither is Franglais or Tagalish or
Hinglish or the other combinations. One of the identifying features of
these hybrids is that vocabulary from each language is mixed so freely
that declaring one of the contributing languages to be the "base" and
the others to be exceptions, like "caramba" or "oy vey" in the middle of
an otherwise all-English text, both misses the point and requires
ridiculous amounts of markup.
> Or would it be enough if the Queen keeps saying "Caramba, off with
> their heads" to make it Spanglish, even if that is the only Spanish
> word in the entire book? Or would that then be "Englanish"?
See above. This would be English with a single Spanish word, which
normally wouldn't even be tagged. That's not at all what Spanglish and
Doug Ewell | Thornton, CO, US | ewellic.org