John Cowan cowan at
Sun Dec 18 19:47:24 CET 2016

On Sun, Dec 18, 2016 at 8:07 AM, Luc Pardon <lucp at> wrote:

> BCP47 violates that rule big time, by packing all kind of things
> (script, orthography, ...) in a field that was originally intended (in
> HTML) to contain only the language.

In fact, RFC 1766, the original incarnation of what is now BCP 47,
predates HTML and HTTP.  It was designed to have something
standard to put in the email header "Content-Language", which was
designed to specify what language the email (as a whole) was
written in, and if it was being sent simultaneously in multiple
languages as a mixed/alternative email, to distinguish which
translation was which.  And even at that time it was understood
that language alone was too coarse-grained a category.

> Applied to the topic under discussion, "Alice in Spanglishland" would
> have to be tagged with "en" at the top of the document, and the Spanish
> words inside the text would have to be marked up separately (i.e. "<span
> lang="es">caramba</span>" in HTML syntax). Or the other way around, if
> the majority of the words are Spanish.

Yes, that works well for vocabulary mixing simpliciter, but not so much
for more intimate language blends.  Consider the following bit of dog Latin:

Patres conscripti took a boat, and went to Philippi;
Boatum est upsettum, magno cum grandine venti.
Omnes drownderunt qui swim away non potuerunt.

The lovely word _drownderunt_ has an English root, an English inflection
_ed_ that has merged with the root (in regional English people say _drownd_
for _drown_ and either _drowned_ or _drownded_ for _drowned_), and a Latin
inflection _erunt_ on top of that.  How are we to mark this up? Similarly,
see the passages in my blog post "French in all its purity" at
Is the verb _bruncher_ in the code-switching example another like
_drownderunt_, or is it pure French?

Worse yet, what of

What is this that roareth thus?
Can it be a Motor Bus?
Yes, the smell and hideous hum
Indicat Motorem Bum!

where English _bus_, itself a clipping of Latin _omnibus_ 'for all', is
reanalyzed as a pseudo-Latin stem _b_ with a nominative ending _us_, and
then used in
the accusative as _bum_?  Neither spelling checkers nor text-to-speech
will recognize _b_ as English or _um_ as Latin.
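
To make the markup problem concrete, here is a minimal sketch of the
word-level tagging scheme proposed in the quoted text, applied to one line
of the dog Latin. Everything in it is an assumption for illustration: the
toy word lists, the function names tag_word and mark_up, and the choice of
"la" as the document default. The only fallback available for a blended
word is the BCP 47 subtag "und" (undetermined), which tells a spelling
checker or speech engine nothing useful.

```python
import html

# Hypothetical toy lexicons; a real tool would consult full dictionaries.
ENGLISH = {"took", "a", "boat", "and", "went", "to", "swim", "away"}
LATIN = {"patres", "conscripti", "est", "magno", "cum", "grandine",
         "venti", "omnes", "qui", "non", "potuerunt"}


def tag_word(word):
    """Return a language tag for one word, or "und" if no lexicon claims it."""
    key = word.strip(".,;!?").lower()
    if key in ENGLISH:
        return "en"
    if key in LATIN:
        return "la"
    return "und"  # blended forms such as "drownderunt" end up here


def mark_up(line, default_lang="la"):
    """Wrap each word that is not in the default language in a lang span."""
    parts = []
    for word in line.split():
        lang = tag_word(word)
        escaped = html.escape(word)
        if lang == default_lang:
            parts.append(escaped)
        else:
            parts.append(f'<span lang="{lang}">{escaped}</span>')
    return " ".join(parts)


print(mark_up("Omnes drownderunt qui swim away non potuerunt."))
```

Whole words like _swim_ and _away_ get a clean "en" span, but
_drownderunt_ can only be tagged as undetermined, and splitting it into
morpheme-level spans would require an analysis that no word list supplies.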

(Pace Mark Davis, _downloaden_ is now clearly a German word, just like
_Standard, Tipp, Stopp, Rekord_.)

John Cowan        cowan at
        Is it not written, "That which is written, is written"?

More information about the Ietf-languages mailing list