Peter Constable petercon at
Fri Jan 6 20:03:28 CET 2017


While I agree with most of your response to Michael, I still don’t think the “t” extension should be extended for this as it is not a transform, and I’m still not totally convinced that the correct solution _for a generalized case_ of code switching, extensive borrowing, or other contact-language scenarios is to capture everything in a single tag.


From: Ietf-languages [mailto:ietf-languages-bounces at] On Behalf Of Mark Davis
Sent: Friday, January 6, 2017 9:08 AM
To: Michael Everson <everson at>
Cc: ietflang IETF Languages Discussion <ietf-languages at>
Subject: Re: Spanglish


On Fri, Jan 6, 2017 at 1:54 PM, Michael Everson <everson at> wrote:
I think there is no requirement for an extensible mechanism to “admix” any two random languages.

The value of a generative mechanism is that one doesn't have to anticipate ahead of time what users will find useful. Take BCP47 itself. If it turns out that someone finds en-JP useful, it can be represented without waiting for a registration process, which might fail simply because a registrar does not find "en-JP" meaningful or useful.

The key for such a mechanism to work is that the semantics can be derived from the components (plus syntax).

Moreover, while es-spanglis is user-friendly (for the bibliographers and librarians who are likely the ONLY users of such tags), this es-t-h0-en is just a load of mysterious letters.

BCP47 is not intended for end users; readability is nice where possible, but it is not a requirement. The syntax (subtags limited to 8 ASCII alphanumeric characters) restricts readability, but BCP47 is meant for internal codes. Any good interface will supply a human-readable name: an end user ought to see a name in their own language. Pick a random point in the Language Subtag Registry and you get something like 'cmm'. What percentage of bibliographers know right away that it means "Michigamea"?
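To make the length limit concrete, here is a minimal Python sketch that checks only the per-subtag rule of RFC 5646 (each subtag is 1 to 8 ASCII letters or digits) and the extra shape required of variant subtags (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics). It is not a full validator of the BCP47 grammar, and the function name is made up for illustration. It also shows that "spanglish", at nine letters, could not be a variant subtag as spelled.

```python
import re

# Every BCP47 subtag is 1-8 ASCII letters or digits (RFC 5646 syntax).
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")

# Variant subtags: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def subtags_well_shaped(tag: str) -> bool:
    """Hypothetical helper: check only the per-subtag length/charset rule,
    not the ordering or other constraints of the full ABNF."""
    return all(SUBTAG.fullmatch(part) for part in tag.split("-"))


print(subtags_well_shaped("es-spanglis"))   # True
print(subtags_well_shaped("es-spanglish"))  # False: 9 characters
print(bool(VARIANT.fullmatch("spanglis")))  # True
```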

And bibliographers are NOT the ONLY users of such TAGS (as long as you are shouting), which you apparently missed in my Jan 5 email. We are focused less on tagging content than on selecting content.

1) contact languages are NOT “transformations” so your -t- makes no sense

There is already discussion of this in the email.

2) what the heck is h-zero supposed to mean? Oh. hybrid zero. And there’s a hybrid one. And what’s that?

Nobody will know.

People who read the documentation will know.

> es-t-h0-en    Spanglish       Spanish with an admixture of English
> en-t-h0-es    Spanglish       English with an admixture of Spanish
> Note: the boundary between these two will be rather fuzzy, like most cases in language identification. We'd recommend that es-t-h0-en be used unless English clearly predominates.

No, we wouldn’t. We’d recommend that the dominant language, whether en or es, be used, whichever it may be.

The "unless English clearly predominates" clause is meant to cover the roughly 50:50 case. Without a clause like that, content that is about 50:50 has no clear choice.
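A minimal sketch of the recommendation as worded, assuming the mix can be measured as the share of Spanish words in the content and picking a hypothetical 0.6 cut-off for "clearly predominates". Neither the measure nor the threshold comes from the proposal; the point is only that a default is needed for the near-50:50 case.

```python
def spanglish_tag(spanish_share: float, threshold: float = 0.6) -> str:
    """Pick between the two proposed tags.

    spanish_share: fraction of words that are Spanish (an assumed measure).
    threshold: hypothetical cut-off for "English clearly predominates".
    """
    if not 0.0 <= spanish_share <= 1.0:
        raise ValueError("spanish_share must be between 0 and 1")
    english_share = 1.0 - spanish_share
    # Default to Spanish as the base; switch only when English clearly dominates.
    if english_share >= threshold:
        return "en-t-h0-es"
    return "es-t-h0-en"


print(spanglish_tag(0.5))  # es-t-h0-en: the ~50:50 case falls to the default
print(spanglish_tag(0.3))  # en-t-h0-es
```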

> One could then also have
> es-t-hi-h0-en Spanglish translated from Hindi
> A second key 'h1' is defined indicating that the source language for the transform is a hybrid, much as we have done with the transliteration s0 and d0 keys. The value of h1 is a language tag indicating that the source language for -t- is a hybrid with that language, allowing formulations like
> es-t-hi-h1-en Spanish translated from Hinglish
> es-t-hi-h0-en-h1-en   Spanglish translated from Hinglish
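Since the argument is that the meaning can be derived from the components, here is a minimal Python sketch that composes a description from the tags listed above. It follows the h0/h1 syntax exactly as proposed in this thread (the value is a language subtag), handles only a single trailing -t- extension, and does not check the value-length rules of RFC 6497. The display-name tables and function names are hypothetical; a real interface would take names from locale data.

```python
from typing import Dict, List, Tuple

# Hypothetical display names, for illustration only.
NAMES = {"en": "English", "es": "Spanish", "hi": "Hindi"}
HYBRID_NAMES = {("es", "en"): "Spanglish", ("en", "es"): "Spanglish",
                ("hi", "en"): "Hinglish"}


def parse_t(tag: str) -> Tuple[str, str, Dict[str, str]]:
    """Split 'base-t-[tlang]-key-value...' into (base, tlang, fields)."""
    base, sep, ext = tag.lower().partition("-t-")
    if not sep:
        return base, "", {}
    tlang_parts: List[str] = []
    fields: Dict[str, str] = {}
    key = None
    for part in ext.split("-"):
        if len(part) == 2 and part[0].isalpha() and part[1].isdigit():
            key = part  # a key such as h0 or h1: letter + digit
            fields[key] = ""
        elif key is None:
            tlang_parts.append(part)  # subtags before the first key: source language
        else:
            fields[key] = f"{fields[key]}-{part}" if fields[key] else part
    return base, "-".join(tlang_parts), fields


def name_of(lang: str, hybrid_with: str) -> str:
    if not hybrid_with:
        return NAMES.get(lang, lang)
    fallback = f"{NAMES.get(lang, lang)} mixed with {NAMES.get(hybrid_with, hybrid_with)}"
    return HYBRID_NAMES.get((lang, hybrid_with), fallback)


def describe(tag: str) -> str:
    base, tlang, fields = parse_t(tag)
    target = name_of(base, fields.get("h0", ""))  # h0: target is a hybrid
    if not tlang:
        return target
    source = name_of(tlang, fields.get("h1", ""))  # h1: source is a hybrid
    return f"{target} translated from {source}"


print(describe("es-t-hi-h0-en-h1-en"))  # Spanglish translated from Hinglish
```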

You’ve got to be kidding.

This is clever, Mark, but it doesn’t address any actual user requirements, and the notation you propose is absurdly opaque.

Again, see my remarks on opaqueness above.

> Hybrid locales

Locales? There’s no Spanglish locale envisioned.

By you. You don't happen to be the only user of BCP47.

> have intermixed content from 2 (or more) languages, often with one language's grammatical structure applied to words in another. See also  for the use of the term “hybrid”.

> More importantly, it doesn't work for a very common use case: locale selection. Locales, which are an extension of language tags, are used to request localized content and internationalization services. When people pick a language from a menu, internally they are picking a locale (en-GB, es-419, etc.). If you want an application to support Spanglish or Hinglish, then you need a locale to represent that.
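One consequence for locale selection is worth sketching: with the "Lookup" matching scheme of RFC 4647, a request for a hybrid locale that an application does not support falls back to its base language. The minimal Python sketch below implements that truncation (drop the last subtag, and also drop a single-character subtag such as "t" left dangling at the end), with case-insensitive comparison. The function name and the supported-locale lists are hypothetical, and the wildcard handling of RFC 4647 is omitted.

```python
from typing import List, Optional


def lookup(requested: str, available: List[str],
           default: Optional[str] = None) -> Optional[str]:
    """RFC 4647-style Lookup for a single language range (no wildcards)."""
    supported = {tag.lower(): tag for tag in available}
    subtags = requested.split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in supported:
            return supported[candidate]
        subtags.pop()
        # Never leave a singleton (e.g. "t" or "x") at the end of the range.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# es-t-h0-en -> es-t-h0 -> es (the dangling "t" is removed along the way)
print(lookup("es-t-h0-en", ["en", "es"]))  # es
```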

I don’t think anybody wants to do this.

We have had concrete requests from product groups within Google for hybrid locale identifiers, and not just one or two of them. This is not a whim.

Note that this does not prevent 'spanglis' from being registered. If the use of an extension is too opaque for you, go for it. It just doesn't meet our needs.

> Luckily, this falls within the scope of the T extension.

Not usefully.
