Peter Constable petercon at
Wed Jan 4 03:07:30 CET 2017

There’s no such thing as “code-switch languages” unless you mean the individual languages that a speaker switches between when they are code switching. Definition:

“Code switching is the practice of moving back and forth between two languages, or between two dialects or registers of the same language.”

As I mentioned in a post last week, the issue at hand in Michael’s scenario is that a reader of the content needs some level of competency in both English and Spanish (or whatever combination of languages is involved) in order for the content to be understandable and relevant.

Neither the t nor the u extension would be appropriate for this. A new “s” extension could be devised with explicitly additive semantics: “en-s-es” would mean that both English _and_ Spanish are required to understand the content. Potentially these could be chained; e.g., “en-s-s0-es-s1-fi” could mean that you need to speak English _and_ Spanish _and_ Finnish to understand the content, with the ordering providing a prioritization (e.g., if your proficiency in Finnish is more limited than your proficiency in Spanish, you may still get by). But a key question is what matching would require, since general-purpose matching algorithms likely won’t pay attention to extensions.
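To make the additive semantics and the matching concern concrete, here is a minimal Python sketch. It assumes the hypothetical “s” extension exactly as described above: either a single language (“en-s-es”) or ordered s0, s1, ... keys (“en-s-s0-es-s1-fi”). The function names are illustrative only, nothing here is registered in BCP 47, and the sketch uses the ordering only to rank languages, not for partial matching.

```python
import re

# Hypothetical ordering keys s0..s9 for the chained form (an assumption, not a registered syntax).
ORDER_KEY = re.compile(r"^s[0-9]$")


def extension_subtags(subtags: list[str], singleton: str) -> list[str]:
    """Return the subtags that follow `singleton`, up to the next singleton."""
    if singleton not in subtags[1:]:
        return []
    body = []
    for sub in subtags[subtags.index(singleton, 1) + 1:]:
        if len(sub) == 1:  # another singleton starts a new extension
            break
        body.append(sub)
    return body


def required_languages(tag: str) -> list[str]:
    """Every language a reader needs for content tagged `tag`, in priority order."""
    subtags = tag.lower().split("-")
    body = extension_subtags(subtags, "s")
    if body and ORDER_KEY.match(body[0]):
        # Chained form: key/value pairs, ordered by the digit in each key.
        pairs = sorted(zip(body[0::2], body[1::2]), key=lambda kv: int(kv[0][1:]))
        extra = [lang for _, lang in pairs]
    else:
        # Simple form: a single additional language.
        extra = body[:1]
    return [subtags[0]] + extra


def reader_understands(tag: str, reader_languages: list[str]) -> bool:
    """Additive semantics: the reader must know *all* required languages."""
    known = {lang.lower() for lang in reader_languages}
    return all(lang in known for lang in required_languages(tag))


def extension_unaware_key(tag: str) -> str:
    """What a general-purpose matcher that ignores extensions would compare."""
    kept = []
    for sub in tag.lower().split("-"):
        if len(sub) == 1:
            break
        kept.append(sub)
    return "-".join(kept)
```

With these definitions, reader_understands("en-s-s0-es-s1-fi", ["en", "es"]) is False, yet an extension-unaware matcher compares only "en" and would accept the same content for an English-only reader, which is exactly the matching problem.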


From: Ietf-languages [mailto:ietf-languages-bounces at] On Behalf Of Mark Davis ☕️
Sent: Monday, January 2, 2017 11:53 PM
To: Phillips, Addison <addison at>
Cc: ietflang IETF Languages Discussion <ietf-languages at>; John Cowan <cowan at>
Subject: Re: Spanglish

Also, John raises a concern about being able to express transformations into code-switch languages. I added a comment with a reformulation to address that:

As to John's concern in comment:1 about being able to have a transformation of a code-switch language: I think that is a far less important requirement than having a general mechanism for code-switch languages.

However, I think we can accommodate that — and at the same time alleviate some of people's concerns about the terms 'source' and 'target' — by changing the syntax so that the value of the c0 key is the language that is mixed into the main language tag. We then get tags structured as follows:

es-t-c0-en
Spanish with an admixture of English

en-t-c0-es
English with an admixture of Spanish

Note: the boundary between these two will be rather fuzzy, as with most distinctions between languages. It is probably best to recommend that es-t-c0-en be used unless English clearly predominates.

One could then have

Spanglish translated from Hindi

Although it would again be used quite infrequently, we can easily allow for a code-switch language as the source, and even for the translation of one code-switch language into another. We do this by using another keyword, much as we have done with the transliteration s0 and d0 keys. So we define c1 as a language that is mixed into the source language for -t-, allowing formulations like

Spanglish translated from Hinglish

The more I think about it, the more I like this formulation.
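A minimal Python sketch of how a consumer might read the proposed c0/c1 keys inside -t-. The tags for the two Hindi examples did not survive in the archived message, so the forms used here (es-t-hi-c0-en for Spanglish translated from Hindi, es-t-hi-c0-en-c1-en for Spanglish translated from Hinglish) are assumptions, as is the function name parse_code_switch. The sketch does not validate subtags against the BCP 47 or UTS #35 grammar.

```python
import re

# tkey shape in the -t- extension: one letter followed by one digit (e.g. m0, s0, d0).
TKEY = re.compile(r"^[a-z][0-9]$")


def parse_code_switch(tag: str) -> dict:
    """Split a -t- tag into main language, source language, and c0/c1 admixtures."""
    subtags = tag.lower().split("-")
    # Everything before the "t" singleton is treated as the main language tag.
    cut = subtags.index("t") if "t" in subtags else len(subtags)
    main = subtags[:cut]
    source, fields, key = [], {}, None
    for sub in subtags[cut + 1:]:
        if len(sub) == 1:  # a following singleton ends the -t- extension
            break
        if TKEY.match(sub):
            key = sub
            fields[key] = []
        elif key is None:
            source.append(sub)  # subtags before the first tkey form the source language
        else:
            fields[key].append(sub)
    return {
        "language": "-".join(main),
        "mixed_in": "-".join(fields.get("c0", [])) or None,
        "source": "-".join(source) or None,
        "source_mixed_in": "-".join(fields.get("c1", [])) or None,
    }
```

For example, parse_code_switch("es-t-c0-en") reports Spanish with English mixed in and no source language, while the assumed Hinglish form also fills in source and source_mixed_in.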


On Tue, Jan 3, 2017 at 8:15 AM, Mark Davis ☕️ <mark at> wrote:
-u- is syntactically unsuitable, as well as being a worse fit semantically. You can use es-t-en-c0 or es-t-en-gb-c0. You can't use es-u-en-c0 or es-u-en-gb-c0, because any two-letter subtag is a reserved keyword.
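The syntactic objection to -u- can be illustrated with a small Python sketch that classifies the subtag right after the singleton. It assumes, as stated above, that in -u- any two-character subtag begins a keyword, and that -t- keys consist of a letter followed by a digit; first_subtag_role is a hypothetical helper, not part of any library.

```python
from typing import Optional


def is_tkey(sub: str) -> bool:
    """-t- keys are one letter followed by one digit (e.g. c0, m0)."""
    return len(sub) == 2 and sub[0].isalpha() and sub[1].isdigit()


def first_subtag_role(tag: str) -> Optional[str]:
    """How the subtag right after a -t- or -u- singleton would be read."""
    subtags = tag.lower().split("-")
    for i in range(1, len(subtags) - 1):
        singleton, nxt = subtags[i], subtags[i + 1]
        if singleton == "u":
            # In -u-, a two-character subtag always starts a keyword;
            # anything longer (3-8 characters) is an attribute.
            return "key" if len(nxt) == 2 else "attribute"
        if singleton == "t":
            # In -t-, only letter+digit subtags are keys, so "en" starts the source language.
            return "key" if is_tkey(nxt) else "source language"
    return None
```

Under these rules, es-u-en-c0 reads "en" as a key, whereas es-t-en-c0 reads it as the source language.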

I was not arguing in favor of using the -u- extension for code-switch languages, just saying that it /is/ a broad mechanism.


On Mon, Jan 2, 2017 at 7:10 PM, Phillips, Addison <addison at> wrote:
> > The much
> > more general mechanism is the U one, which by now has a variety of
> > different settings.
> Ah, yes, forgot about that. I think it would be much better then to use the U
> extension.

The U extension is for Locale information. I don't think that fits any better. If anything, it's a worse fit.


