Spanglish

Peter Constable petercon at microsoft.com
Thu Jan 5 04:25:27 CET 2017


Thanks for reminding us that BCP 47 already speaks to this situation.

An issue may arise in distinguishing two languages that are both required for complete comprehension from two languages that are offered as alternatives. But apart from that, the basic point applies: multiple languages are required, and an optimal approach would include identifying each.
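
To make that distinction concrete, here is a rough Python sketch. The names `Relation` and `reader_can_use` are made up for illustration and come from no specification, and comparing only primary language subtags is a simplifying assumption, not a real matching algorithm:

```python
from enum import Enum


class Relation(Enum):
    # Hypothetical labels for the two readings of a multi-language tag set.
    REQUIRED_TOGETHER = "required"   # reader needs every listed language
    ALTERNATIVES = "alternatives"    # reader needs any one listed language


def primary(tag):
    """Return the lowercased primary language subtag, e.g. 'en' for 'en-US'."""
    return tag.split("-")[0].lower()


def reader_can_use(content_tags, relation, reader_tags):
    """Decide whether a reader's languages are enough for the content."""
    content = {primary(t) for t in content_tags}
    reader = {primary(t) for t in reader_tags}
    if relation is Relation.REQUIRED_TOGETHER:
        # Every content language must be among the reader's languages.
        return content <= reader
    # Alternatives: a single shared language is sufficient.
    return bool(content & reader)


spanglish = ["en", "es"]
print(reader_can_use(spanglish, Relation.REQUIRED_TOGETHER, ["en-US"]))
print(reader_can_use(spanglish, Relation.ALTERNATIVES, ["en-US"]))
```

The same pair of tags gives a different answer for a monolingual English reader depending on which relation applies, which is why the tag set alone does not settle the question.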


Peter

-----Original Message-----
From: Ietf-languages [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Harald Alvestrand
Sent: Wednesday, January 4, 2017 12:36 AM
To: ietf-languages at alvestrand.no
Subject: Re: Spanglish

Drive-by comment:

Content that requires proficiency in two languages needs to be tagged with two languages. From RFC 1766:

   The relationship between the tag and the information it relates to is
   defined by the standard describing the context in which it appears.
   So, this section can only give possible examples of its usage.

    -    For a single information object, it should be taken as the
         set of languages that is required for a complete
         comprehension of the complete object. Example: Simple text.

    -    For an aggregation of information objects, it should be taken
         as the set of languages used inside components of that
         aggregation.  Examples: Document stores and libraries.

    -    For information objects whose purpose in life is providing
         alternatives, it should be regarded as a hint that the
         material inside is provided in several languages, and that
         one has to inspect each of the alternatives in order to find
         its language or languages.  In this case, multiple languages
         need not mean that one needs to be multilingual to get
         complete understanding of the document. Example: MIME
         multipart/alternative.

This was deliberately intended to say "if you need multiple languages to understand this, it's tagged with multiple languages".

This language is preserved unchanged in RFC 5646.

If we have tagging mechanisms that don't allow us to put multiple languages on a single object, that's a problem with the tagging mechanism.
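
One mechanism that does carry several tags on a single object is a comma-separated list, which is what the HTTP Content-Language header permits. As a minimal Python sketch (the function names are hypothetical, and the example value "es-US, en-US" is only an illustration), reading such a list might look like this:

```python
def parse_language_list(value):
    """Split a comma-separated list of language tags, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def primary_languages(tags):
    """Return the distinct primary language subtags in first-seen order."""
    seen = []
    for tag in tags:
        primary = tag.split("-")[0].lower()
        if primary not in seen:
            seen.append(primary)
    return seen


tags = parse_language_list("es-US, en-US")
print(tags)
print(primary_languages(tags))
```

This only shows that the list form can hold both languages; it does not say whether they are required together or offered as alternatives.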


Den 04. jan. 2017 03:07, skrev Peter Constable:
> There’s no such thing as “code-switch languages” unless you mean the 
> individual languages that a speaker switches between when they are 
> code switching. Definition:
>
> “Code switching is the practice of moving back and forth between two 
> languages, or between two dialects or registers of the same language.”
>
> As I mentioned in a post last week, the issue at hand in Michael’s 
> scenario is that a reader of content needs to have some level of 
> competency in both English and Spanish (or pick whatever combination 
> of multiple languages) in order for the content to be understandable 
> and relevant.
>
> Neither the t nor the u extension would be appropriate for this. A new “s”
> extension could be devised that has explicitly additive semantics:
> “en-s-es” means that both English _/and/_ Spanish are required to 
> understand the content. Potentially, these could be chained, e.g.
> “en-s-s0-es-s1-fi” could mean that you need to speak English _/and/_ 
> Spanish _/and/_ Finnish to understand the content, with the ordering 
> providing a prioritization (e.g., if your proficiency in Finnish is more 
> limited than in Spanish, you may get by). But a key question is what is 
> required of matching, since general-purpose matching algorithms likely 
> won’t pay attention to extensions.
>
> Peter
>
> *From:* Ietf-languages [mailto:ietf-languages-bounces at alvestrand.no] 
> *On Behalf Of *Mark Davis ☕️
> *Sent:* Monday, January 2, 2017 11:53 PM
> *To:* Phillips, Addison <addison at lab126.com>
> *Cc:* ietflang IETF Languages Discussion <ietf-languages at iana.org>; 
> John Cowan <cowan at ccil.org>
> *Subject:* Re: Spanglish
>
> Also, John raises a concern about being able to express 
> transformations into code-switch languages. I added a comment with a 
> reformulation to address that:
>
>     As to John's concern in comment:1
>     <http://unicode.org/cldr/trac/ticket/9956#comment:1> about being
>     able to have a transformation of a code-switch language: I think
>     that is a far less important requirement than to have a general
>     mechanism for code-switch languages. 
> 
>     However, I think we can accommodate that — and at the same time
>     alleviate some of people's concerns about the terms 'source' and
>     'target' — by changing the syntax so that the /value/ of the c0 key
>     is the language that is mixed into the main language tag. We then
>     get tags structured as follows:
> 
>     es-t-*c0-en*    Spanglish    Spanish with an admixture of English
>     en-t-*c0-es*    Spanglish    English with an admixture of Spanish
> 
>         /Note: the boundary between these two will be rather fuzzy, like
>         most cases with languages. It is probably best to recommend
>         that es-t-c0-en be used unless English clearly predominates./
> 
>     One could then have
> 
>     es-t-hi-*c0-en*    Spanglish translated from Hindi
> 
>     Although it would be again quite infrequently used, we can easily
>     allow for the case of a code-switch language being the source, and
>     even have the translation of one code-switch language into another.
>     We do this by using another keyword, much as we have done with the
>     transliteration s0 and d0 keys. So we define c1 as a language that
>     is mixed into the source language for -t-, allowing formulations like
> 
>     es-t-hi-*c0-en*-*c1-en*    Spanglish translated from Hinglish
>
> The more I think about it, the more I like this formulation.
> 
> 
> Mark
>
> On Tue, Jan 3, 2017 at 8:15 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
> 
>     -u- is syntactically unsuitable, as well as being a worse fit
>     semantically. You can use es-t-en-c0 or es-t-en-gb-c0. You can't use
>     es-u-en-c0, or es-u-en-gb-c0 because any two letter subtag is a
>     reserved keyword. 
>
>     I was not arguing in favor of using -u- extension for code-switch
>     languages, just saying that it /is/ a broad mechanism.
> 
> 
>     Mark
>
>     On Mon, Jan 2, 2017 at 7:10 PM, Phillips, Addison <addison at lab126.com> wrote:
> 
>         >
>         > > The much
>         > > more general mechanism is the U one, which by now has a variety of
>         > > different settings.
>         >
>         > Ah, yes, forgot about that. I think it would be much better then to use the U
>         > extension.
>         >
> 
>         The U extension is for Locale information. I don't think that
>         fits any better. If anything, it's a worse fit.
> 
>         Addison
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
> 