Michael Everson everson at
Sat Jan 7 16:21:35 CET 2017

On 6 Jan 2017, at 17:08, Mark Davis ☕️ <mark at> wrote:

> The value of a generative mechanism is that one doesn't have to anticipate ahead of time what users find useful. Take BCP47 itself. If it turns out that someone finds en-JP useful, then it can be represented without waiting for a registration process, which may not succeed just because a registrar finds "en-JP" not meaningful or useful.

Yes, but en-JP could be quite obviously useful. Mix-matching any random pair of 8134 ISO 639-3 languages is not.

>> Moreover, while es-spanglis is user-friendly (for the bibliographers and librarians who are likely the ONLY users of such tags), this es-t-h0-en is just a load of mysterious letters.
> BCP47 is not intended to be for end users; that is nice where possible, but not a requirement. The syntax (limitation to 8 ascii-only alphanum) puts restrictions on readability; but BCP47 is meant for internal codes. Any good interface will supply a human readable name. End users ought to see a human-readable name in their language. Pick a random point in the language subtag registry to get something like 'cmm'. What percentage of bibliographers know right away that that means "Michigamea"?

People who work in that field will. We remember things like esp and eng and rus, now, don’t we? We don’t remember long hyphenated strings of t’s and u’s and h0’s and h1’s.

> And it is NOT the ONLY user of such TAGS (as long as you are shouting),

Pretend it was italics. 

> which you apparently didn't read from my Jan 5 email. We are not focused on the tagging of content side as much as the selection of content.

That may be, but more than one need can be served by this standard. You 

> 1) contact languages are NOT “transformations” so your -t- makes no sense
> There is already discussion of this in the email.

I don’t think you made a good case, and neither do some others, if I recall correctly.

> 2) what the heck is h-zero supposed to mean? Oh. hybrid zero. And there’s a hybrid one. And what’s that?
> Nobody will know.
> People will know who read the documentation.

Super. Your own proposal doesn’t define them. 

> No, we wouldn’t. We’d recommend that the dominant language, whether en or es, be used, whichever it may be.
> The "unless predominates" is meant to signify the ~50% case. Unless you have a clause like that, content that is about 50:50 doesn't have a clear choice.

You realize, don’t you, that it’s actually impossible for anyone (user or otherwise) to estimate these percentages, right? From page to page or paragraph to paragraph or sentence to sentence this will invariably change.

> > es-t-hi-h1-en Spanish translated from Hinglish
> > es-t-hi-h0-en-h1-en   Spanglish translated from Hinglish
> You’ve got to be kidding.
> This is clever, Mark, but it doesn’t address any actual user requirements, and the notation you propose is absurdly opaque.
> Again, see opaqueness above.

Right, so it’s both useless and opaque.

> > Hybrid locales
> Locales? There’s no Spanglish locale envisioned.
> By you. You don't happen to be the only user of BCP47.

It would be practically impossible to generate such a locale, and I can think of no customer base who would use it. Spanglish literature is creative and evocative and interesting to read. Nobody writes newspaper articles in it, and my stars it should be obvious that nobody wants to localize a user interface into it. In Spanish-based Spanglish, there is NO WAY EVER of guessing what elements will be substituted by English terms, phrases, or whatnot. I don’t believe that ANY of these contact-languages would have a community asking for a locale in it, rather than just English or Spanish or Nahuatl.

> > have intermixed content from 2 (or more) languages, often with one language's grammatical structure applied to words in another. See also ​ for the use of the term “hybrid”.
> > More importantly, it doesn't work for a very common use case: locale selection. To communicate requests for localized content and internationalization services, locales are used, which are an extension of language tags. When people pick a language from a menu, internally they are picking a locale (en-GB, es-419, etc). If you want an application to support Spanglish or Hinglish, then you have to have a locale to represent that.
> I don’t think anybody wants to do this.
> We have had concrete requests from product groups within Google for hybrid locale identifiers, and not just one or two of them. This is not a whim.

Please be specific as to what has been requested, or what you say is just hearsay. I mean explain, in detail, what identifiers have been requested and what they are supposed to actually be like. Then look at a range of Spanglish texts. I don’t believe anybody is asking for locales written in such a language-form. For one thing, it’s totally unstandardized and impossible to generate a text that might be acceptable to every user of Spanglish. It’s the nature of the Mischung.

This is obvious linguistically. 

> Note that this does not prevent 'spanglis' from being registered. If the use of an extension is too opaque for you, go for it. It just doesn't meet our needs.

I don’t believe that your undefined “hybrid locales” have anything to do with the linguistic realities of contact-language. But until you describe it in real linguistic terms, it’s not got anything to do with Spanglish.


More information about the Ietf-languages mailing list