petercon at microsoft.com
Fri Jan 6 20:03:28 CET 2017
While I agree with most of your response to Michael, I still don’t think the “t” extension should be extended for this as it is not a transform, and I’m still not totally convinced that the correct solution _for a generalized case_ of code switching, extensive borrowing, or other contact-language scenarios is to capture everything in a single tag.
From: Ietf-languages [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Mark Davis
Sent: Friday, January 6, 2017 9:08 AM
To: Michael Everson <everson at evertype.com>
Cc: ietflang IETF Languages Discussion <ietf-languages at iana.org>
Subject: Re: Spanglish
On Fri, Jan 6, 2017 at 1:54 PM, Michael Everson <everson at evertype.com> wrote:
I think there is no requirement for an extensible mechanism to “admix” any two random languages.
The value of a generative mechanism is that one doesn't have to anticipate ahead of time what users find useful. Take BCP47 itself. If it turns out that someone finds en-JP useful, then it can be represented without waiting for a registration process, which may not succeed just because a registrar finds "en-JP" not meaningful or useful.
The key for such a mechanism to work is that the semantics can be derived from the components (plus syntax).
Moreover, while es-spanglis is user-friendly (for the bibliographers and librarians who are likely the ONLY users of such tags), this es-t-h0-en is just a load of mysterious letters.
BCP47 is not intended for end users; readability is nice where possible, but not a requirement. The syntax (subtags limited to 8 ASCII-only alphanumerics) puts restrictions on readability, but BCP47 is meant for internal codes. Any good interface will supply a human-readable name: end users ought to see a human-readable name in their language. Pick a random point in the language subtag registry to get something like 'cmm'. What percentage of bibliographers know right away that that means "Michigamea"?
And they are NOT the ONLY users of such TAGS (as long as you are shouting), which you apparently didn't read in my Jan 5 email. We are not focused on the tagging of content side as much as the selection of content.
1) contact languages are NOT “transformations” so your -t- makes no sense
There is already discussion of this in the email.
2) what the heck is h-zero supposed to mean? Oh. hybrid zero. And there’s a hybrid one. And what’s that?
Nobody will know.
People will know who read the documentation.
> es-t-h0-en Spanglish Spanish with an admixture of English
> en-t-h0-es Spanglish English with an admixture of Spanish
> Note: the boundary between these two will be rather fuzzy, like most cases in language identification. We'd recommend that es-t-h0-en be used unless English clearly predominates.
No, we wouldn’t. We’d recommend that the dominant language, whether en or es, be used, whichever it may be.
The "unless predominates" is meant to signify the ~50% case. Unless you have a clause like that, content that is about 50:50 doesn't have a clear choice.
> One could then also have
> es-t-hi-h0-en Spanglish translated from Hindi
> A second key 'h1' is defined indicating that the source language for the transform is a hybrid, much as we have done with the transliteration s0 and d0 keys. The value of h1 is a language tag indicating that the source language for -t- is a hybrid with that language, allowing formulations like
> es-t-hi-h1-en Spanish translated from Hinglish
> es-t-hi-h0-en-h1-en Spanglish translated from Hinglish
You’ve got to be kidding.
This is clever, Mark, but it doesn’t address any actual user requirements, and the notation you propose is absurdly opaque.
Again, see opaqueness above.
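To make the "semantics can be derived from the components" point concrete, here is a rough sketch of how software might split the tags proposed above into their parts. It is only an illustration of the formulations quoted in this thread, not a conforming parser for any published specification. The function name parse_hybrid_tag and the rule "a key is one letter followed by one digit" are assumptions made for the sketch.

```python
import re

# Assumption for this sketch: a field key is one letter plus one digit,
# as in the "h0" and "h1" keys proposed in this thread.
KEY = re.compile(r"[a-z][0-9]")


def parse_hybrid_tag(tag):
    """Split a tag such as 'es-t-hi-h0-en-h1-en' into its components."""
    subtags = tag.lower().split("-")
    result = {"language": None, "source": None, "fields": {}}

    # No "t" singleton: the whole tag is an ordinary language tag.
    if "t" not in subtags:
        result["language"] = "-".join(subtags)
        return result

    t_index = subtags.index("t")
    result["language"] = "-".join(subtags[:t_index])

    source = []   # subtags naming the source language of the transform
    key = None    # the field key currently being filled, if any
    for subtag in subtags[t_index + 1:]:
        if KEY.fullmatch(subtag):
            key = subtag
            result["fields"][key] = []
        elif key is None:
            source.append(subtag)
        else:
            result["fields"][key].append(subtag)

    result["source"] = "-".join(source) or None
    result["fields"] = {k: "-".join(v) for k, v in result["fields"].items()}
    return result


print(parse_hybrid_tag("es-t-h0-en"))
print(parse_hybrid_tag("es-t-hi-h1-en"))
print(parse_hybrid_tag("es-t-hi-h0-en-h1-en"))
```

Under the reading proposed in this thread, an h0 field describes an admixture in the target language and an h1 field describes an admixture in the source language, so the three calls correspond to "Spanglish", "Spanish translated from Hinglish", and "Spanglish translated from Hinglish".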
> Hybrid locales
Locales? There’s no Spanglish locale envisioned.
By you. You don't happen to be the only user of BCP47.
> have intermixed content from 2 (or more) languages, often with one language's grammatical structure applied to words in another. See also https://en.oxforddictionaries.com/definition/spanglish for the use of the term “hybrid”.
> More importantly, it doesn't work for a very common use case: locale selection. To communicate requests for localized content and internationalization services, locales are used, which are an extension of language tags. When people pick a language from a menu, internally they are picking a locale (en-GB, es-419, etc). If you want an application to support Spanglish or Hinglish, then you have to have a locale to represent that.
I don’t think anybody wants to do this.
We have had concrete requests from product groups within Google for hybrid locale identifiers, and not just one or two of them. This is not a whim.
Note that this does not prevent 'spanglis' from being registered. If the use of an extension is too opaque for you, go for it. It just doesn't meet our needs.
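As an illustration of the locale-selection use case, the sketch below applies truncation-style lookup in the manner of RFC 4647 to a hybrid tag. It is an assumption-laden example, not a description of how any particular product selects content: the function name lookup, the list of supported locales, and the default value are all hypothetical.

```python
def lookup(requested, supported, default="en"):
    """Return the best supported tag for a request by truncating from the end."""
    available = {tag.lower(): tag for tag in supported}
    subtags = requested.lower().split("-")

    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        # Drop the last subtag, then any single-character subtag
        # (such as the "t" singleton) that is left dangling at the end.
        subtags.pop()
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()

    return default


# An application with Spanglish content serves it directly.
print(lookup("es-t-h0-en", ["en", "es", "es-t-h0-en"]))

# An application without it falls back to plain Spanish.
print(lookup("es-t-h0-en", ["en", "es", "es-419"]))
```

The point of the sketch is that a request expressed as an extension on "es" degrades to "es" when no hybrid content exists, without any extra mapping table.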
> Luckily, this falls within the scope of the T extension.