Montenegrin

Thu Jun 10 21:01:10 CEST 2010

Comments below.

Mark

— Il meglio è l’inimico del bene (“The best is the enemy of the good”) —

On Thu, Jun 10, 2010 at 07:14, Peter Constable <petercon at microsoft.com> wrote:

>  Since Montenegro decided to refer to their official language using a
> distinct name from that used by their neighbors, the question has come up on
> a few occasions as to whether “Montenegrin” should be coded in ISO 639
> distinct from Serbian. This raises various questions in my mind regarding
> implications of such a change, and I’m curious to know if people on this
> list have comments.
>
> In raising this, I’d ask people not to rat-hole on how different Serbian
> and Montenegrin are linguistically: there’s enough evidence that they can
> appropriately be considered a single language in terms of linguistic criteria,
> and so that would have no bearing whatsoever on possible JAC action.
>

I take it that you are saying that further evidence of their unity is
unnecessary for *this* decision, because it has been sufficiently
established. (And not that linguistic criteria in general have no bearing on
JAC actions.)

> Some questions that come to my mind:
>
> - Given the established practice of coding “Bosnian”, “Croatian”
> and “Serbian” distinctly, how problematic would it be for users and
> implementers if “Montenegrin” was handled differently, simply being listed
> as one of the alternative names for sr / srp?
>

The separation of Bosnian, Croatian, and Serbian in the base language code,
instead of using regions or variants to distinguish them, already causes
problems, and we have had to develop mechanisms to deal with them. Adding
one more language to the mix would add some incremental cost, though
probably not a huge one.

The following comment is, however, extremely worrisome:

> But, in the case that we should agree that Montenegrin and Serbian are
> linguistically equivalent, then we would also be obliged to recognize that
> Bosniac and Serbian are linguistically equivalent. And so, we would be
> obliged to conclude that Bosniac and Montenegrin MUST be treated on an equal
> footing.

There are a few unfortunate cases in ISO 639 that break the principle that
separate codes are only supplied in cases of mutual unintelligibility.
Treating those few exceptions as a precedent, so that EVERY dialect or
orthographic variant must be allowed its own base language code, would be
a complete disaster. Separating "American" from "British" English, Brazilian
from European Portuguese, &c. in the base language code would cause
innumerable implementation and compatibility problems.

If the JAC were to (take leave of their senses and) go down that path,
then I think our only real recourse in BCP 47 would be to stop deriving
base language subtags automatically from ISO 639 codes, and only take in
the codes that made sense.

> - Will users really distinguish “Montenegrin” language from
> “Serbian” language when reading books, newspapers, etc.; when listening to
> radio, television, music, etc.; when buying dictionaries, hiring
> translators, etc.?
>

I'll ask some knowledgeable people about that.

> - Will librarians and other cataloguers really distinguish
> content in “Montenegrin” vs. “Serbian”?
>
> - How will content developers that deal with localization be
> impacted? E.g., as developers of software or large websites, video media
> publishers dealing with alternate-language audio tracks or closed-caption
> content?
>
> - In what ways would “sr-ME” versus “sr-RS” be less than fully
> adequate for users’ needs?
>

Here are a couple of the implementation issues, off the top of my head.

Option 1: Montenegrin continues to be represented by sr-ME.
Option 2: Montenegrin is represented by xxx (and because of past practice,
also by sr-ME).

Interoperability is harder for option 2, because in order to make processing
work right, you have to accept both coding systems from external sources and
deal with them consistently.
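A minimal Python sketch of what accepting both codings could look like, assuming the "normalize on input" route: incoming tags are rewritten so that xxx-* becomes sr-*-ME-* before any further processing. "xxx" is the placeholder from the text, not a real code, and normalize_montenegrin / same_language are hypothetical names. It also assumes simple language-script-region-variant tags, and that an explicit region other than ME is kept as-is; a real implementation would need a full BCP 47 parser and a policy decision for that case.

```python
import re

# Hypothetical: "xxx" stands for a separate Montenegrin code, as in the text.
MONTENEGRIN = "xxx"
SCRIPT = re.compile(r"^[A-Za-z]{4}$")            # e.g. Latn, Cyrl
REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")  # e.g. ME, RS, 419


def normalize_montenegrin(tag: str) -> str:
    """Map xxx-* onto the equivalent sr-*-ME-* form; leave other tags unchanged.

    Assumptions: only language-script-region-variant tags are handled,
    and an explicit region is kept rather than replaced by ME.
    """
    subtags = tag.split("-")
    if subtags[0].lower() != MONTENEGRIN:
        return tag  # not the Montenegrin code: pass through untouched
    rest = subtags[1:]
    out = ["sr"]
    if rest and SCRIPT.match(rest[0]):
        out.append(rest.pop(0))
    if rest and REGION.match(rest[0]):
        out.append(rest.pop(0))
    else:
        out.append("ME")  # no region given: the alias implies Montenegro
    return "-".join(out + rest)


def same_language(a: str, b: str) -> bool:
    """Compare two tags after normalization, ignoring case (tags are case-insensitive)."""
    return normalize_montenegrin(a).lower() == normalize_montenegrin(b).lower()
```

With this in place, everything downstream only ever sees the sr forms; the cost is that every entry point that receives tags from outside has to remember to call it.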

For translating the language name, some systems go strictly component by
component. For example, sr-Latn-ME would be displayed as "Serbian (Latin,
Montenegro)". For such simple systems, it would be easier to have xxx.

However, CLDR and other libraries allow a single translation to cover
several subtags of a BCP 47 tag together. That is, the translation for
sr-Latn-ME can be "Montenegrin (Latin)". [This use of context is needed for
other purposes as well; you also want "zh-Hans-SG" to be "Chinese (Simplified,
Singapore)", using the term "Simplified" instead of "Simplified Chinese" or
"Simplified Han".]
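To illustrate the difference between the two display approaches, here is a rough Python sketch. The name tables are hypothetical stand-ins (they are not CLDR's data or API): one contextual table lets a language+region pair get its own name, and another lets a script get a shorter name after a particular language. Only language[-script][-region] tags are handled, and display_name is an illustrative name.

```python
# Hypothetical name tables; real CLDR data is far larger and structured differently.
LANGUAGES = {"sr": "Serbian", "zh": "Chinese"}
SCRIPTS = {"Latn": "Latin", "Cyrl": "Cyrillic",
           "Hans": "Simplified Han", "Hant": "Traditional Han"}
REGIONS = {"ME": "Montenegro", "RS": "Serbia", "SG": "Singapore"}

# Context-dependent entries: a name for a language+region combination,
# and shorter script names used when the language name already says "Chinese".
LANGUAGE_REGION_NAMES = {("sr", "ME"): "Montenegrin"}
LANGUAGE_SCRIPT_NAMES = {("zh", "Hans"): "Simplified", ("zh", "Hant"): "Traditional"}


def display_name(tag: str, contextual: bool = True) -> str:
    """Build a display name; contextual=False translates strictly by component."""
    parts = tag.split("-")
    lang = parts[0].lower()
    script = next((p.title() for p in parts[1:] if len(p) == 4 and p.isalpha()), None)
    region = next((p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()), None)

    name = LANGUAGES.get(lang, lang)
    qualifiers = []
    if contextual and (lang, region) in LANGUAGE_REGION_NAMES:
        name = LANGUAGE_REGION_NAMES[(lang, region)]
        region = None  # the region is already expressed by the language name
    if script:
        label = LANGUAGE_SCRIPT_NAMES.get((lang, script)) if contextual else None
        qualifiers.append(label or SCRIPTS.get(script, script))
    if region:
        qualifiers.append(REGIONS.get(region, region))
    return f"{name} ({', '.join(qualifiers)})" if qualifiers else name
```

For example, display_name("sr-Latn-ME", contextual=False) gives "Serbian (Latin, Montenegro)", while display_name("sr-Latn-ME") gives "Montenegrin (Latin)" and display_name("zh-Hans-SG") gives "Chinese (Simplified, Singapore)". The same contextual mechanism covers both cases, which is why a separate code is not needed just for naming.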

Lookup and fallback become harder with a separate code. For compatibility,
you have to either normalize on input, or have an apparatus set up so that
xxx-* gets treated as if it were sr-*-ME-* for purposes of lookup. For the
latter, you also have to decide what the fallback order is, e.g.:

input:

xxx-Latn-XX-variant

try: // in some order, not sure what it really should be

xxx-Latn-XX-variant
xxx-Latn-XX
sr-Latn-ME-variant
sr-Latn-ME
xxx-Latn
xxx-XX
xxx-ME
sr-ME
sr-Latn
xxx
sr

There will inevitably be many implementations that don't realize the
connection between xxx and sr-ME. For those, lookup will probably just fail;
that is, requesting a website in xxx just gives, say, English, even if
Serbian is available.
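A rough Python sketch of the aliasing-during-lookup route. Truncation follows the RFC 4647 Lookup scheme of removing subtags from the end (together with any trailing singleton); interleaving each candidate with its sr equivalent is just one possible order, since the right order is an open question above. "xxx", the alias rule (xxx-* treated as sr-*-ME-*, with an explicit region kept), the function names, and "en" as the final default are all assumptions for illustration. Passing use_alias=False models an implementation that doesn't know about the connection.

```python
import re

SCRIPT = re.compile(r"^[A-Za-z]{4}$")
REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")


def normalize(tag: str) -> str:
    """Hypothetical alias rule: xxx-* is treated as sr-*-ME-* (explicit region kept)."""
    subtags = tag.split("-")
    if subtags[0].lower() != "xxx":
        return tag
    rest, out = subtags[1:], ["sr"]
    if rest and SCRIPT.match(rest[0]):
        out.append(rest.pop(0))
    out.append(rest.pop(0) if rest and REGION.match(rest[0]) else "ME")
    return "-".join(out + rest)


def truncations(tag: str):
    """RFC 4647 Lookup-style fallback: drop subtags from the end, plus any trailing singleton."""
    parts = tag.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()
        if parts and len(parts[-1]) == 1:
            parts.pop()


def candidates(tag: str, use_alias: bool = True) -> list:
    """One possible order: each truncation, then its sr equivalent, then the rest of the sr chain."""
    order = []
    for t in truncations(tag):
        for c in ([t, normalize(t)] if use_alias else [t]):
            if c not in order:
                order.append(c)
    if use_alias:
        for c in truncations(normalize(tag)):
            if c not in order:
                order.append(c)
    return order


def lookup(available, requested: str, default: str = "en", use_alias: bool = True) -> str:
    """Return the first available tag (case-insensitive) in fallback order, else the default."""
    by_lower = {a.lower(): a for a in available}
    for c in candidates(requested, use_alias):
        if c.lower() in by_lower:
            return by_lower[c.lower()]
    return default
```

For example, candidates("xxx-Latn") yields ['xxx-Latn', 'sr-Latn-ME', 'xxx', 'sr-ME', 'sr-Latn', 'sr']. lookup(["en", "sr"], "xxx") returns "sr", while lookup(["en", "sr"], "xxx", use_alias=False) falls through to "en", which is exactly the failure mode of implementations that don't know xxx and sr-ME are connected.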

>
> Peter
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>