Review period; Nepali and Oriya

Sat Aug 4 22:37:09 CEST 2012

On Sat, Aug 4, 2012 at 4:05 PM, Doug Ewell <doug at ewellic.org> wrote:
> Gordon P. Hemsley wrote:
>
>> I am of the late-night-and-not-well-thought-out opinion that further
>> use of extlangs should be discouraged.
>
>
> Though it may seem tempting, the question of whether the Registry should
> treat Nepali and Oriya the same as Arabic and Chinese really isn't the right
> place for a referendum on whether extlangs are good or bad, or should or
> should not be discouraged. (This was one of the reasons the Latvian
> discussion dragged on for months.)

Fair enough.

> The rationale in RFC 5645 for choosing certain languages, designated by ISO
> 639-3 as macrolanguages, to host extlangs in BCP 47 was as follows:
>
> "These macrolanguage subtags [initially Arabic, Chinese, Konkani, Malay,
> Swahili, and Uzbek] were already present in the Language Subtag Registry and
> were chosen because they were determined by the LTRU Working Group to have
> been used to represent a single dominant language as well as the
> macrolanguage as a whole."
>
> There was no statement as to whether existing language subtags, converted
> from individual to macrolanguage status by ISO 639-3, would also fall into
> this category and be added to this list. It was perhaps thought this would
> not be a frequent occurrence.
>
> The questions here are (1) whether 'ne' has been used in BCP 47 contexts to
> represent not only Nepali proper, but also Dotyali, and (2) whether 'or' has
> been used in BCP 47 contexts to represent not only Oriya proper, but also
> Sambalpuri. The key factor is whether content in Dotyali and Sambalpuri has
> been tagged as if it were Nepali and Oriya, respectively, in the same way
> that various "Chinese" languages have been tagged as 'zh'. It's a judgment
> call, and again, I'm not making any recommendation one way or the other. But
> the goal is to apply the rules equally to all languages that are in the same
> situation.

(FWIW, the section specifying that extlangs can only be specified at
initial registration is 3.4, not 3.3.)

Given that CLDR tends to map ISO 639 macrolanguage codes onto a single
particular microlanguage, maybe we should synchronize with them? I
don't know how feasible (or logical) it would be, given that they
already have a lot more mappings than there are extlangs in the
registry, but it might be something to consider.

Of course, even such a decision might be better left for later, given
what else you say below.

> The decision to adopt an extlang mechanism into BCP 47 was heavily debated,
> and the LTRU group literally took years to arrive at a consensus.
> Deprecating this mechanism should be discussed as a separate topic, not
> piggybacked onto a different topic. It should involve rechartering the LTRU
> Working Group, and should result in a new RFC.

Understood.

>> AIUI, they are redundant registrations that are automatically
>> deprecated (in some sense, if perhaps not in name) upon registration.
>> The only purpose they seem to serve is to allow macrolanguage and
>> microlanguage information to both be explicit in a single tag, and I'm
>> not sure how useful that is.
>
> Extlangs (and macrolanguages) exist because there is precedent and current
> practice, not only in data or coding systems but also in people's minds, to
> identify content as (say) "Chinese" even though it may be Mandarin or
> Cantonese or Wu or Hakka or Min Nan or whatever. Some processes need to see
> "Chinese" and others need to see "Mandarin." Extlangs allow both to exist in
> the same tag.

I suppose that makes sense. I'd been viewing both extlangs and
macrolanguages as simply for backwards compatibility, but it does make
a lot of sense to also maintain codes for linguistic concepts that we
know are going to persist (like the ideas of "Chinese" and "Arabic").
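To make the "both views in one tag" point concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical excerpt of the registry's extlang records (three entries, each mapping an extlang subtag to its Prefix). A real implementation would load the full IANA Language Subtag Registry. A process that only understands "Chinese" can look at the first subtag alone, while an extlang-aware process also reads the second. The preferred_form() helper (a hypothetical name) drops the prefix, mirroring how an extlang's Preferred-Value turns 'zh-yue' into 'yue' under RFC 5646 canonicalization.

```python
# Hypothetical excerpt of extlang records: extlang subtag -> its Prefix.
# The real IANA Language Subtag Registry contains many more entries.
EXTLANG_PREFIX = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Yue Chinese (Cantonese)
    "arb": "ar",  # Standard Arabic
}


def split_views(tag: str) -> tuple[str, str]:
    """Return (macrolanguage view, individual-language view) for a tag.

    A process that only knows "Chinese" inspects the first subtag; an
    extlang-aware process also reads the extlang, if one is present.
    For tags without an extlang, both views are the first subtag.
    """
    subtags = [s.lower() for s in tag.split("-")]
    primary = subtags[0]
    if len(subtags) > 1 and EXTLANG_PREFIX.get(subtags[1]) == primary:
        return primary, subtags[1]
    return primary, primary


def preferred_form(tag: str) -> str:
    """Drop the macrolanguage prefix from an extlang tag (e.g. zh-yue -> yue)."""
    subtags = tag.split("-")
    if len(subtags) > 1 and EXTLANG_PREFIX.get(subtags[1].lower()) == subtags[0].lower():
        # Keep the remaining subtags with their original casing.
        return "-".join(subtags[1:])
    return tag
```

For example, split_views("zh-yue-HK") gives ("zh", "yue"), so a "Chinese"-level matcher and a "Cantonese"-level matcher both find what they need in the same tag, and preferred_form("zh-yue-HK") gives "yue-HK".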

In that case, it might be best to register them simply for
completeness's sake. Were there new macrolanguages that were
deliberately NOT registered as extlangs (past the original
registration)?

>> Is there any data available for current use cases of extlangs which
>> don't involve legacy implementations? (I'm assuming that that was the
>> primary motivation for including them in the spec. Correct me if I'm
>> wrong.)
>
> I doubt there is much useful data on the use of BCP 47 tags in the wild at
> all. (You sometimes see comments that language tagging is so haphazard that
> heuristic analysis yields better results, a bit of a slap to those of us who
> have worked on language tagging for nearly a decade.)

Indeed, 'tis a sad thing.

-- 
Gordon P. Hemsley
me at gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/