Review period; Nepali and Oriya
Doug Ewell
doug at ewellic.org
Sat Aug 4 22:05:10 CEST 2012
Gordon P. Hemsley wrote:
> I am of the late-night-and-not-well-thought-out opinion that further
> use of extlangs should be discouraged.
Though it may seem tempting, the discussion of whether the Registry should
treat Nepali and Oriya the same as Arabic and Chinese really isn't the
right place for a referendum on whether extlangs are good or bad, or
should or should not be discouraged. (This was one of the reasons the
Latvian discussion dragged on for months.)
The rationale in RFC 5645 for choosing certain languages, designated by
ISO 639-3 as macrolanguages, to host extlangs in BCP 47 was as follows:
"These macrolanguage subtags [initially Arabic, Chinese, Konkani, Malay,
Swahili, and Uzbek] were already present in the Language Subtag Registry
and were chosen because they were determined by the LTRU Working Group
to have been used to represent a single dominant language as well as the
macrolanguage as a whole."
There was no statement as to whether existing language subtags,
converted from individual to macrolanguage status by ISO 639-3, would
also fall into this category and be added to this list. It was perhaps
thought this would not be a frequent occurrence.
The questions here are (1) whether 'ne' has been used in BCP 47 contexts
to represent not only Nepali proper, but also Dotyali, and (2) whether
'or' has been used in BCP 47 contexts to represent not only Oriya
proper, but also Sambalpuri. The key factor is whether content in
Dotyali and Sambalpuri has been tagged as if it were Nepali and Oriya,
respectively, in the same way that various "Chinese" languages have been
tagged as 'zh'. It's a judgment call, and again, I'm not making any
recommendation one way or the other. But the goal is to apply the rules
equally to all languages that are in the same situation.
The decision to adopt an extlang mechanism into BCP 47 was heavily
debated, and the LTRU group literally took years to arrive at a
consensus. Deprecating this mechanism should be discussed as a separate
topic, not piggybacked onto a different topic. It should involve
rechartering the LTRU Working Group, and should result in a new RFC.
> AIUI, they are redundant registrations that are automatically
> deprecated (in some sense, if perhaps not in name) upon registration.
> The only purpose they seem to serve is to allow macrolanguage and
> microlanguage information to both be explicit in a single tag, and I'm
> not sure how useful that is.
Extlangs (and macrolanguages) exist because there is precedent and
current practice, not only in data or coding systems but also in
people's minds, to identify content as (say) "Chinese" even though it
may be Mandarin or Cantonese or Wu or Hakka or Min Nan or whatever. Some
processes need to see "Chinese" and others need to see "Mandarin."
Extlangs allow both to exist in the same tag.
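A rough Python sketch of the "both in one tag" idea follows. It assumes a tiny hand-copied excerpt of the Registry's extlang records (the table is illustrative, not the full Registry) and does no real BCP 47 validation. The two conversions are modeled loosely on the canonical form and "extlang form" described in RFC 5646; the function names are hypothetical.

```python
# Illustrative excerpt of extlang records from the IANA Language Subtag
# Registry: extlang subtag -> its required Prefix (the macrolanguage).
# Not the full Registry; a real implementation would load the Registry file.
EXTLANG_PREFIX = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Yue (Cantonese)
    "wuu": "zh",  # Wu
    "hak": "zh",  # Hakka
    "nan": "zh",  # Min Nan
    "arb": "ar",  # Standard Arabic
}


def to_canonical(tag: str) -> str:
    """Drop the macrolanguage prefix: 'zh-cmn-Hans' -> 'cmn-Hans'.

    Subtags are compared case-insensitively; original casing is kept.
    """
    subtags = tag.split("-")
    if (len(subtags) >= 2
            and EXTLANG_PREFIX.get(subtags[1].lower()) == subtags[0].lower()):
        subtags = subtags[1:]
    return "-".join(subtags)


def to_extlang_form(tag: str) -> str:
    """Add the macrolanguage prefix: 'cmn-Hans' -> 'zh-cmn-Hans'."""
    subtags = to_canonical(tag).split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0].lower())
    if prefix is not None:
        subtags.insert(0, prefix)
    return "-".join(subtags)


def macrolanguage_view(tag: str) -> str:
    """What a process that only cares about "Chinese" looks at."""
    return to_extlang_form(tag).split("-")[0]


def individual_language_view(tag: str) -> str:
    """What a process that needs "Mandarin" specifically looks at."""
    return to_canonical(tag).split("-")[0]


if __name__ == "__main__":
    tag = "zh-cmn-Hans"
    print(macrolanguage_view(tag), individual_language_view(tag))
```

Note that a plain "zh-Hant" tag still yields "zh" from both views: without the extlang, the tag simply doesn't say which Chinese language the content is in.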
> Is there any data available for current usecases of extlangs which
> don't involve legacy implementations? (I'm assuming that that was the
> primary motivation for including them in the spec. Correct me if I'm
> wrong.)
I doubt there is much useful data on the use of BCP 47 tags in the wild
at all. (You sometimes see comments that language tagging is so
haphazard that heuristic analysis yields better results, a bit of a slap
to those of us who have worked on language tagging for nearly a decade.)
But to answer your question, no, extlangs are not merely for legacy
implementations. People continue, and will continue, to regard (and tag)
Mandarin content sometimes as "Chinese" and sometimes as "Mandarin."
Indeed, "legacy" implementations (from the RFC 1766 or 3066 days) won't
be able to parse extlangs anyway.
--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell