registry vs. extensions...

Mon Oct 20 17:15:33 CEST 2003

John opined:
>
> In addition, ISO will not register dialects at all, although as we
> know the line between "language" and "dialect" is frequently vague.
> (If a language is a dialect with an army and a navy, does that mean
> that Afrikaans is now a dialect of Xhosa?)

Not really my point. My point is that if we register "sl-rojaz" and then
ISO639 registers "rjz" to be Resian, making sl-rojaz deprecated, isn't that
messier than just asking that Han go and try ISO639 in the first place? If
they say "no" then the IANA registry is still an open option, with less risk
of a registered tag becoming deprecated (since we already have some
assurance that 639RA won't do something different).
>
> > So why not strive to get things right? If we had a duck test to
> > figure out whether ISO were inclined (or not) to register a
> > particular value we could save a lot of top-level registrations
> > that quickly become outmoded.
>
> Unfortunately, ISO's process is not transparent.  Furthermore, ISO 639
> codes are for language _names_, not languages; there is no authoritative
> source for what is meant by a particular language name.

I know, but transparency would be enhanced, IMO, by making ISO639 do the job
a few times, eh? A glance at ISO639 turns up lots of gray zone about
dialects and languages in the existing tags. They have much worse problems
than RFC3066 because they have no structure at all to fall back on--it's
either a "language" or it isn't in their scheme. But if they were to accept
or reject some codes, then it would become clear which codes needed external
(3066) registration. Doing it the other way around (3066 first, and then
seeing if 639 will create a code for the item) leads to deprecated codes,
which may by then have had MORE codes registered on top of them (cf.
sl-rojaz-xxxxx). Presumably we would then need to register a whole tranche
of new dialect codes appended to the new base-language code (and which
themselves might fall afoul of the ISO639 registry).

Gulp.
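
To make the knock-on effect concrete, here is a minimal Python sketch. It
assumes a toy in-memory registry; the function name and the registry
contents are hypothetical, and "sl-rojaz-xxxxx" is only the placeholder
used above, not a real registration.

```python
from typing import Iterable, List


def affected_by_deprecation(registry: Iterable[str], deprecated: str) -> List[str]:
    """List every registered tag that is the deprecated tag or is built on it.

    Hypothetical helper: a tag counts as "built on" the deprecated tag
    when it starts with that tag followed by a hyphen.
    """
    base = deprecated.lower()
    prefix = base + "-"
    return sorted(
        tag for tag in registry
        if tag.lower() == base or tag.lower().startswith(prefix)
    )


# Toy registry, for illustration only.
registry = {"sl-rojaz", "sl-rojaz-xxxxx", "de-DE"}

# If ISO 639 later assigned its own code for Resian, these would all
# need to be deprecated and re-registered under the new base code.
print(affected_by_deprecation(registry, "sl-rojaz"))
```

Every tag in that list would need a replacement registered under the new
base code, which is the cascade I am worried about.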

>
> > If I read it correctly, a search for "x-de-DE-mySubtag" does NOT find
> > "de-DE". "de-DE@mySubtag" would find such a match. It would be a way of
> > extending the generative mechanism for use with items that do not rise
> > to the level of general usage.
>
> Ah, I see.  You are right about x-de-DE-mySubtag, but wrong about
> de-DE@mysubtag, which is a) a syntax error, and b) matches only de,
> since @ is not a delimiter.

I was using @ to demonstrate my little idea... You could substitute x- for
it and get the same effect. I don't mean to suggest any particular format by
using @ in the examples. 'de-DE-x-mySubtag' is the same thing writ funny.
Either requires a new RFC.
>
> I think this is a genuine problem that could be fixed by allowing the
> x subtag at arbitrary points:  then de-DE-x-mySubtag would match
> de-DE in practice and would be acceptable in principle.

A syntax would need to be considered carefully. Other syntax might be more
appropriate to an extension mechanism (others have suggested alternatives to
me, such as URIs or properties). All I'm doing now is pointing out that this
set of registrations is potentially the narrow part of a very large wedge
indeed and some thought should be lavished on the topic.
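
For reference, the matching behavior discussed above can be sketched in a
few lines of Python. This is only a rough illustration of RFC 3066-style
prefix matching as I understand it (a range matches a tag if the two are
equal, or if the range is a prefix of the tag and the next character is a
hyphen). The function name is made up, and the sketch does not check that
a tag is syntactically valid or registered.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Return True if language_range matches tag by hyphen-bounded prefix.

    Comparison is case-insensitive. "*" is treated as matching any tag.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    # Exact match, or the range is a prefix that ends at a subtag boundary.
    return t == r or t.startswith(r + "-")


for tag in ("de-DE", "x-de-DE-mySubtag", "de-DE-x-mySubtag"):
    print(tag, range_matches("de-DE", tag))
```

Under this rule a search for "de-DE" does not find "x-de-DE-mySubtag",
because the private-use marker comes first, but it would find
"de-DE-x-mySubtag" if that form were permitted.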

Again, I'm not suggesting that Han's requests are ill-founded or that they
should be turned back. Only that the current setup could become chaotic if
more use of this sort is made of the registry.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International/ws

Internationalization is an architecture.
It is not a feature.