RE: Points 3, 4 and 2 [RE: About: Tags for Identifying Languages (draft-phillips-langtags-01)]

Tue Mar 9 13:45:39 CET 2004

Hello Addison - thanks for your reply. I still think that there are some
holes or workarounds that will leave the 3066bis a bit messier than the
current 3066.

> See my interlinear comments below.
>
> Addison ...

See also my own original comments with >> at the beginning.

>> 3. In my view, it would also do well to allow inclusion of the
>> widely used LOCODEs to specify locations...
>> This would allow much easier specification of place in specifying
>> language variants, e.g. for the Martha's Vineyard version of sign
>> language, to incorporate the LOCODE string
>> usmvy
>> within the language tag.
>
> We didn't consider LOCODEs in the design of draft-01. I haven't looked
> that closely at them.

Probably you should. They are widely used in many ICT applications.

> The M49 materials cover the immediate needs that Mark and
> I were dealing with.

But they don't help with more specific descriptions of language, such as
the example above, and indeed others that have been discussed on this list
over previous months/years.
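
To make the "usmvy" example concrete, here is a rough Python sketch of how
a LOCODE string splits into its country part and its location part. It
assumes the usual LOCODE shape (a 2-letter country code followed by a
3-character location code); the function name split_locode is made up for
illustration, and nothing here says how such a string would actually be
carried inside a language tag.

```python
import re

# Assumed LOCODE shape: 2 letters (country) + 3 characters (location),
# where the location part is letters or the digits 2-9.
_LOCODE_RE = re.compile(r"([A-Za-z]{2})([A-Za-z2-9]{3})")


def split_locode(code):
    """Return (country, location) in upper case, or None if the shape is wrong.

    Hypothetical helper for illustration only; it checks the shape of the
    string, not whether the code is actually assigned.
    """
    match = _LOCODE_RE.fullmatch(code.strip())
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).upper()


if __name__ == "__main__":
    print(split_locode("usmvy"))
```

The country part is where the ISO 3166 dependency comes in, which is the
point Addison raises next.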

> I don't personally care for LOCODE, which is a bit too specific for my
> tastes. It also incorporates all the problems that ISO3166 has (WRT
> stability and ambiguity) since it uses ISO3166 as a basis.

The rather stupid reassignment by ISO 3166/MA doesn't invalidate the rest
of ISO 3166, which earlier RFCs use normatively.

LOCODE has its own problems caused by the YU/CS cockup. But one option
that it won't have is that of using 3-digit codes.

In addition, some of the software you refer to yourself below also won't
have the option of using 3-digit codes.

This area needs a lot more thought and discussion.

>> 4. In my view, it would also do well to refer to ISO 639-3 codes,
>> once that gets passed...
>> Will that be covered when, as you put it, the
>> draft RFC will "advance to the next stage of standardization (or be
>> revised so that it can so progress)"?

> Not unless ISO639-3 advances before the draft does. It isn't realistic
> to incorporate ISO639-3 normatively before it exists...

Agreed entirely, but you should keep a watching brief on ISO 639-3 in my
view. I would expect that from its current stage, ISO 639-3 might advance
quite quickly (though Peter Constable's in a better position to comment on
that than me).

> All we can do now is provide support for it (which is what the
> whole -s-extlang stuff is about). When ISO639-3 is done or nearing
> completion, RFC3066:bis can itself be revised to incorporate
> ISO639-3 normatively.

In that case, why not work in tandem with ISO 639-3 development? Otherwise
there is a danger of even more inconsistencies being added. There's no
point in revising it twice.

> I certainly hope we're not still working on various 3066:bis
> drafts in a year when that takes place!

Why not, if it's possible that working in tandem may solve many of the
problems? Planned delays can avoid incompatibilities later. On a much
larger scale, see the way ISO/IEC 10646 and Unicode came together - they
weren't always aligned, and there could have been two conflicting codes if
heads hadn't been knocked together.

There's a similar danger with some of the suggestions in your langtags
document, though with fewer exceptions to consider.

> The newest draft includes UN M49 codes...
>>
>> That's true, though this is a workaround for what I raised in my point
>> 2:
>
> I don't see why M49's utility is reduced by that.

I do: special cases typically lead, later on, to a need for new rules or
new registers to work around them.

>> 2. It covers the CS/CS problem well in dealing with ISO 3166 codes
>> (though naturally it would be better if the ISO 3166/MA didn't do
>> such stupid things - has anybody heard of top-level actions
>> regarding the allocation of the CS code in ISO 3166?)

Has anybody heard of any action on that? I've heard of no comments from
anyone on that.

>> There needs to be some rationale for why the code YU would not do, or
>> whether YU would do in certain instances. In the case of both YU and CS,
>> there are smaller entities that exist instead of the larger entities
>> which those originally represented (as is also the case for SU).
>
> You can use YU if you want to. It might not be a good idea to do so (you
> may be offending someone), but it is permitted by rfc3066:bis.
> 'YU' is a code for a (defunct) country.

But your text uses the specific example:

             cs-CS (Czech for Czechoslovakia)

'CS' is a code for a (defunct) country.

I don't see why you make a difference between CS and YU.
Why is CS valid, and YU not?

> Numeric subtags from M49 are an option only in the case where ISO3166
> assigns a country an alpha-2 code that was previously assigned to another
> country. M49 is advertised as being stable and consistent, therefore Mark
> and I (with support from others on this list) incorporated it as a way of
> tagging content for countries that have the misfortune to get assigned a
> secondhand ID.

And what happens if, for example, Serbia and Montenegro split into two
sovereign states at some point in the future? That's not altogether
unlikely.

Using digits doesn't solve that problem, or other potential future
problems; it merely disguises them for the present.

>> And how would people know when to use digit codes rather than
>> 2-letter codes?
>
> It's very clear in the draft: there will be an informative registration in
> the IANA registry.

So we have an IANA registry which will have to keep in step with the ISO
registries? Yet another problem area of versioning.

> In addition, the class of [a] super- and [b] sub-national codes
> from M49 can be used at any time.

M49 _doesn't_ use super- and sub-national codes. It has
[a] "macro geographical (continental) regions, geographical sub-regions,
and selected economic and other groupings" and
[b] country codes, for the countries.

In addition, there are a _few_ geographical entities (mainly island
groupings, associated with specific large countries) which have M49 codes.
These are treated differently in the ISO 3166 model - they would have an
entry in ISO 3166-2 rather than in ISO 3166, which seems slightly to mix
apples and pears. And some of these also have TLDs.

The devil (as always) is in the detail.

> Otherwise you MUST use the ISO3166 alpha2.
> This is comparable to requiring the use of the ISO639-1 alpha2 codes
> instead of the ISO639-2 alpha3 codes where they exist.

Superficially, but ISO 639-1 and ISO 639-2 have a Joint Advisory Committee
to keep them in tandem. M49 and the ISO 3166/MA have no such mechanism
written into the standards - it's just happenstance that they _mainly_ are
developed in tandem.

Both have some exceptions that always cause problems.

>> And is there any software etc which specifies using 2-letter codes,
>> which would invalidate use of 3-digit codes?

> There is plenty of software that assumes that region codes all take the
> alpha2 form. This software will not be able to store the 3-digit code. But
> then, these applications won't be able to deal with reassignment of codes
> very well either (the reason for moving to M49).

So that could be a significant body of software which would not be able to
deal with part of the proposed new RFC3066bis?

Rather a problem, it seems to me. A simpler solution involving 2-letter
codes would be better.
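
A rough Python sketch of the problem, assuming only the two shapes
discussed in this thread (a 2-letter ISO 3166 code or a 3-digit M49 code).
The names region_kind and legacy_accepts are made up for illustration;
legacy_accepts stands in for existing software that assumes every region
is two letters, and "891" is used purely as a sample 3-digit value.

```python
import re

# The two region shapes discussed in this thread.
_ALPHA2 = re.compile(r"[A-Za-z]{2}")
_DIGIT3 = re.compile(r"[0-9]{3}")


def region_kind(subtag):
    """Classify a region subtag by shape only (no registry lookup)."""
    if _ALPHA2.fullmatch(subtag):
        return "alpha2"
    if _DIGIT3.fullmatch(subtag):
        return "digit3"
    return "other"


def legacy_accepts(subtag):
    """Hypothetical stand-in for software that only stores 2-letter regions."""
    return _ALPHA2.fullmatch(subtag) is not None


if __name__ == "__main__":
    for region in ("CS", "YU", "891"):
        print(region, region_kind(region), legacy_accepts(region))
```

Any tag whose region falls in the "digit3" class simply cannot be stored
by software written along the lines of legacy_accepts.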

> Really, RFC3066:bis makes all of this quite a bit easier to deal with.

But it adds further problems too.

> In the past it was possible for there to be registrations (with lengths
> != 2) with regional meanings. And you can have (as with the sign language
> codes) two or more subtags with some kind of regional meaning.

Sorry - you lost me there. What do you mean? I hadn't spotted those
problems, but I'm happy to be enlightened with an example or two.

> RFC3066:bis does
> away with that. There are ISO3166 alpha2 codes and, in isolated cases, UN
> M49 codes. Registrations can be made that have "regional" meanings, but
> these will be limited to the "variant" slot in the tag.

Again a simpler solution would be better. Why not normatively specify ISO
3166 at a certain date, before the YU/CS cockup by the ISO 3166/MA?

That would remain simple, and avoid all the problems of exceptions, and of
exceptions to exceptions, described above.
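
As a minimal sketch of what a date-frozen list would mean in software:
the table FROZEN_REGIONS and the function region_meaning below are
hypothetical, and the table holds only a few sample entries rather than a
real snapshot. The assumption is a freeze date before CS was reassigned,
so each 2-letter code has at most one meaning and no numeric fallback is
needed.

```python
# Illustrative subset only; a real snapshot would list every code that was
# assigned on the chosen freeze date (assumed here to be before the
# reassignment of CS).
FROZEN_REGIONS = {
    "YU": "Yugoslavia",
    "CZ": "Czech Republic",
    "SK": "Slovakia",
}


def region_meaning(code, snapshot=FROZEN_REGIONS):
    """Look a 2-letter region code up in the frozen snapshot.

    Returns the single meaning the code had on the freeze date, or None if
    the code was not assigned then.
    """
    return snapshot.get(code.upper())


if __name__ == "__main__":
    print(region_meaning("yu"))
    print(region_meaning("CS"))
```

With a snapshot like this, a code that was unassigned on the freeze date
is simply rejected instead of silently changing meaning.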

John

--
John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate,
HG2 7PG
Tel: +44 1423 888 432 (landline)
Tel: +44 7766 711 395 (mobile)
Email: scripts20 at uk2.net