RFC3066bis: looking ahead

Tue Jan 20 19:45:43 CET 2004

> From: Addison Phillips [wM] [mailto:aphillips at webmethods.com]

> The language subtags would be confused with alpha-2 or alpha-3 country
> codes.

See my reply to Mark on this point.

> I must admit that I'm a bit dubious about ISO639-3. You could view
> RFC1766/3066 as a reaction to the inadequacies of ISO639-1/-2 in identifying
> languages. Will ISO639-3 really solve these problems?

There are various ways in which user needs go beyond ISO 639-1/-2. ISO
639-3 will solve some, but not others. Currently, users need to tag XML
elements, HTML content, etc. in hundreds or even thousands of
languages not encompassed by ISO 639-1/-2. (You might ask why, if that
is the case, people don't register tags for these languages using the
available process; the main reason is the scale of the issue.) ISO 639-3
will solve that problem. But there are also needs to distinguish
orthographies or other categories related to but different from
language; ISO 639-3 will not solve those problems. 

Thus, I think innovations to RFC 3066 such as incorporation of ISO 15924
(so script-based distinctions are defined without requiring
registration) are needed, and I also see eventual incorporation of ISO
639-3 as a further step that is needed and that will be compatible with
changes we're wanting to make now.

> If ISO639-3 identifies more variations within languages, then it would be,
> in my view, a candidate for inclusion in the generative standard in the slot
> Mark and I call "variant".

No. It is not just a matter of defining variants for the things that
exist now. It's a matter of defining a number of new things that weren't
defined before. Sure, one might argue that Bisu is a variant of
"Sino-Tibetan (other)", but I think it would be decidedly poor practice
to establish tags like "sit-mbisu", "sit-TH-mbisu", "sit-Thai-TH-mbisu"
to be contrasted with "sit-akhaa", "sit-lahushi", "sit-kaduo" plus a few
hundred others (along with all of the region- or script-based
derivatives). It would be singularly unhelpful to users, and you'd
better hope that you don't ever need to deal with something like
orthographic variations as it would screw up your syntax if you had to
resort to something like "sit-MY-Latn-jingpho-2008".

> Perhaps we could reserve a marker for such
> inclusion, such as the formerly reserved subtag 'i-'.
> 
> Your example would then be:
> 
> zh-yue (grandfathered)
> zh-i-yue (ISO639-3 flavored)
> zh-Hant-CN-i-yue (ditto, with other slots filled in)

I'm not sure I see how any of these alternates with "i-" are an
improvement over what I suggested. I would firmly reject the third given
our decision that language information should come first.

> Alternatively, if ISO639-3 provides tags that actually identify languages as
> an amalgam of ISO639-1/-2, then why not:
> 
> yue
> yue-Hant-CN

We could do that, but I was trying to consider the possibility that
there is existing content in something like Yue that is already tagged
using "zh".

> These thoughts of mine really are stream of consciousness. Part of my feeling
> here is that ISO639-3 should do its work and offer a genuine improvement
> over using the existing ISO639-x standards before RFC3066 gets revised to
> include that work.

Well, obviously we can't revise RFC 3066 to incorporate ISO 639-3 until
the latter is published, which is probably about a year away.

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division