RFC3066bis: looking ahead

Addison Phillips [wM] aphillips at webmethods.com
Tue Jan 20 20:23:43 CET 2004


Hi Peter,

Detailed interlinear responses below.

>
> > The language subtags would be confused with alpha-2 or alpha-3 country
> > codes.
>
> See my reply to Mark on this point.

I didn't (and don't) understand how your response fixed this issue: two
kinds of alpha-3 code can appear in the second position, which is ambiguous
given that certainly SOME of these codes will be the same three letters.
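
To make the clash concrete, here is a minimal Python sketch. The two tables are tiny illustrative subsets (language codes taken from ISO 639-2, region codes from ISO 3166-1 alpha-3), and the helper name ambiguous_codes is hypothetical. It only shows that, compared case-insensitively, some three-letter strings are valid in both lists, so position alone cannot tell a parser which one was meant.

```python
# Illustrative subsets only; the real code lists are far larger.
ALPHA3_LANGUAGES = {
    "che": "Chechen",
    "ind": "Indonesian",
    "bel": "Belarusian",
    "sit": "Sino-Tibetan (other)",
}
ALPHA3_REGIONS = {
    "CHE": "Switzerland",
    "IND": "India",
    "BEL": "Belgium",
    "THA": "Thailand",
}


def ambiguous_codes(languages, regions):
    """Return the codes that appear in both tables, ignoring case."""
    language_keys = {code.lower() for code in languages}
    region_keys = {code.lower() for code in regions}
    return sorted(language_keys & region_keys)


for code in ambiguous_codes(ALPHA3_LANGUAGES, ALPHA3_REGIONS):
    print(code, "=", ALPHA3_LANGUAGES[code], "or", ALPHA3_REGIONS[code.upper()])
```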
>
>
> > I must admit that I'm a bit dubious about ISO639-3. You could view
> > RFC1766/3066 as a reaction to the inadequacies of ISO639-1/-2 in
> > identifying languages. Will ISO639-3 really solve these problems?
>
> There are various ways in which user needs go beyond ISO 639-1/-2. ISO
> 639-3 will solve some, but not others. Currently, there are user needs
> to tag XML elements or HTML content (etc.) in hundreds and thousands of
> languages not encompassed by ISO 639-1/-2.

That's great. I recognize the importance of cleaning up ISO639.

> (You might ask why, if that is the case, people don't register tags for
> these languages using the available process; the main reason is the
> scale of the issue.) ISO 639-3 will solve that problem. Thus, I think
> innovations in RFC 3066 like incorporation of ISO 15924 (so script-based
> distinctions are defined without requiring registration) are needed,
> and I also see eventual incorporation of ISO 639-3 as a further step
> that is needed and that will be compatible with changes we're wanting
> to make now.
>
I agree that we should be forward-looking. As I understand it, though,
ISO639-3 would not displace ISO639-2 or -1, so we need to have ways to
express all three kinds of codes, right?

>
>
> > If ISO639-3 identifies more variations within languages, then it would
> > be, in my view, a candidate for inclusion in the generative standard in
> > the slot Mark and I call "variant".
>
> No. It is not just a matter of defining variants for the things that
> exist now. It's a matter of defining a number of new things that weren't
> defined before.

I'm referring to a slot in the subtag pattern, not the function of the
subtag itself. The 'variant' subtag fits the pattern:

  lang-sublang

My illustration was to make clear that the order is deterministic, though.

> Sure, one might argue that Bisu is a variant of
> "Sino-Tibetan (other)", but I think it would be decidedly poor practice
> to establish tags like "sit-mbisu", "sit-TH-mbisu", "sit-Thai-TH-mbisu"
> to be contrasted with "sit-akhaa", "sit-lahushi", "sit-kaduo" plus a few
> hundred others (along with all of the region- or script-based
> derivatives). It would be singularly unhelpful to users, and you'd
> better hope that you don't ever need to deal with something like
> orthographic variations as it would screw up your syntax if you had to
> resort to something like "sit-MY-Latn-jingpho-2008".

That's "sit-Latn-MY-jingpho-y2008"... the order is constant :-)

It would be poor practice to use such heavyweight tags, but not illegal, and
the fact that each subtag has intrinsic meaning would allow for matching
where it makes sense.
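
As a rough Python sketch of why a constant order matters: the function below is hypothetical and assumes slots can be recognized by shape alone (a four-letter subtag is a script, a two-letter subtag is a region, everything after that is a variant). That shape rule is an assumption made for illustration, not text from any draft.

```python
def split_tag(tag):
    """Split a tag into language, script, region, and variant slots.

    Assumes the fixed order language-script-region-variant, with the
    script and region slots optional and recognized only by length.
    """
    parts = tag.split("-")
    result = {"language": parts[0], "script": None, "region": None}
    rest = parts[1:]
    # Optional script slot: exactly four letters, directly after the language.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0)
    # Optional region slot: exactly two letters, after the script if present.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["region"] = rest.pop(0)
    # Whatever remains is treated as variant subtags.
    result["variants"] = rest
    return result


print(split_tag("sit-Latn-MY-jingpho-y2008"))
print(split_tag("sit-MY-Latn-jingpho-2008"))
```

With the subtags in the constant order, every slot is filled as intended. With the region placed before the script, this sketch no longer sees 'Latn' as a script and files it under the variants, which is the kind of misreading a fixed order avoids.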

If what you're saying is that the ISO639-3 tag doesn't need the ISO639-1/-2
introduction, then why not just make it the first subtag
('jingpho-Latn-MY-y2008')? I just don't see how 'sit-jingpho-Latn-MY-y2008' is
necessarily better than 'jingpho-Latn-MY-y2008', since you don't seem to feel
that the 'sit' subtag conveys any meaning. If ISO639-3 tags identify
languages, then we should fit them into RFC3066bis as language subtags.
>
>
> > Perhaps we could reserve a marker for such
> > inclusion, such as the formerly reserved subtag 'i-'.
> >
> > Your example would then be:
> >
> > zh-yue (grandfathered)
> > zh-i-yue (ISO639-3 flavored)
> > zh-Hant-CN-i-yue (ditto, with other slots filled in)
>
> I'm not sure I see how any of these alternates with "i-" are an
> improvement over what I suggested.

The 'i-' subtag makes clear that what comes next is an ISO639-3 code, not to
be confused with an ISO3166 (or some other) code.

> I would firmly reject the third given
> our decision that language information should come first.

The whole tag is 'language information'. The primary language is the first
subtag, followed by other distinguishing information. Dialects and orthographic
variations are distinguishing information, by that definition. If what
you're saying is "dialect trumps script", then we need a structure for that.
But...

I think what you're saying is that the ISO639-3 codes are really fully
formed language codes on their own and thus "should go first" and that at
least a subset of these codes have the obvious but inconvenient property of
identifying closely related languages (which gets us into the complex swamp
of deciding the difference between a language, a dialect, and so forth) and
thus merit greater structure in their corresponding RFC3066bis tags.

If so, we should, IMO, try to follow the design goals Mark and I had, which
include unambiguous parsing. The "i-" prefix could be required to come
second (or first, when no ISO639-1/-2 subtag precedes it):

zh-i-yue (legal)
zh-i-yue-Hant-CN (legal)
i-yue-Hant-CN (legal)
zh-Hant-CN-i-yue (WRONG)
zh-Hant-i-yue-CN (WRONG)
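
The placement rule illustrated by these five examples could be checked mechanically. A minimal Python sketch follows; the function name i_marker_position_ok is hypothetical, and the rule it encodes (the 'i' marker may only be the first or second subtag and must be followed by another subtag) is my reading of the examples, not settled syntax.

```python
def i_marker_position_ok(tag):
    """Check the placement of the 'i' marker in a tag.

    Tags without the marker pass trivially. Otherwise the marker must
    occur exactly once, as the first or second subtag, and must be
    followed by at least one more subtag (the code it introduces).
    """
    subtags = tag.lower().split("-")
    if "i" not in subtags:
        return True
    if subtags.count("i") > 1:
        return False
    position = subtags.index("i")
    return position <= 1 and position + 1 < len(subtags)


for example in ["zh-i-yue", "zh-i-yue-Hant-CN", "i-yue-Hant-CN",
                "zh-Hant-CN-i-yue", "zh-Hant-i-yue-CN"]:
    verdict = "legal" if i_marker_position_ok(example) else "WRONG"
    print(example, "(" + verdict + ")")
```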

>
>
> > Alternatively, if ISO639-3 provides tags that actually identify
> > languages as an amalgam of ISO639-1/-2, then why not:
> >
> > yue
> > yue-Hant-CN
>
> We could do that, but I was trying to consider the possibility that
> there is existing content in something like Yue that is already tagged
> using "zh".

It'll still be tagged 'zh' the day after 'zh-i-yue' (or whatever) is
allowed. Converting the language tags means converting the language tags,
regardless of how they are formed.
>
>
>
> > These thoughts of mine really are stream of consciousness. Part of my
> > feeling here is that ISO639-3 should do its work and offer a genuine
> > improvement over using the existing ISO639-x standards before RFC3066
> > gets revised to include that work.
>
> Well, obviously we can't revise RFC 3066 to incorporate ISO 639-3 until
> the latter is published, which is probably about a year away.

As you say, we can make allowances now, but Mark and I had specific design
goals, one of which was unambiguous parsing.
>
>
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.


