comments on the draft...

Tue Jun 8 02:05:45 CEST 2004

Hi Peter,

Thanks for the note. I've added these items to the issues list.

Some comments below... I've elided unnecessary text. All of your comments have a point, and Mark and I will consider each of your suggestions before responding (and possibly creating a new draft). The one change not listed here (adding the hyphens in IANA-registered) has already been made in the editor's copy. The others require us to consider the wording of the text.

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no 
> [mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Peter Constable
> Sent: June 7, 2004 14:13
> To: ietf-languages at alvestrand.no
> Subject: Some comments on the draft (was RE: New draft-langtags 
> (akaRFC3066bis) published...)
> 
> 
> Not a huge issue, but I still have doubts about registration of subtags
> rather than registration of entire tags. The supposed benefit is to
> reduce the number of registrations necessary, but does it really save
> much? If I registered "1904", the registration would still have to
> document that it can only be used with "de" and its combinations. If
> someone later wants to use "1904" for different semantics, e.g., a
> completely unrelated spelling reform for Martian, then all of the
> details still need to be spelled out (no pun intended) in a revision to
> the registration. And note that the details as to which *tags* are
> permitted are buried in the registration rather than having them
> enumerated in the directory, where it's more useful. I could see the
> value if we were looking at having to register tags that included script
> IDs one-by-one. But when we start getting into variants, I don't think
> we're really saving anything.
> 
I believe that registered subtags may be "validly" used in any context. Under the current scheme we don't necessarily get all the potentially useful tags ("de-1904-NA", anyone?). 3066bis makes this problem a lot harder by adding new dimensions to the tags.

However, you appear to be saying that the generativity isn't your problem, but rather the "normativity" of the registration information. Perhaps we could have set up the subtags to be registered using a range pattern. Thus:

  en-*-boont   instead of -boont
  de-*-g1904  instead of -g1904

A completely general purpose subtag would have * as its introduction:

  *-somesubtag
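
To make the range-pattern idea concrete, here is a rough sketch of how software might test a tag against such a registration. Nothing like this is in the draft: the class and method names are made up, and I am assuming that "*" stands for any run of subtags (possibly none in the middle of a pattern, at least one at the start) and that matching is case-insensitive.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the draft or of any registry tooling.
final class SubtagRangePattern {
    // Assumption: a subtag is 1 to 8 ASCII letters or digits.
    private static final String SUBTAG = "[a-z0-9]{1,8}";

    // Turns a range pattern such as "en-*-boont" into a regular expression.
    static Pattern compile(String rangePattern) {
        String[] parts = rangePattern.toLowerCase(Locale.ROOT).split("-");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            boolean wildcard = parts[i].equals("*");
            if (i == 0) {
                // A leading "*" must cover at least the primary language subtag.
                regex.append(wildcard
                        ? SUBTAG + "(?:-" + SUBTAG + ")*"
                        : Pattern.quote(parts[i]));
            } else {
                // Elsewhere "*" covers zero or more whole subtags.
                regex.append(wildcard
                        ? "(?:-" + SUBTAG + ")*"
                        : "-" + Pattern.quote(parts[i]));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole tag fits the range pattern.
    static boolean matches(String rangePattern, String tag) {
        return compile(rangePattern).matcher(tag.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("en-*-boont", "en-US-boont"));
        System.out.println(matches("en-*-boont", "zh-Phst-AQ-boont"));
        System.out.println(matches("*-somesubtag", "fr-somesubtag"));
    }
}
```

Note that even this much requires the software to carry the pattern for every registered subtag, which is the implementation burden discussed next.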

One reason we didn't do this is that registered subtags are basically unimplemented by commercial software. You have to have the registry handy to process them, and it tends to be minority dialects of languages that get registered in the first place. We strove to make implementation of registered subtags as easy as possible, so that they are more likely to be supported in as broad an array of software as possible. So "zh-Phst-AQ-boont" is a valid tag, but unless Phaistos-chiseling, zeese-swilling Chinese speakers have taken up residence with the penguins, it is a meaningless tag. But it is no more meaningless than "es-RU" or "fr-MN" were before it, or than the equally hilarious parent tag "zh-Phst-AQ".

For the record, my current list of potentially non-grandfathered variant subtags is:

        iana_variants.add("gaulish");  // with cel-
        iana_variants.add("boont");    // with en-
        iana_variants.add("scouse");   // with en-
        iana_variants.add("guoyu");    // with zh-
        iana_variants.add("hakka");    // with zh-
        iana_variants.add("xiang");    // with zh-
        iana_variants.add("rozaj");    // with sl-
        iana_variants.add("nedis");    // with sl-

Do you really envision their generative use to be a problem? The basic rule should be: tag content wisely. The absurdities will take care of themselves, just as they always have.
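
If someone did want to tell the "documented" combinations from the merely generative ones, a sketch along these lines might do. It is only an illustration: the class name, the map, and the three result strings are invented here, and the prefixes are simply the "with ..." comments from the list above rather than anything normative.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not taken from the draft or from my real code.
final class VariantPrefixCheck {
    // Variant subtag -> the primary language it was originally registered with.
    static final Map<String, String> PREFIX_BY_VARIANT = new LinkedHashMap<>();
    static {
        PREFIX_BY_VARIANT.put("gaulish", "cel");
        PREFIX_BY_VARIANT.put("boont", "en");
        PREFIX_BY_VARIANT.put("scouse", "en");
        PREFIX_BY_VARIANT.put("guoyu", "zh");
        PREFIX_BY_VARIANT.put("hakka", "zh");
        PREFIX_BY_VARIANT.put("xiang", "zh");
        PREFIX_BY_VARIANT.put("rozaj", "sl");
        PREFIX_BY_VARIANT.put("nedis", "sl");
    }

    // Reports how the first listed variant found in the tag is being used.
    static String check(String tag) {
        String[] subtags = tag.toLowerCase(Locale.ROOT).split("-");
        for (int i = 1; i < subtags.length; i++) {
            String prefix = PREFIX_BY_VARIANT.get(subtags[i]);
            if (prefix != null) {
                // Same primary language as the registration, or a generative use.
                return prefix.equals(subtags[0]) ? "documented" : "generative";
            }
        }
        return "no-listed-variant";
    }

    public static void main(String[] args) {
        System.out.println(check("en-boont"));
        System.out.println(check("zh-Phst-AQ-boont"));
        System.out.println(check("es-RU"));
    }
}
```

Both kinds of use are valid in the sense described above; the check only says whether the combination is one that somebody bothered to document.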
> 
> 
> Section 2.2:
> 
> <quote>
>    o  ISO639-2 reserves for private use codes the range 'qaa' through
>       'qtz'. These codes should be used for non-registered language
>       subtags.
> </quote>
> 
> I still find this unclear. If I want to tag content as "Martian", can I
> use "qaa", or would it have to be "x-qaa", or can I use "x-martian"?
> (I'd suggest alternate wording, but I'm really not sure what is
> intended.) Same for the comparable paragraph in relation to ISO 15924.

We meant the first one (see the examples section at the end). But I note that you can use any of them, although the semantics differ:

  "qaa"
    -   is a private use primary language subtag that you intend to mean Martian. 
    -   can be used generatively. Witness 'qaa-AQ' or perhaps My Favourite Martian would use "qaa-Latn-US"
  "x-qaa"
    -   is a private use tag that you intend to mean Martian.
    -   the subtag qaa has no relationship to ISO 639
    -   can't be used generatively.
    -   my software shouldn't guess what it means
    -   I probably can't guess what it means either...
  "x-martian"
    -  is a private use tag that you intend to mean Martian
    -  my software shouldn't guess what it means
    -  I might guess what you mean...

  "martian"
    -  is a registered IANA primary language subtag. Really, ISO 639 should have been consulted and this should never happen...
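
In code, the distinction between these cases might be sketched as follows. This is my own illustration, not text from the draft: the class and enum names are invented, and the only facts it relies on are the 'qaa' through 'qtz' private-use range and the "x" prefix for private-use tags.

```java
import java.util.Locale;

// Hypothetical classifier; the names are made up for this example.
final class PrimarySubtagKind {
    enum Kind { PRIVATE_USE_LANGUAGE, PRIVATE_USE_TAG, OTHER }

    // Looks only at the first subtag of the tag.
    static Kind classify(String tag) {
        String primary = tag.toLowerCase(Locale.ROOT).split("-")[0];
        if (primary.equals("x")) {
            // "x-qaa", "x-martian": opaque, software shouldn't guess the meaning.
            return Kind.PRIVATE_USE_TAG;
        }
        boolean threeLetters = primary.matches("[a-z]{3}");
        if (threeLetters && primary.compareTo("qaa") >= 0 && primary.compareTo("qtz") <= 0) {
            // "qaa", "qaa-AQ", "qaa-Latn-US": private use, but usable generatively.
            return Kind.PRIVATE_USE_LANGUAGE;
        }
        // Anything else would have to come from ISO 639 or an IANA registration.
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("qaa-Latn-US"));
        System.out.println(classify("x-qaa"));
        System.out.println(classify("martian"));
    }
}
```

The point of the sketch is that "qaa" keeps the normal tag structure after it, while everything after "x-" is private and has no defined structure.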

> 
> <quote>
>    o  All 2-character subtags following the primary subtag denote the
>       region or area to which this language variant relates, and are
>       interpreted according to assignments found in ISO 3166...
> </quote>
> 
> This makes clear what is the interpretation of the subtag. Regarding how
> the subtag affects the semantics of the tag as a whole, though, does it
> strike anyone else that "denote[s] the region or area to which this
> language variant relates" is kind of vague? For instance, given a tag
> (say) "fi-US", what would it mean to say that Finnish "relates to the
> US"? Surely we can be clearer about this. Perhaps the following:

I believe this text dates back to RFC 1766 and was carried over into RFC 3066. Mark and I have avoided changing text that we didn't strictly need to change.