Some comments on the draft (was RE: New draft-langtags (aka RFC3066bis) published...)

Mon Jun 7 23:13:27 CEST 2004

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Addison Phillips [wM]

> In addition, some of the confusion surrounding "extended
> language" tags has (hopefully) been removed. Peter Constable and John
> Cowan's suggestion that we eliminate the singleton subtag "-s-" was
> included

Very glad to see that.

Not a huge issue, but I still have doubts about registering subtags
rather than entire tags. The supposed benefit is to reduce the number
of registrations necessary, but does it really save much? If I
registered "1904", the registration would still have to document that
it can only be used with "de" and its combinations. If someone later
wants to use "1904" with different semantics, e.g., a completely
unrelated spelling reform for Martian, then all of the details still
need to be spelled out (no pun intended) in a revision to the
registration. And note that the details as to which *tags* are
permitted are buried in the registration rather than enumerated in the
directory, where they would be more useful. I could see the value if we
were looking at having to register tags that included script IDs
one-by-one. But when we start getting into variants, I don't think
we're really saving anything.

Some comments on details in the draft: 

Section 2.2:

<quote>
   o  ISO639-2 reserves for private use codes the range 'qaa' through
      'qtz'. These codes should be used for non-registered language
      subtags.
</quote>

I still find this unclear. If I want to tag content as "Martian", can I
use "qaa", or would it have to be "x-qaa", or can I use "x-martian"?
(I'd suggest alternate wording, but I'm really not sure what is
intended.) Same for the comparable paragraph in relation to ISO 15924.
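
For reference, the range itself is easy to check mechanically; it is only
the intended usage that is unclear. A minimal Python sketch, in which
is_private_use_language is a hypothetical helper name and not anything
defined by the draft:

```python
def is_private_use_language(subtag: str) -> bool:
    """Return True if subtag lies in the ISO 639-2 private-use range qaa..qtz.

    Hypothetical helper for illustration only. It says nothing about whether
    such a code may appear bare or only after "x-", which is the open question.
    """
    s = subtag.lower()
    # An ISO 639-2 style code is exactly three ASCII letters.
    if len(s) != 3 or not (s.isascii() and s.isalpha()):
        return False
    # Plain string comparison is enough once we know s is three lowercase letters.
    return "qaa" <= s <= "qtz"
```

Under this check "qaa" and "qtz" qualify while "qua" and "de" do not; the
comparison is case-insensitive, which is an assumption of the sketch.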

In the next paragraph, change "IANA registered primary..." to
"IANA-registered primary..."

<quote>
   o  All 2-character subtags following the primary subtag denote the
      region or area to which this language variant relates, and are
      interpreted according to assignments found in ISO 3166...
</quote>

This makes the interpretation of the subtag clear. Regarding how the
subtag affects the semantics of the tag as a whole, though, does it
strike anyone else that "denote[s] the region or area to which this
language variant relates" is kind of vague? For instance, given a tag
(say) "fi-US", what would it mean to say that Finnish "relates to the
US"? Surely we can be clearer about this. Perhaps the following:

<suggestion>
   o  All 2-character subtags following the primary subtag are
      interpreted according to assignments found in ISO 3166 alpha-2
      country codes from [4], assignments subsequently made by the ISO
      3166 maintenance agency, or governing standardization bodies. The
      semantic effect of this subtag on the tag as a whole is to denote
      a sub-variety of the language in question used or usable in the
      region or area specified by this subtag.

   o  Typically, a sub-variety indicated using a region subtag refers to
      a regional dialect (spoken or written), or to a particular written
      form, such as regional spelling conventions. It may also specify
      that content is tailored for the needs of users in a given region
      even though this may not necessarily correspond with any single
      linguistically-identifiable dialect or writing conventions. For
      instance, "fr-002" may be used to indicate that a particular item
      of content (perhaps a single string for a software user
      interface) is tailored to be suitable for use by Francophones
      throughout the continent of Africa without necessarily implying
      that there is one French dialect common to all of Africa. That
      is, the declaration made by "fr-002" or any other tag containing
      a region subtag is a declaration about the given content, not
      about linguistic or sociolinguistic realities in the world.
</suggestion>
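
To illustrate which part of a tag the suggested wording is about, here is
a rough Python sketch that pulls the region subtag out of a tag. The
function name region_subtag is hypothetical, and treating a three-digit
code such as the "002" in "fr-002" as a region subtag is an assumption of
the sketch, since the quoted wording only mentions 2-character subtags:

```python
from typing import Optional


def region_subtag(tag: str) -> Optional[str]:
    """Return the region subtag of a language tag, or None if there is none.

    Hypothetical helper for illustration; not a full parser for the draft.
    """
    subtags = tag.split("-")
    # Skip the primary subtag and scan the ones that follow it.
    for sub in subtags[1:]:
        # A single-character subtag (e.g. "x") starts a different kind of
        # sequence, so stop looking for a region there.
        if len(sub) == 1:
            break
        # Two ASCII letters: interpreted as an ISO 3166 alpha-2 code.
        if len(sub) == 2 and sub.isascii() and sub.isalpha():
            return sub.upper()
        # Assumption: three ASCII digits are treated as a numeric region code.
        if len(sub) == 3 and sub.isascii() and sub.isdigit():
            return sub
    return None
```

With this sketch, "fi-US" yields "US" and "fr-002" yields "002", while
"de-1904" yields None. Nothing in the code decides what the region means
for the content; that is exactly what the wording above has to spell out.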

All for the moment. Some further comments may be forthcoming.

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division