response to comments about extensions

Sat Jun 12 23:12:16 CEST 2004

Hi Peter,

At long last, here is a response to your comment: 

> Maybe there is a particular plan for the future that the authors have in
> mind, and perhaps a good plan. But I think it's only reasonable that
> there be at least some discussion of what kinds of future extensions
> might be considered appropriate, 

Please note that the draft (that is, -03) says:

<quot>
Extension subtags are those introduced by single-letter subtags other than 'x-'. They are reserved for the generation of identifiers which contain a language component, and are compatible with applications that process language tags according to this specification. For example, they might be used to define locale identifiers, which are generally based on language.

The structure and form of extensions are defined by this document so that implementations can be created that are forward compatible with applications that may be created using single-letter subtags in the future. In addition, defining a mechanism for maintaining single-letter subtags will lend to the stability of this document by reducing the likely need for future revisions or updates. 
</quot>

I think this is clear: the application must be compatible with processing language tags.

RFC 3066bis *fully* describes the range of permissible language tags. Unlike RFC 3066, it does so in a way that effectively limits future modifications to language tags to those that do not create an incompatibility. In draft-02 only one gap remained: the reserved singleton subtags. We assigned one ("-s") in a matter-of-fact manner, and this provoked questions about them. The extensions work in this draft was designed to close that gap.

So basically what we've done is describe language tags "for all time", absent a major revision to the document. And we've defined the mechanism for allocating and using the singleton subtags so that the rules in this document will remain unchanged or only mildly modified for an extended period of time. In other words, we feel that we've future-proofed implementations that rely on this document.

Because the namespace of singleton codes is extremely limited (only 24 remain available), we envision that the IESG will be parsimonious in handing them out. The draft requires the RFC process. Presumably bizarre courtyard codes would not pass through that process unscathed.

By describing all of this in this draft, we enable parser writers to implement handling of all language tags (past, present, and future) and to know whether a tag is well-formed, whether it is in canonical form, and how it compares to some other tag or range of tags. In this, I think the extensions mechanism is more akin to, say, surrogates in Unicode: we don't know what characters will be enshrined on the ethereal planes or what their properties will be, but it is reasonable to expect that they will be characters, and we know how to access them. A language tagging example might be the LS639 debate on the list this (last) week: should Debbie Garside's time machine be correct and LS639 become a standard, and should the language tag community, for some reason, feel that its codes should be included in 3066bis tags, this is how it will be done.
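
To make the parser point concrete, here is a minimal sketch in Python of how an implementation might separate out extensions it has never heard of. It is only an illustration, not text from the draft: the function name split_extensions, the 1-to-8 alphanumeric subtag rule, the lowercasing, and the treatment of any one-character subtag as a singleton are all my assumptions for the example.

```python
import re

# Assumption for this sketch: every subtag is 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_extensions(tag):
    """Split a tag into (base subtags, [(singleton, subtags)], private-use subtags).

    Hypothetical helper: it knows nothing about what any singleton means,
    only where each extension starts and ends.
    """
    subtags = tag.lower().split("-")
    for s in subtags:
        if not SUBTAG.fullmatch(s):
            raise ValueError(f"ill-formed subtag {s!r} in {tag!r}")

    # The first subtag is always treated as part of the base tag here;
    # whole-tag private use ("x-...") is not given special handling.
    base, extensions, private = [subtags[0]], [], []
    rest = subtags[1:]
    for i, s in enumerate(rest):
        if len(s) > 1:
            # Ordinary subtag: belongs to the open extension, else to the base.
            if extensions:
                extensions[-1][1].append(s)
            else:
                base.append(s)
            continue
        # One-character subtag: the previous extension must not be empty.
        if extensions and not extensions[-1][1]:
            raise ValueError(f"empty extension in {tag!r}")
        if s == "x":
            # Everything after 'x' is private use, whatever its length.
            private = rest[i + 1:]
            if not private:
                raise ValueError(f"empty private-use sequence in {tag!r}")
            break
        extensions.append((s, []))
    if extensions and not extensions[-1][1]:
        raise ValueError(f"empty extension in {tag!r}")
    return base, extensions, private


if __name__ == "__main__":
    print(split_extensions("en-US-d-yyyymmdd"))
```

The point of the sketch is that the splitting logic never needs a table of registered singletons, so a parser written today can keep or ignore an extension assigned later without being revised.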

We are not trying to hide anything here. Although Mark and I have selfish purposes, we specifically wrote a general-purpose mechanism that does not favor our application or any other particular one. Its sole purpose is to spell out how language tags will be allowed to evolve under this draft (and to foreclose other avenues to standardization absent a major revision).

I hope that addresses your concerns.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no 
> [mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Peter Constable
> Sent: June 8, 2004 7:21
> To: ietf-languages at alvestrand.no
> Subject: comments on the draft - extensions
> 
> 
> My only remaining comments have to do with the Extensions mechanism
> (section 3.3). 
> 
> I'm struck by what seems to me to be unprecedented: In defining a
> protocol, mechanisms are incorporated in order to support some
> unspecified future higher-level protocols. In part, it strikes me as
> being sort of like UTC designating a range of codepoints (say)
> U+EF000..U+EFFFF for possible future "courtyard" characters with no hint
> of what these things might ever be -- and there's no question that UTC
> would not accept a proposal to do something like that without knowing
> that there was a specific plan and knowing what that plan was. So, it
> seems a bit unusual that we're being asked to adopt something for
> "applications that may be created... in the future."
> 
> But this is even more unusual because, in the Unicode analogy, the
> mechanism would have to be defined by UTC as they own the codespace,
> whereas here a higher level protocol could extend tags defined by RFC
> 3066bis without interfering with the latter's codespace, yet instead the
> mechanism is being put squarely in the codespace defined by RFC 3066bis.
> The only reason I can conceive for putting mechanisms here rather than,
> say, allowing a subsequent RFC defining a higher-level protocol that
> uses tags of the form (say)
> 
> language-tag "_" extensions
> 
> is to ensure that these future tags can be transported in any protocol
> that references RFC 3066bis. So, if someone managed to establish a
> protocol with some registered singleton subtag, say, "c" for a set of
> "courtyard codes" that might be used in DVB applications to perform a
> variety of text transformations (e.g. this text is English and should be
> displayed with a blue outline and flashing green/black fill), then
> *every* protocol that uses RFC 3066bis will need to permit these. 
> 
> Sure, not every protocol has to interpret them, and it would always be
> acceptable to ignore them. But it's an additional complexity, and an
> opportunity for people to do things wrong. Especially if one of the
> extensions that gets established has to do with something that
> implementers already find confusing, such as locales. It wouldn't be
> hard to imagine, for instance, people tagging content with something
> like "en-US-d-yyyymmdd" in places where it really isn't appropriate.
> 
> Maybe there is a particular plan for the future that the authors have in
> mind, and perhaps a good plan. But I think it's only reasonable that
> there be at least some discussion of what kinds of future extensions
> might be considered appropriate, whether there is any concern of
> inappropriate extensions getting created ("courtyard" codes?), and
> whether there's any concern over every RFC 3066bis consumer needing to
> accept whatever extensions might come along.
> 
> 
> 
> Peter Constable