Reviewing the character set model (was: Re: AW: Eszett)

John C Klensin klensin at jck.com
Sun Jul 12 17:54:35 CEST 2009



--On Sunday, July 12, 2009 09:21 -0400 Eric Brunner-Williams
<ebw at abenaki.wabanaki.net> wrote:

>...      
>>> Clearly the correct form is a "u" positioned above, and
>>> joining, an "o", the Wabenaki solution to the problem
>>> presented by 16th century French lacking the requisite
>>> character. If you'll all turn your Unicode hymn books to
>>> U+0222 and U+0223 ...   
>> 
>> Of course, in their Unicode font rendering, someone would
>> probably complain that both characters were confusable with
>> the digit "8", but...  
> 
> In fact, the digit "8" was used in some Abenaki orthography,
> along with the "o" "u" vertical ligature, during the heyday of
> manual typewriters.

I noticed that, in the pictures referred to from Michael's
posting, I couldn't see the space at the top, which made them
indistinguishable from an "8".  If one had a manual typewriter designed for
English or basic Latin and was writing words or sentences,
putting an "8" in the middle of a word would actually be
unambiguously OU. It is only with computers and the DNS that
we've come to think of labels (or pseudo-words) with digits in
the middle as reasonable and normal cases, further illustrating
the observation that we have to be careful with analogies to
"words" and orthographic assumptions in these efforts.
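
To make the typewriter convention concrete, here is a minimal sketch,
assuming a purely local and hypothetical folding rule (nothing here is
specified by IDNA or by any zone-file format) in which "8", U+0222 and
U+0223 are all read as OU inside a word.  The names fold_ou and
same_word, and the sample strings, are invented for illustration and
are not real Abenaki words.

```python
# Hypothetical local folding rule; not part of IDNA or any DNS standard.
# U+0222 / U+0223 are LATIN CAPITAL / SMALL LETTER OU.
OU_VARIANTS = ("\u0222", "\u0223", "8")


def fold_ou(word: str) -> str:
    """Lowercase the word and fold every OU variant to the digraph 'ou'."""
    folded = word.lower()
    for variant in OU_VARIANTS:
        folded = folded.replace(variant, "ou")
    return folded


def same_word(a: str, b: str) -> bool:
    """Two spellings match if they fold to the same string."""
    return fold_ou(a) == fold_ou(b)


# Arbitrary sample strings, chosen only to exercise the rule.
print(same_word("k8k", "kouk"))
print(same_word("k\u0223k", "K\u0222K"))
print(fold_ou("route8"))
```

The last call shows where the analogy breaks: in a DNS-style label such
as "route8" the digit really is a digit, and a word-oriented rule like
this one would silently rewrite it.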
 
> Back in the '03 work I discussed the Abenaki equivalence class
> of {8, w,  ou, and U+0222, U+0223}, in the context of local
> scope for zone file equivalence classes.
> 
>> There are moments (but only extremely brief moments) when I
>> think that maybe we should have taken RFC 5242 more seriously
>> :-(
> 
> It will never displace avian carrier. However, funny smiley
> face _off_, when I recommended to the then-chair of the IRTF
> circa 2002 (or earlier) that task E in RFC 2130 be undertaken,
> the response I got was "no".

Based on recent discussions within the IAB --some of which have
been highly critical of current approaches to character set use
and coding generally (as well as of the basic IDNA strategy)-- I
think that activity, or at least a follow-up workshop to
reexamine strategies more than a dozen years later, is
beginning to get some traction now.  The discussion in
draft-iab-idn-encoding-00.txt is one sign of those discussions.

But, IMO, we really need to get this work wrapped up rather than
confusing it with another workshop, an RG, or very-long-term
strategies.  That might not be true if someone, following recent
patterns, wants to reopen the second-oldest question of all,
which is whether an applications-based approach to IDNs, with
client-side mapping and an ACE in the DNS, is appropriate.  I
hope we don't have to have that conversation again but,
extending a recent argument, we haven't reviewed it any time
recently and lots of things have changed since that decision was
first made.
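
For anyone who wants to see what "client-side mapping and an ACE in the
DNS" means in practice, here is a minimal sketch using the "idna" codec
from the Python standard library.  That codec implements the IDNA2003
rules (RFC 3490), not the revision under discussion here, and the
helper names to_ace and from_ace are my own, so treat this as an
illustration of the general model rather than of any final behavior.

```python
def to_ace(domain: str) -> str:
    """Map a Unicode domain name to its ASCII-compatible encoding.

    The mapping (case folding and so on) and the Punycode step both run
    in the application, before anything is sent to the DNS.
    """
    return domain.encode("idna").decode("ascii")


def from_ace(ace: str) -> str:
    """Convert an ACE domain name back to Unicode for display."""
    return ace.encode("ascii").decode("idna")


print(to_ace("B\u00fccher.example"))
print(from_ace("xn--bcher-kva.example"))
print(to_ace("stra\u00dfe.example"))
```

Under the IDNA2003 mapping the first call yields
"xn--bcher-kva.example", and the last one yields "strasse.example":
the Eszett is mapped away on the client and never reaches the DNS,
which is the behavior that started this thread.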

    john




More information about the Idna-update mailing list