Reviewing the character set model (was: Re: AW: Eszett)

Vint Cerf vint at google.com
Sun Jul 12 18:01:45 CEST 2009


I think it is vital to wind up the IDNABIS work in Stockholm.

I also agree that it would be very important to begin a serious
examination of the frameworks now in use with an eye to trying to
develop a much more general ability to absorb, e.g., the distinctions
among language, scripts, glyphs, protocol-wire representations,
perhaps presentations, etc.

v

On Jul 12, 2009, at 11:54 AM, John C Klensin wrote:

>
>
> --On Sunday, July 12, 2009 09:21 -0400 Eric Brunner-Williams
> <ebw at abenaki.wabanaki.net> wrote:
>
>> ...
>>>> Clearly the correct form is a "u" positioned above, and
>>>> joining, an "o", the Wabenaki solution to the problem
>>>> presented by 16th century French lacking the requisite
>>>> character. If you'll all turn your Unicode hymn books to
>>>> U+0222 and U+0223 ...
>>>
>>> Of course, in their Unicode font rendering, someone would
>>> probably complain that both characters were confusable with
>>> the digit "8", but...
>>
>> In fact, the digit "8" was used in some Abenaki orthography,
>> along with the "o" "u" vertical ligature, during the heyday of
>> manual typewriters.
>
> I noticed that, in the pictures referred to from Michael's
> posting, I couldn't see the space at the top, hence making them
> indistinguishable.  If one had a manual typewriter designed for
> English or basic Latin and was writing words or sentences,
> putting an "8" in the middle of a word would actually be
> unambiguously OU. It is only with computers and the DNS that
> we've come to think of labels (or pseudo-words) with digits in
> the middle as reasonable and normal cases, further illustrating
> the observation that we have to be careful with analogies to
> "words" and orthographic assumptions in these efforts.
>
>> Back in the '03 work I discussed the Abenaki equivalence class
>> of {8, w,  ou, and U+0222, U+0223}, in the context of local
>> scope for zone file equivalence classes.
>>
>>> There are moments (but only extremely brief moments) when I
>>> think that maybe we should have taken RFC 5242 more seriously
>>> :-(
>>
>> It will never displace avian carrier. However, funny smiley
>> face _off_, when I recommended to the then-chair of the IRTF
>> circa 2002 (or earlier) that task E in RFC 2130 be undertaken,
>> the response I got was "no".
>
> Based on recent discussions within the IAB --some of which have
> been highly critical of current approaches to character set use
> and coding generally (as well as of the basic IDNA strategy)-- I
> think that activity, or at least a follow-up workshop to
> reexamine strategies more than a dozen years later, is
> beginning to get some traction now.  The discussion in
> draft-iab-idn-encoding-00.txt is one sign of those discussions.
>
> But, IMO, we really need to get this work wrapped up rather than
> confusing it with another workshop, an RG, or very-long-term
> strategies.  That might not be true if someone, following recent
> patterns, wants to reopen the second-oldest question of all,
> which is whether an applications-based approach to IDNs with
> client-side mapping and an ACE in the DNS, is appropriate.  I
> hope we don't have to have that conversation again but,
> extending a recent argument, we haven't reviewed it any time
> recently and lots of things have changed since that decision was
> first made.
>
>    john
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update


