Reviewing the character set model

Sun Jul 12 22:45:18 CEST 2009

John C Klensin wrote:
> --On Sunday, July 12, 2009 09:21 -0400 Eric Brunner-Williams
> <ebw at abenaki.wabanaki.net> wrote:
>
>   
>> ...      
>>     
>>>> Clearly the correct form is a "u" positioned above, and
>>>> joining, an "o", the Wabenaki solution to the problem
>>>> presented by 16th century French lacking the requisite
>>>> character. If you'll all turn your Unicode hymn books to
>>>> U+0222 and U+0223 ...   
>>>>         
>>> Of course, in their Unicode font rendering, someone would
>>> probably complain that both characters were confusable with
>>> the digit "8", but...  
>>>       
>> In fact, the digit "8" was used in some Abenaki orthography,
>> along with the "o" "u" vertical ligature, during the heyday of
>> manual typewriters.
>>     
>
> I noticed that, in the pictures referred to from Michael's
> posting, I couldn't see the space at the top, hence making them
> indistinguishable.  If one had a manual typewriter designed for
> English or basic Latin and was writing words or sentences,
> putting an "8" in the middle of a word would actually be
> unambiguously OU. It is only with computers and the DNS that
> we've come to think of labels (or pseudo-words) with digits in
> the middle as reasonable and normal cases, further illustrating
> the observation that we have to be careful with analogies to
> "words" and orthographic assumptions in these efforts. 
>   

It is conventional in keyboarded Abenaki-for-Abenakis to use the numeral 
8 as a letter. The use of "w" indicates that the text is primarily 
intended for Anglophone readers, and "ou" for Francophone readers. For 
both audiences the "8" convention is unreasonable and not normal, and 
the text is usually fragmentary and intermixed with the English or 
French that forms the bulk of the corpus.
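
As a rough illustration of treating these spellings, together with 
U+0222 and U+0223, as one equivalence class within a single zone, here 
is a minimal Python sketch. The function names, the choice of U+0223 as 
the canonical member, and the sample labels are all assumptions made for 
the example; it is not the '03 proposal itself. It also rewrites every 
"8" and every "ou", which is only defensible in a local scope where that 
is known to be the intent.

```python
import re

# Members of the class: 8, w, ou, U+0222, U+0223.
# Picking U+0223 as the canonical member is an assumption of this sketch.
CANONICAL = "\u0223"
_MEMBERS = re.compile("ou|[8w\u0222\u0223]")


def canonical_label(label: str) -> str:
    """Lowercase the label, then fold every class member to CANONICAL."""
    return _MEMBERS.sub(CANONICAL, label.lower())


def same_class(a: str, b: str) -> bool:
    """True if two labels differ only by members of the class."""
    return canonical_label(a) == canonical_label(b)


if __name__ == "__main__":
    # Hypothetical labels, used only to exercise the folding.
    for label in ("wabanaki", "8abanaki", "ouabanaki", "\u0222abanaki"):
        print(label, "->", canonical_label(label))
```

A registry applying such a class would compare canonical forms at 
registration time, so that all variants resolve to, or are blocked by, 
the same registrant.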

>> Back in the '03 work I discussed the Abenaki equivalence class
>> of {8, w,  ou, and U+0222, U+0223}, in the context of local
>> scope for zone file equivalence classes.
>>
>>     
>>> There are moments (but only extremely brief moments) when I
>>> think that maybe we should have taken RFC 5242 more seriously
>>> :-(
>>>       
>> It will never displace avian carrier. However, funny smiley
>> face _off_, when I recommended to the then-chair of the IRTF
>> circa 2002 (or earlier) that task E in rfc2130 be undertaken,
>> the response I got was "no".
>>     
>
> Based on recent discussions within the IAB --some of which have
>   

I thank my lucky stars that I'm not big enough to sit at the adults' 
table. The answer I got was a bit more emphatic than "no".

> been highly critical of current approaches to character set use
> and coding generally (as well as of the basic IDNA strategy)-- I
> think that activity, or at least a follow-up workshop to
> reexamine strategies more than a dozen years later, are
> beginning to get some traction now.  The discussion in
> draft-iab-idn-encoding-00.txt is one sign of those discussions.
>   

I trust that it is somewhat obvious that what we are not short on is 
views from contributors who:

    o have little or no work experience with multi-byte encodings to
      support scripts other than Latin,
    o have little or no direct experience with local engineering
      solutions to 7bit restrictions in applications, and
    o assume that fundamental denial of service is a reasonable
      trade-off for a universal applicability statement.

I trust that it is also somewhat obvious that what we are short on, now 
and in 2000-2003, is views from contributors who _must_ use CJKV 
characters found in one or more of Big-5, GB18030, EUC-{JP,KR}, and 
Shift-JIS, and whose work as engineers is primarily in support of 
correct application function where those encodings are pervasive, as 
well as views from contributors who are similarly engaged with Indic 
scripts and Arabic-Persian scripts. The latter may have a view on the 
desirability of directionality leakage across label boundaries, a 
feature of our current set of compromises. I'm not going to assert that 
for Cherokee or Northern Syllabics or Indigenous languages of the 
Americas and the Pacific that use decorated Latin, or revived Mayan, 
there are fundamental issues, at least not this year. Maybe next year.

> But, IMO, we really need to get this work wrapped up rather than
> confusing it with another workshop, an RG, or very-long-term
> strategies.  That might not be true if someone, following recent
> patterns, wants to reopen the second-oldest question of all,
> which is whether an applications-based approach to IDNs with
> client-side mapping and an ACE in the DNS, is appropriate.  I
> hope we don't have to have that conversation again but,
> extending a recent argument, we haven't reviewed it any time
> recently and lots of things have changed since that decision was
> first made.
>   
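
For readers who have not seen the applications-based approach in 
action, the following is a small sketch of client-side mapping to an ACE 
before lookup. It relies on Python's built-in "idna" codec, which 
implements the older IDNA2003 rules (RFC 3490 with nameprep), not 
whatever this work ends up specifying. The helper names and the sample 
domain are assumptions for illustration only.

```python
def to_ace(name: str) -> str:
    """Map a Unicode domain name to its ASCII-compatible encoding.

    Uses the stdlib "idna" codec (IDNA2003): non-ASCII labels are
    nameprepped (including case folding) and Punycode-encoded with
    the "xn--" prefix; the DNS only ever sees the ASCII result.
    """
    return name.encode("idna").decode("ascii")


def from_ace(name: str) -> str:
    """Reverse the mapping for display purposes."""
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_ace("Bücher.example")
    print(ace)            # xn--bcher-kva.example
    print(from_ace(ace))  # bücher.example
```

Note that the round trip does not restore the capital letter: the 
mapping step is lossy by design, and that is part of what 
mapping-on-lookup commits the client to.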

John, if you think that's what I'm up to, don't be shy. I've the 
impression that I'd like brokenness to have a finite life, or some 
convergence property towards less demonstrable brokenness over time, and 
I'm not happy with the rinse-and-repeat we're currently having. We chose 
mapping-on-lookup at San Francisco. That mandates some issue revisiting, 
not a red-button-reset.

Eric