Alternatives to Unicode (was: Re: FW: Your statement on Identifiers and Unicode 7.0.0)

John C Klensin klensin at jck.com
Thu Feb 5 19:37:58 CET 2015


(Subject changed to clarify that this is not really about either
the IAB statement or any of the issues that might or might not
exist with Unicode's handling of Arabic.  However, this will be
my last posting on this list on the subject because it has
little or nothing to do with IDNA either.)

--On Thursday, February 05, 2015 13:06 +0100 Jefsey
<jefsey at jefsey.com> wrote:

> At 21:33 04/02/2015, John C Klensin wrote:
>> As to "a non-confusable Unigraph compatible table", I look
>> forward to seeing a serious and detailed proposal.  Many of us
>> believe the notion is impossible for reasons that have at
>> least as much to do with human perception as with writing
>> systems.
> 
> Dear John,
> The target is not a proposal but a CLASS "FL" operational
> algorithm based upon a table, to be tested and reported as per
> ICP-3.

As the first person to have posted a proposal for a CLASS-based
system for IDNs, when I read your suggestions, I don't believe
you understand the implications and limitations of that approach
even as well as I did then, much less as well as I do now.  In
particular, from an i18n standpoint, the many issues with
comparison and equality that have become major topics of
conversation in recent years (with the current issue about
non-decomposable code points being a fairly minor example)
suggest that a different Class would do almost nothing to
eliminate the need for character and string canonicalization
("preparation").  From a DNS technology perspective, the point
at which a different Label Type would be needed, not just a
different CLASS, is not easily identified and described.  New
CLASSes have, in practice, proven hard to deploy broadly.  New
Label Types are much worse -- so much worse that a few years ago
one of the DNS WGs approved a proposal to deprecate them
entirely, eliminating the possibility of any additional ones.
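For readers less familiar with the comparison problem, here is a minimal Python sketch (standard library only) of why string preparation does not go away. Two spellings of "e with acute" differ as code points but become equal after Unicode Normalization Form C. By contrast, BEH followed by HAMZA ABOVE and the single code point U+08A1 stay different, because U+08A1 has no canonical decomposition; that code point is assumed here to be the kind of non-decomposable case referred to above. The helper name same_after_nfc is illustrative only and is not the IDNA2008 comparison procedure.

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """Illustrative helper: compare two strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# "e" + COMBINING ACUTE ACCENT vs. precomposed U+00E9: different code points,
# but canonically equivalent, so NFC makes them compare equal.
decomposed = "e\u0301"
precomposed = "\u00e9"
print(decomposed == precomposed)                 # False
print(same_after_nfc(decomposed, precomposed))   # True

# BEH + HAMZA ABOVE vs. U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE:
# U+08A1 has no canonical decomposition, so normalization does not unify them.
print(same_after_nfc("\u0628\u0654", "\u08a1"))  # False
```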

As to ICP-3, while I've got a personal copy that I just checked,
I can't even find it any more on ICANN's web site.  General
searches find a copy of Stuart Lynn's statement about it, but
the links in that statement that are supposed to point to the
document itself are all broken and a search for "ICP-3" on
ICANN's site turns up nothing.  I'll leave it to you and others
whether your citing a policy document about which the
organization that created the policy has apparently lost
interest is useful, but it feels a little misleading to me.

More important, I can see nothing in ICP-3, a discussion about a
unique DNS root, that could be used to support your statement
unless you are referring to the statement that starts "None of
this precludes experimentation done in a manner that does not
threaten the stability of name resolution in the authoritative
DNS....".   It does not appear to me that you are proposing an
experiment of that sort.  Perhaps I'm incorrect.

> We are ***not*** considering a writing system, but a printed
> sign system for a single purpose: ID/naming non confusability.

Ok.  There have been a number of attempts to design unambiguous
symbol systems for various purposes over the centuries.  They
are certainly feasible even though all of the successful ones of
which I'm aware have very small symbol repertoires.  They also
have little or nothing to do with internationalization.  The
thing that makes internationalization important, useful, and, by
the way, hard, involves trying to accommodate the huge diversity
in human languages, writing systems, and ways of forming
mnemonics.  If you are going to focus on a new symbol (or sign)
system, that may be entirely worthwhile, but it has nothing to
do with the present issues any more than attempts to solve
language and communications problems with invented languages
like Esperanto have to do with issues in communication and
writing in modern French.

> There are five layers involved.
> - an unchanged DNS use, operated in CLASS "FL" (Free/Libre)
> implying any DBMS being used by nameservers.

See above.   And no.  Because a new CLASS requires either
significant modifications to existing root servers or a new set
of root servers, and many existing tools (including most or all
existing resolution libraries) seem to assume CLASS=IN,
"unchanged" is impossible and "any DBMS being used by
nameservers" is at best unrealistic.
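To show where the difficulty actually lies: on the wire, the class is only a 16-bit field (QCLASS in a question, CLASS in each resource record), so the encoding change is trivial; the problem is everything that assumes the value is 1 (IN). A minimal Python sketch of a single question entry follows. There is no assigned CLASS "FL"; the value 0xFF00 is an assumption, taken from the range RFC 6895 reserves for private use, and the function names are hypothetical.

```python
import struct

QTYPE_A = 1
CLASS_IN = 1
# Hypothetical: no CLASS "FL" exists. 0xFF00 is the first value of the
# private-use CLASS range, chosen here purely for illustration.
CLASS_FL_HYPOTHETICAL = 0xFF00

def encode_name(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed labels ending in the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("label length must be 1..63 octets")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # zero-length root label terminates the name

def question(name: str, qtype: int, qclass: int) -> bytes:
    """Build one DNS question entry: QNAME, then QTYPE and QCLASS in network order."""
    return encode_name(name) + struct.pack("!HH", qtype, qclass)

q_in = question("example.com", QTYPE_A, CLASS_IN)
q_fl = question("example.com", QTYPE_A, CLASS_FL_HYPOTHETICAL)
print(q_in[-2:].hex())  # 0001
print(q_fl[-2:].hex())  # ff00
```

Only the last two octets differ, which is why the cost of a new CLASS is in root servers, resolvers, and tools rather than in the message format.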
 
> - the list of accepted character signs.
> - the non-confusable visualization of these characters.
> - the list of corresponding UNIGRAPH code points
> - a fringe to fringe Punycoding/decoding for the CLASS "FL"
> DNSLIB root zone.

While you might be able to apply the Punycode algorithm to the
code set for your signs, it would almost certainly be a bad
idea.  Punycode provides a reasonable, perhaps optimal, encoding
for a rather large (over 2^20) code set with certain local
compactness properties.   If you are going to have a repertoire
of non-confusable signs, experience suggests that repertoire will
be limited to a few hundred signs or fewer (unless you assume
users will have special training in recognizing distinctions
among signs, which the above appears to reject).  There is
little to be gained and quite a lot of compactness to be lost
from using Punycode.  With a repertoire of that size, you'd
almost certainly be better off just numbering them.
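To make the compactness point concrete, a rough Python comparison. The repertoire below is entirely hypothetical: a few code points from unrelated scripts standing in for "signs". Numbering each sign by its position gives one octet per sign for repertoires of up to 256 signs; the same label run through Python's built-in RFC 3492 Punycode codec comes out longer.

```python
# Hypothetical repertoire: code points from unrelated scripts standing in for "signs".
REPERTOIRE = ["\u03b1", "\u05d0", "\u0e01", "\u4e00", "\u0416", "\u10d0"]
INDEX = {sign: i for i, sign in enumerate(REPERTOIRE)}

def number_label(signs: str) -> bytes:
    """One octet per sign: the sign's position in REPERTOIRE (needs <= 256 signs)."""
    if len(REPERTOIRE) > 256:
        raise ValueError("one octet per sign needs a repertoire of at most 256 signs")
    try:
        return bytes(INDEX[s] for s in signs)
    except KeyError as exc:
        raise ValueError(f"not in the repertoire: {exc.args[0]!r}") from None

def punycode_label(signs: str) -> bytes:
    """The same label encoded with Python's built-in RFC 3492 Punycode codec."""
    return signs.encode("punycode")

label = "\u03b1\u05d0\u0e01\u4e00"
print(number_label(label).hex())   # 00010203
print(len(number_label(label)))    # 4
print(len(punycode_label(label)))  # more than 4
```

Raw indices like these can land in the ASCII letter range, where the DNS matches case-insensitively, so a real scheme would need to keep its octets out of that range.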

Finally, if you were going to build and use a coding system like
that but still wanted to use the DNS, you don't need a new CLASS
or other specialized arrangements.  Just find an appropriate
subtree, put the labels in it coded however you like (I'd
recommend using only octets with the high bit turned on to avoid
the DNS's ideas about ASCII-range case matching) and assume that
the "any octets" discussion of RFC 2181 will provide you with
all the DNS facilities you need.   Or, if you conclude that you
need something ASCII-compatible, there are far less complex and
more compact approaches than Punycode for small repertoires.
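A sketch, under stated assumptions, of the two alternatives just mentioned. The sign indices are hypothetical positions in some sign list. high_bit_label covers repertoires of up to 128 signs with one octet each, all outside the ASCII range. base36_label is one possible ASCII-compatible scheme, two lowercase letters or digits per sign (36 x 36 = 1296 signs), which avoids case issues by never using uppercase. Neither is a registered or standard encoding; the 63-octet limit is the ordinary DNS label limit.

```python
import string
from typing import List

# 36 LDH-safe symbols with no case distinction once lowercased.
ALPHABET = string.digits + string.ascii_lowercase

def high_bit_label(indices: List[int]) -> bytes:
    """One octet per sign with the high bit set (at most 128 signs)."""
    if any(not 0 <= i < 128 for i in indices):
        raise ValueError("index out of range for a 128-sign repertoire")
    label = bytes(0x80 | i for i in indices)
    if len(label) > 63:
        raise ValueError("DNS labels are limited to 63 octets")
    return label

def base36_label(indices: List[int]) -> str:
    """Two lowercase alphanumerics per sign (at most 36 * 36 = 1296 signs)."""
    if any(not 0 <= i < 36 * 36 for i in indices):
        raise ValueError("index out of range for a 1296-sign repertoire")
    label = "".join(ALPHABET[i // 36] + ALPHABET[i % 36] for i in indices)
    if len(label) > 63:
        raise ValueError("DNS labels are limited to 63 octets")
    return label

def base36_decode(label: str) -> List[int]:
    """Inverse of base36_label; lowercases first because DNS matching ignores case."""
    label = label.lower()
    if len(label) % 2:
        raise ValueError("label length must be even")
    return [ALPHABET.index(label[k]) * 36 + ALPHABET.index(label[k + 1])
            for k in range(0, len(label), 2)]

print(high_bit_label([0, 41, 99]).hex())  # 80a9e3
print(base36_label([0, 41, 299]))         # 00158b
print(base36_decode("00158B"))            # [0, 41, 299]
```

Decoding lowercases first, so a cache or resolver that changes the case of the label does not alter the decoded signs.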

> Please explain where is a possible impossibility.

I was concerned about natural languages and associated writing
systems, not an artificially-chosen symbol system.

    john




