FW: Your statement on Identifiers and Unicode 7.0.0
jefsey at jefsey.com
Mon Feb 9 17:59:48 CET 2015
Thank you for your response. It shows that we do not seem to be looking for
the same thing. What we want is to reduce the perfection of Unicode
to the visual capacity of the reader. This is something equivalent to
Einstein's relativity vs. quantum observables. Heisenberg applied to
scripting: you cannot know, for example, both the character and the diaeresis.
We know that when Unicode prints something it is 100% what was
intended to be printed and 99.99% what should be printed (considering
the updates to come). This is far too precise for the visual capacity
of a regular person, especially one who is not familiar with the
script: a foreign lawyer, a banker, an immigration officer, an
optical reader, a domain name user, etc.
What we want is a no-phishing code offering 100% proof-reading
capacity even with a reading device or eye at 30% quality. We are
not interested in the language associated with the script, but in the
rough geometry of the graph, rendered in a rustic, robust enough,
legally/contractually enforceable geometric font. No language/script
is to be associated with the graph; the context will tell which
one it is. We are not interested in "o" being Cyrillic, Roman, or
whatever: we are interested in a middle-sized circle on the line. The
context will tell us if a graph is Roman, Arabic, Chinese, an upper
or lower case, a character or a sign.
Therefore we expect to see a large addition of code points to replace
the combinations, and an attrition of the overall needs. The way we
plan to proceed is to get a 32x32 description of all the characters
(in some countries/for some languages there are official tables) and to
start from there, using confusability algorithms, trying to
determine the true impact on readability and the adaptations that
should be made to increase it, until we reach a 32x32
universal, machine-oriented character graph set.
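To make the idea concrete, here is a minimal sketch of what a confusability measure over such 32x32 descriptions could look like. It is purely illustrative and not part of any specification: glyphs are assumed to be 32x32 binary bitmaps, and the score is simply the normalized Hamming distance between two bitmaps (the function and glyph names are invented for the example).

```python
# Illustrative only: each glyph is assumed to be a 32x32 binary bitmap
# (a list of 32 rows of 32 pixels). The confusability score is the
# fraction of pixels on which the two bitmaps disagree.

def hamming_confusability(a, b):
    """Return a score in [0, 1]: 0 = identical bitmaps, 1 = fully distinct."""
    assert len(a) == len(b) == 32 and all(len(row) == 32 for row in a)
    differing = sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if pa != pb
    )
    return differing / (32 * 32)

# Two toy "glyphs": an empty grid, and one with a single filled row.
blank = [[0] * 32 for _ in range(32)]
bar = [[1] * 32 if r == 16 else [0] * 32 for r in range(32)]
print(hamming_confusability(blank, bar))  # 32 differing pixels / 1024 = 0.03125
```

A real confusability algorithm would of course have to tolerate translation, scaling and stroke-weight differences; a raw pixel distance is only the starting point.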
What is interesting is that we can probably play on the confusability
algorithm to reduce the number of accepted points. This is research;
we will see where it leads us. Maybe we will fail: then we can say
that this form of man/machine interfacing could not work. Without
trying, no one can tell.
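One way of "playing on the algorithm to reduce the accepted points" could be a greedy pruning pass: keep a glyph only if its distance to every already-accepted glyph exceeds a threshold, and reject the rest as confusable. The sketch below assumes glyphs reduced to bit strings (standing in for 32x32 bitmaps) and invents all names for illustration.

```python
# Greedy pruning sketch (illustrative, not a specification): a glyph is
# accepted only if it is far enough from every glyph accepted before it.

def bit_distance(a, b):
    """Count positions where the two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def prune_confusable(glyphs, distance, threshold):
    """Return the names of glyphs that survive greedy deduplication."""
    accepted = []
    for name, bits in glyphs:
        if all(distance(bits, kept) > threshold for _, kept in accepted):
            accepted.append((name, bits))
    return [name for name, _ in accepted]

glyphs = [
    ("o-round", "0110011001100110"),
    ("o-narrow", "0110011001100111"),  # 1 pixel off: confusable with o-round
    ("bar", "1111111100000000"),       # clearly distinct
]
print(prune_confusable(glyphs, bit_distance, threshold=2))
# -> ['o-round', 'bar']
```

Raising the threshold shrinks the accepted set further, which is exactly the lever described above: the stricter the confusability criterion, the fewer code points survive.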
But we know that IDNA will support it, and that all the CLASS "FL"
zones will use the UNIGRAPH format (with ASCII being the same, as part
of the confusability algorithm specifications).
From this, either the characters in each script can be linked to a
UNIGRAPH graph and they can be used, or they cannot, and cannot be
accepted as non-confusable. It is up to the local orthotypography to
address the issue.
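The acceptance rule just described can be sketched as a simple lookup: each character of a label either maps to a UNIGRAPH graph or the whole label is rejected. The table below is a toy stand-in invented for the example (UNIGRAPH defines no such table or names); note how Latin "o" and Cyrillic "о" deliberately map to the same graph, since only the geometry matters.

```python
# Toy acceptance sketch: the table and graph names are invented for
# illustration; they are not part of UNIGRAPH or IDNA.

unigraph_table = {
    "o": "CIRCLE-MID",  # Latin small o
    "о": "CIRCLE-MID",  # Cyrillic small o: same geometry, same graph
    "l": "BAR-TALL",
}

def accept(label):
    """Return the UNIGRAPH graphs of a label, or None if any character
    has no non-confusable UNIGRAPH mapping."""
    graphs = []
    for ch in label:
        graph = unigraph_table.get(ch)
        if graph is None:
            return None  # rejected: up to the local orthotypography
        graphs.append(graph)
    return graphs

print(accept("ol"))  # ['CIRCLE-MID', 'BAR-TALL']
print(accept("o!"))  # None: '!' has no mapping in this toy table
```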
On 15:37 09/02/2015, John C Klensin said:
>--On Sunday, February 08, 2015 23:46 +0100 JFC Morfin
><jefsey at jefsey.com> wrote:
> >>> then even a wider selection of precomposed characters is
> >>> insufficient.
> >> This is something I do not understand.
> > The best is to ask him.
> > John, is it because you think this would result in too many
> > Unigraph code points, or for another reason we do not
> > see/understand?
>Jefsey (and others),
>The notion of a "no combining marks or other display guidance"
>unified character set has been explored several times, most
>notably (at least in my opinion) in the initial design for ISO
>10646, a design that evolved and was then ultimately replaced by
>a specification that is code point-identical with Unicode. The
>explosion in the size of the code space is immense, especially
>since trying to do that probably requires avoiding the
>"unification" of the CJK and Arabic and Perso-Arabic scripts.
>If one also wants to make that hypothetical coding system
>glyph-sensitive without invisible or non-spacing characters that
>provide formatting clues and to eliminate the rather complex
>(and sometimes language-sensitive, rather than script-sensitive)
>rendering rules associated with several scripts in Unicode, the
>code space gets even larger, almost certainly beyond the 32 bits
>originally anticipated for 10646, much less the 16 planned for
>Unicode 1.0 and the circa 23 bits of present-day Unicode.
>I don't consider a code space that would require 40 or even 48
>bits per character (my rough guess) to really be plausible. In
>addition to sheer size, the encoding principles that would be
>needed to let people find characters and how they are coded
>would become quite daunting. YMMV.
>Asmus and I discussed this in our exchange circa two weeks ago;
>I would encourage you to go back and review those notes if you
>are interested in pursuing this issue.