Unicode 7.0.0, (combining) Hamza Above, and normalization for comparison

JFC Morfin jefsey at jefsey.com
Wed Aug 6 14:01:29 CEST 2014

At 07:03 06/08/2014, Patrik Fältström wrote:
>To be honest, I do not think it matters where it is discussed.

I suggest we keep discussing it here. The reason is the ICANN 
response to the plaintiffs in the .ir, etc. case: "the DNS provides a 
human interface to the internet protocol addressing system". This 
seems to be a good definition to commonly sustain, as it is 
technically true, easy to understand, and makes a clear distinction 
between the human and the non-human issues.

The most complex issue, the human confusability of ISO 10646 
code points, calls for a visual-to-binary anti-phishing algorithm. 
Such an algorithm should be added to the IDNA tables, allowing 
registries to accept xn-- registrations or not, based upon the domain 
names already registered.
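A minimal sketch of that registry-side check, assuming a visual-proximity function exists (the names, the threshold, and the crude per-character stand-in for the distance function are all illustrative, not part of any actual IDNA table):

```python
# Hypothetical registry check: before accepting a new xn-- (Punycode)
# registration, compare it against every name already in the zone using
# an assumed visual-proximity function.

def visual_distance(a: str, b: str) -> float:
    """Placeholder for a rasterization-based proximity measure.
    Here: a crude per-character mismatch ratio, for illustration only."""
    if len(a) != len(b):
        return 1.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def accept_registration(candidate: str, registered: list[str],
                        threshold: float = 0.2) -> bool:
    """Reject if the candidate is visually too close to an existing name."""
    return all(visual_distance(candidate, name) > threshold
               for name in registered)

zone = ["example", "sample"]
print(accept_registration("examp1e", zone))    # too close to "example"
print(accept_registration("different", zone))  # sufficiently distinct
```

The real discriminator would of course replace `visual_distance` with a glyph-level measure, but the acceptance logic at the registry stays this simple.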

To start the debate on this issue I would suggest one possibility for 
such an algorithm: a mathematical proximity-based confusability 
discrimination between 32x32 character rasterizations (i.e. 1024-bit 
structured strings). I note that this also implies a common reference 
font. I do not think this is a problem, as it is on the human side and 
conflicts will be subject to the courts: what counts is the font that 
local law will consider. It is up to each ccTLD to provide that 
information and to have it added to ISO 3166, which already includes 
the administrative languages; these should be renamed anyway as 
standardization languages, coupled with the accepted script(s).
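The proximity measure itself could be sketched as a normalized Hamming distance between the two 1024-bit rasterizations. The glyph bitmaps below are tiny hand-made stand-ins, not output from any real reference font:

```python
# Sketch of the proposed measure: treat each character's 32x32
# rasterization as a 1024-bit string; the fraction of differing pixels
# between two rasterizations is the confusability score.

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two bit-string rasterizations."""
    return bin(a ^ b).count("1")

def confusability(a: int, b: int, bits: int = 1024) -> float:
    """0.0 = identical glyphs, 1.0 = every pixel differs."""
    return hamming_distance(a, b) / bits

# Toy "rasterizations" (in reality these would come from rendering the
# agreed reference font at 32x32):
glyph_l = int("1" * 32 + "0" * 992, 2)   # hypothetical bitmap for 'l'
glyph_1 = int("1" * 30 + "0" * 994, 2)   # hypothetical bitmap for '1'
glyph_o = int("0" * 512 + "1" * 512, 2)  # hypothetical bitmap for 'o'

print(confusability(glyph_l, glyph_1))   # small score: visually close
print(confusability(glyph_l, glyph_o))   # large score: distinct glyphs
```

A registry would then declare two code points confusable whenever this score falls below some agreed threshold; choosing that threshold (and the reference font) is exactly the policy question raised above.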

Initial questions:
1. What is the URL of the complete Unicode code point table (value/description)?
2. I found rasterizations made for some scripts, but not for all.


More information about the Idna-update mailing list