IDNAbis discussion style, mappings, and (incidentally) Eszett

Patrik Fältström patrik at frobbit.se
Fri Nov 30 11:23:49 CET 2007


On 30 nov 2007, at 04.00, Erik van der Poel wrote:

> I admire the design team's desire to keep things simple and to avoid
> exceptions but with tongue in cheek I point out that Patrik's document
> does not really seem simple and appears to be a long list of
> exceptions (or exceptional rules). :-)

Understood. But the properties in the Unicode tables are not by
themselves enough to determine which codepoints are ok and which are
not. There are also specific problems with codepoints that are of one
category but that normalize to a sequence of codepoints that are not
all of the same category. One example is U+0140.

U+0140 is Ll (Lowercase Letter) while U+00B7 is Po (Punctuation, Other).

U+00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
U+0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;;;;N;;;013F;;013F
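
The mismatch in the two entries above can be checked with a short
sketch. It assumes Python's standard unicodedata module, whose Unicode
tables are newer than the ones discussed here, and it assumes that the
properties of these two codepoints have not changed since. The helper
name categories_after_nfkc is made up for illustration and is not part
of any IDNAbis document.

```python
import unicodedata


def categories_after_nfkc(ch):
    """Return the category of ch and the categories of each codepoint
    in its NFKC-normalized form (hypothetical helper, illustration only)."""
    normalized = unicodedata.normalize("NFKC", ch)
    return (
        unicodedata.category(ch),
        [unicodedata.category(c) for c in normalized],
    )


# U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
original, parts = categories_after_nfkc("\u0140")
print(original, parts)  # Ll ['Ll', 'Po']
```

A rule that looks only at the category of U+0140 would accept it as a
letter, even though its normalized form contains U+00B7, which is
punctuation.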

The first try that some of you remember was only based on the classes,  
but that just did not work.

Now we have managed to find a solution that has only 6 exceptions. I
personally am happy with it :-)

    Patrik


