IDNAbis discussion style, mappings, and (incidentally) Eszett
Patrik Fältström
patrik at frobbit.se
Fri Nov 30 11:23:49 CET 2007
On 30 nov 2007, at 04.00, Erik van der Poel wrote:
> I admire the design team's desire to keep things simple and to avoid
> exceptions but with tongue in cheek I point out that Patrik's document
> does not really seem simple and appears to be a long list of
> exceptions (or exceptional rules). :-)
Understood. But the properties in the Unicode tables are not enough by
themselves to determine which codepoints are OK and which are not. And
there are specific problems with some codepoints that are of one
category but that normalize to a sequence of codepoints that are not
all of the same category. One example is U+0140.
U+0140 is Ll (Lowercase Letter), while U+00B7, which appears in its
compatibility decomposition, is Po (Punctuation, Other).
U+00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
U+0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;;;;N;;;013F;;013F
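
For anyone who wants to check this locally, here is a minimal sketch
using Python's standard unicodedata module. The helper name
categories_after_nfkc is made up for illustration, and the result
depends on the Unicode version shipped with the interpreter, although
these two codepoints have been stable for a long time.

```python
import unicodedata


def categories_after_nfkc(ch):
    """Return the General_Category of ch, and the categories of every
    code point in its NFKC-normalized form.

    Hypothetical helper, for illustration only.
    """
    normalized = unicodedata.normalize("NFKC", ch)
    return (
        unicodedata.category(ch),
        [unicodedata.category(c) for c in normalized],
    )


# U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT
before, after = categories_after_nfkc("\u0140")
print(before, after)  # Ll ['Ll', 'Po']
```

The single Ll codepoint turns into an Ll followed by a Po, which is why
a rule based on the category alone gives the wrong answer here.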
The first try that some of you remember was only based on the classes,
but that just did not work.
Now we have managed to find a solution that has only 6 exceptions. I
personally am happy with it :-)
Patrik