IDNNever.txt

Michel Suignard michelsu at windows.microsoft.com
Mon Mar 5 18:22:18 CET 2007


Except for 3007 and 30FB, afaik, all the other characters mentioned
below are either mapped out or normalized out. It does not make much
sense to have allowed characters in a list which are removed by IDNA
nameprep.  I don't think anybody is arguing about allowing compatibility
characters. Entries from the th and pl registries are just mistakes. As
long as we are clear that compatibility characters can no longer be part
of an input repertoire (or, more exactly, that we exclude characters
that don't respect NFKC(cp)=cp), using either NFC or NFKC does not
really matter and we had better use the simpler NFC.
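
The NFKC(cp)=cp condition can be sketched in a few lines of Python using
the standard unicodedata module. The helper name is_nfkc_stable is made
up for illustration, and the check covers only the normalization step,
not the nameprep mapping tables, so treat it as a rough filter rather
than a full nameprep test.

```python
import unicodedata


def is_nfkc_stable(cp: int) -> bool:
    """Return True if NFKC leaves this single code point unchanged."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) == ch


# Code points named in this thread (taken from the registered tables).
candidates = [0x3007, 0x30FB, 0x0390, 0x03B0, 0x002E, 0x0E33]

for cp in candidates:
    # Report only the normalization result; case folding and the other
    # nameprep mapping steps are deliberately left out of this sketch.
    print(f"U+{cp:04X} NFKC-stable: {is_nfkc_stable(cp)}")
```

A code point that fails this check could be dropped from an input
repertoire without looking at NFKC again afterwards.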

I had mentioned most of the mistakes below a while ago to the IANA/ICANN
staff, but apparently it is the responsibility of the original
submitters to do something about it.

Concerning 3007 and 30FB, they are both already in use in Japanese IDN
names according to JPRS, so although they both have confusability
issues, it could be problematic to remove them. So they are for sure
not good candidates for the IDNNever.txt content.

Michel
-----Original Message-----
From: idna-update-bounces at alvestrand.no
[mailto:idna-update-bounces at alvestrand.no] On Behalf Of Martin Duerst


># I don't know about U+0E33 character.

It has a compatibility decomposition into U+0E4D and U+0E32.
That means that nameprep2003 normalizes U+0E33 into that sequence
of characters. That in turn may mean that the idea to say goodbye
completely to NFKC (and go purely with NFC) may have to be
revisited.
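
A quick way to see how the two normalization forms treat U+0E33 is to
run it through Python's unicodedata module. This is only a sketch: the
names SARA_AM and code_points are chosen here for illustration, and the
result reflects the Unicode data shipped with the Python build in use.

```python
import unicodedata

SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM


def code_points(s: str) -> list[str]:
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]


# Raw decomposition mapping from the Unicode Character Database.
print("decomposition:", unicodedata.decomposition(SARA_AM))

# NFC applies canonical mappings only; NFKC also applies compatibility ones.
nfc = unicodedata.normalize("NFC", SARA_AM)
nfkc = unicodedata.normalize("NFKC", SARA_AM)
print("NFC :", code_points(nfc))
print("NFKC:", code_points(nfkc))
```

Comparing the two printed lists shows whether a label containing U+0E33
would come out differently under an NFC-only rule than under NFKC.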

>I found six characters which are not listed in IDNPermitted table but
>listed in IANA Registered IDN tables.
>
>jp-japanese.html     U+3007

This is the ideographic zero. There are a lot of Web sites
with names such as www2007.org. It would seem appropriate that
you could do something similar with ideographic numerals.

>jp-japanese.html     U+30fb

This is the (ideographic) middle dot. It wasn't allowed in XML
names in XML 1.0, but we received quite a bit of feedback that
that was a mistake.

>pl-greek.html       U+0390
>pl-greek.html       U+03b0

These are iota/upsilon with dialytika and tonos.
Looking at some Web sites, this seems to be part of
standard Greek orthography, but I might be wrong.
And I don't know whether adding accents (tonos)
is a good practice or not for domain names.

>th-thai.html        U+002e
>th-thai.html        U+0e33

These are the two discussed above.

Regards,     Martin.

>In the jp-japanese table case, U+3007 and U+30FB are used in proper
>nouns in Japan.
>
>--
>Kazunori Fujiwara, JPRS
>_______________________________________________
>Idna-update mailing list
>Idna-update at alvestrand.no
>http://www.alvestrand.no/mailman/listinfo/idna-update


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp

