Allowed characters (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

John C Klensin klensin at
Wed Mar 26 20:47:40 CET 2008

--On Wednesday, 26 March, 2008 19:23 +0000 Michael Everson
<everson at> wrote:

> I really just wanted a list of
> Which Arabic letters were in and which were out.
> Which Arabic diacritics were in and which were out.
> Which punctuation and symbols in the Arabic block were in and
> which were out.
> Sorry if this is too complicated.


I think what both Mark and I are saying, albeit in very
different ways, is that it just isn't that simple.   Arabic (and
any other RTL script) requires consideration of sequences of
characters in labels, not just individual Yes/No character
lists.  In IDNA2003, there are some canonical form issues and, I
believe, some compatibility ones.  In general, for the current
state of the IDNA200X proposals, those issues translate into
disallowed code points (what you are calling "out", I think).
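
As a rough illustration of the sequence point, here is a minimal Python
sketch. It assumes the bidirectional requirement from RFC 3454, section 6
(the Stringprep rule that IDNA2003 relies on) as the sequence rule; the
IDNA200X rules were still under discussion and may differ. The allow-list
and the function names are hypothetical and exist only for the example.

```python
import unicodedata


def is_rtl(ch):
    # Bidi classes "R" and "AL" are the right-to-left letter classes.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def is_ltr(ch):
    # Bidi class "L" covers left-to-right letters.
    return unicodedata.bidirectional(ch) == "L"


def per_character_ok(label, allowed):
    # The "Yes/No list" view: every code point is judged on its own.
    return all(ch in allowed for ch in label)


def bidi_sequence_ok(label):
    # The sequence view, following RFC 3454 section 6: a label that
    # contains any RTL letter must contain no LTR letter, and must both
    # start and end with an RTL letter.
    if not any(is_rtl(ch) for ch in label):
        return True
    if any(is_ltr(ch) for ch in label):
        return False
    return is_rtl(label[0]) and is_rtl(label[-1])


# Hypothetical allow-list: three Arabic letters plus ASCII digits and
# hyphen, none of which live in the Arabic block.
ALLOWED = set("\u0628\u062A\u062B") | set("0123456789-")

good = "\u0628\u062A\u062B"  # three Arabic letters
bad = "\u0628\u062A1"        # Arabic letters followed by a trailing digit

print(per_character_ok(good, ALLOWED), bidi_sequence_ok(good))
print(per_character_ok(bad, ALLOWED), bidi_sequence_ok(bad))
```

Both labels pass the per-character list, but the second fails the
sequence rule because it ends in a digit rather than an RTL letter, so
a character list alone cannot say whether a label is acceptable.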

If people are interested in Arabic domain names (other uses of
Arabic script are not the subject matter of this mailing list
or of either set of protocols), you miss a major portion of
the picture if you restrict yourself to the Arabic script block
or to specifically-Arabic letters and decorations.

So, while we could probably contrive to answer your precise
questions above, we would only be misleading you and your
audience by doing so.

And, for IDNA200X, some of the characters and relationships are
still under active consideration. Some of the participants in
the meeting for which you presumably want this information are
very much involved in that consideration and are very well
informed about the issues.


More information about the Idna-update mailing list