Allowed characters (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

John C Klensin klensin at
Wed Mar 26 18:57:05 CET 2008

--On Wednesday, 26 March, 2008 17:14 +0000 Michael Everson
<everson at> wrote:

> John,
> I was looking for a link with a tables with Yesses and Noes in
> it. But either way I don't have a link to the documents you
> are referring to.
> I am interested in RTL characters only at this point.

RFCs are at, e.g.,
for RFC 3490, use

The draft IDNA200X documents, from the draft charter Vint
circulated to this list earlier today, are

As you know, no major contemporary language is written strictly
right to left.  Arabic, for example, uses numerals (whether
expressed in European or Indo-Arabic digits) that are written in
the same order as numerals in Latin-derived scripts.  At a coding
level, Unicode's character-classification system identifies some
characters as direction-independent (neither right to left nor
left to right); some of those characters are natural candidates
for inclusion in DNS labels.  This situation results in the need
for so-called "bidi" rules in Unicode, and for DNS-specific
variations on those rules in IDNA2003 and as proposed for
IDNA200X.
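
As an illustration of the classification described above, here is a
minimal sketch in Python that looks up the Unicode bidirectional
class of a few characters.  The sample characters and the helper
name bidi_classes are assumptions chosen for this example; they do
not come from any IDNA document.

```python
import unicodedata

# Illustrative sample only: two letters, two digits, and the hyphen
# that is already permitted in DNS labels.
SAMPLES = {
    "LATIN SMALL LETTER A": "a",
    "ARABIC LETTER BEH": "\u0628",
    "DIGIT ONE": "1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
    "HYPHEN-MINUS": "-",
}

# Classes with an inherent (strong) direction: left to right,
# right to left, and right-to-left Arabic.
STRONG = {"L", "R", "AL"}


def bidi_classes(chars):
    """Map each sample name to the bidi class of its character."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


classes = bidi_classes(SAMPLES)
for name, cls in classes.items():
    kind = "strong" if cls in STRONG else "not strongly directional"
    print(f"{name}: {cls} ({kind})")
```

The letters report a strong class, while the digits and the hyphen
do not, which is why a label that mixes them needs explicit rules.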

You, yourself, commented some months ago that Farsi requires the
use of zero-width joiner (or non-joiner) characters to be
written correctly, and the ZWJ and ZWNJ characters are
directionless.  They are discarded by IDNA2003 but are
candidates for being retained for some scripts in the IDNA200X
proposals (where "discarded" and "retained" refer to the
transition between the "allowed to be used in queries" and
"allowed to be stored" forms I referred to earlier).

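The IDNA2003 behavior described above can be observed with Python's
standard library, whose "idna" codec and stringprep module implement
the IDNA2003 and RFC 3454 rules.  This is only a sketch: the Persian
sample word and the helper name idna2003_discards are assumptions
chosen for illustration, and nothing here reflects the IDNA200X
proposals.

```python
import stringprep
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def idna2003_discards(ch):
    """True if RFC 3454 table B.1 ("mapped to nothing") lists ch."""
    return stringprep.in_table_b1(ch)


# Illustrative Persian word spelled with and without ZWNJ.
with_zwnj = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"
without_zwnj = with_zwnj.replace(ZWNJ, "")

# Both joiners carry no strong direction (class BN, boundary neutral).
zw_classes = [unicodedata.bidirectional(ch) for ch in (ZWNJ, ZWJ)]

# Under IDNA2003 the two spellings produce the same ASCII label,
# so the distinction is lost before anything is stored or looked up.
encoded_with = with_zwnj.encode("idna")
encoded_without = without_zwnj.encode("idna")

print(zw_classes)
print(idna2003_discards(ZWNJ), idna2003_discards(ZWJ))
print(encoded_with == encoded_without)
```
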
best wishes,

