Allowed characters (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

John C Klensin klensin at jck.com
Wed Mar 26 17:54:24 CET 2008


--On Wednesday, 26 March, 2008 16:27 +0000 Michael Everson
<everson at evertype.com> wrote:

> Can I have a pointer to the current list of "allowed-for-IDN"
> and "disallowed-for-IDN" characters?

If you are looking for the _current_ (i.e., IDNA2003) list, you
need to work your way through RFC 3490, Nameprep (RFC 3491), and
Stringprep (RFC 3454) in order on a character-by-character
basis.  If any of the characters of interest have right-to-left
properties, you need to check the bidi rules for strings in RFC
3490 and to pay careful attention to the conformance statements
in that document.
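
For readers who want to see what that character-by-character walk looks like, here is a rough sketch using Python's standard `stringprep` module, which exposes the RFC 3454 tables. The `classify` helper and the `PROHIBITED` tuple are illustrative names, not part of any specification. The set of prohibition tables is the one Nameprep (RFC 3491) selects. The sketch looks at one character in isolation, so it ignores the bidi checks and anything else that depends on the whole string.

```python
import stringprep
from unicodedata import ucd_3_2_0  # Stringprep is defined against Unicode 3.2

# Prohibition tables selected by the Nameprep profile (RFC 3491, section 5).
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def classify(ch: str) -> str:
    """Hypothetical helper: rough IDNA2003 status of a single character."""
    if stringprep.in_table_a1(ch):
        return "unassigned"            # table A.1, unassigned in Unicode 3.2
    if stringprep.in_table_b1(ch):
        return "mapped to nothing"     # table B.1
    # Table B.2 case folding followed by NFKC normalization.
    mapped = ucd_3_2_0.normalize("NFKC", stringprep.map_table_b2(ch))
    # Prohibition is checked on the output of the mapping step.
    if any(check(c) for c in mapped for check in PROHIBITED):
        return "prohibited"
    return "mapped" if mapped != ch else "unchanged"


for ch in ("a", "A", "\u00ad", "\ufb01", "\u200e"):
    print(f"U+{ord(ch):04X}: {classify(ch)}")
```

Treat the output as a first approximation only; the RFCs, not this sketch, are what decide the status of a character.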

You also need to figure out before you start what "allowed"
means, since "allowed to be used in queries" (and all of the
contexts that might generate queries) has a different answer
than "allowed to be stored in a DNS zone file without
information loss".  The latter is often (although perhaps not
exactly) expressed as a test on whether
ToUnicode(ToASCII(Unicode-string))=Unicode-string.  In
particular, compatibility characters and characters that
normally map to nothing may appear in queries but cannot be
represented directly in the ACE (punycode) form.
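
As an illustration of that round-trip test, here is a minimal sketch that assumes Python's standard `encodings.idna` module, which implements the IDNA2003 ToASCII and ToUnicode operations on a single label. The `round_trips` helper is a hypothetical name introduced for this example. As noted above, the test is only an approximation of "can be stored without information loss".

```python
import encodings.idna as idna2003  # IDNA2003 (RFC 3490) operations


def round_trips(label: str) -> bool:
    """Hypothetical helper: does ToUnicode(ToASCII(label)) == label?"""
    try:
        ace = idna2003.ToASCII(label)   # bytes, e.g. b"xn--..." for non-ASCII
    except UnicodeError:
        return False                    # prohibited, empty, or too-long label
    return idna2003.ToUnicode(ace) == label


for label in ("b\u00fccher", "B\u00fccher", "\ufb01sh", "ex\u00adample"):
    print(repr(label), round_trips(label))
```

In this sketch the lowercase label with U+00FC survives the round trip, while the capitalized form, the label starting with the U+FB01 ligature (a compatibility character), and the label containing a soft hyphen (mapped to nothing) do not.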

Most of us use online tools to make these tests, although
different people have different favorite tools and there are no
guarantees that a given tool will be bug-free with regard to the
specification.


If your question is about the IDNA200X proposal, you need to
review draft-klensin-idnabis-issues to familiarize yourself with
concepts and terminology and then find the character lists and
the rules for applying and interpreting them in
draft-faltstrom-idnabis-tables.


    john




