Allowed characters (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

Michael Everson everson at evertype.com
Sun Mar 30 13:23:15 CEST 2008


At 17:08 -0400 2008-03-26, John C Klensin wrote:

>  > I'm interested in the present and future, not 2003 restrictions.
>
>IDNA2003 is the present. And, had you said that, you could have 
>saved everyone a lot of time.

We work in very different domains, John, and neither of us does 
well at guessing the other's expectations.

>The 10,000-meter answer to that question is that, with the exception 
>of characters that are transformed by NFKC, all letters, digits, and 
>combining marks are permitted.

Splendid.

>That set of results is based on the Unicode property relationships 
>described in "tables" and is _exactly_ identical to the rules for 
>every other script.

No problem.

>Exceptions could be defined on top of those rules but, so far, there 
>are no specific exceptions for Arabic characters.  There is, in that 
>regard, nothing special about Arabic.

There will be at least two characters requested for inclusion as far 
as I know, based on discussions here in Dubai. (06FD and 06FE are 
used in Sindhi, the first for the word "in" 'and' and the second for 
the word "men" 'in'. Neither word has another spelling in 
Sindhi--unlike &/and.)
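
To illustrate why these two would need an explicit exception, here is a 
minimal Python sketch of the general rule John describes above (letters, 
digits, and combining marks, minus anything NFKC changes). The function 
name roughly_permitted is made up for illustration; the check relies on 
the Unicode data shipped with the Python interpreter, and it ignores case 
folding and everything else in the "tables" document, so it is only an 
approximation of the real rules.

```python
import unicodedata


def roughly_permitted(ch: str) -> bool:
    """Approximate the high-level rule: letter, digit, or combining mark,
    and unchanged by NFKC. Hypothetical helper, not the IDNA algorithm."""
    cat = unicodedata.category(ch)
    # L* = letters, M* = combining marks, Nd = decimal digits
    is_letter_digit_mark = cat[0] in ("L", "M") or cat == "Nd"
    # Characters that NFKC maps to something else are excluded
    unchanged_by_nfkc = unicodedata.normalize("NFKC", ch) == ch
    return is_letter_digit_mark and unchanged_by_nfkc


for cp in (0x0628, 0x06FD, 0x06FE):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.category(ch), roughly_permitted(ch))
```

Both 06FD and 06FE have the general category So (Symbol, other) in the 
Unicode Character Database, so a rule built only on letters, digits, and 
marks leaves them out, whereas an ordinary letter such as 0628 passes.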

>We remain open to a strong argument from the users of the script 
>that some additional characters should be excluded at the protocol 
>level, not just the registry one.

Or included, as noted above.

>For example, I've seen at least one proposal to prohibit those 
>Qur'anic annotation characters.

Yes, we have discussed these.

>I hope that you and others who are at that meeting can help to focus 
>on those questions rather than only on what is possible.

Of course. But I needed to know what is possible.
-- 
Michael Everson * http://www.evertype.com

