IDNAbis Goals

Mark Davis markdavis at
Mon Nov 27 18:53:48 CET 2006

One other restriction that I forgot.

B4. Disallow problematic sequences for normalization

   - These are listed
   - This is a relatively uncontroversial change, even as a hard
   restriction, since the sequences are degenerate -- not meaningful in any


On 11/27/06, Mark Davis <markdavis at> wrote:
> In order to assess the advantages and disadvantages of any approach, we
> need to have a good idea of the goals and the weights attached to them. Here
> is an initial take on some of the issues so far discussed, divided into
> categories.
> A. Loosen some restrictions on IDNA. The goal is to allow, *where
> feasible*, the same kind of expressive capability in other languages that
> is now provided for in English. It should be recognized that not all
> reasonable words of every language will qualify: even in English, the lack of
> spaces and other punctuation forces compromises, so words like "can't" are
> disallowed.
> Here is what I've heard so far:
>    1. Allow Unicode 5.0 characters
>    2. Provide some mechanism for updating more quickly to
>    successive Unicode versions.
>    3. Allow combining marks at the end of bidi fields
>    4. Allow ZWJ/ZWNJ in limited contexts (see a previous message).
> Except for #4, which most people probably haven't looked through yet, these
> appear to be mostly uncontroversial.
> B. Tighten some restrictions on IDNA. The purpose of this appears to be to
> reduce the opportunity for spoofing. Thus any proposed restrictions should
> be assessed against that metric. That is: (a) Does the restriction reduce
> spoofing significantly? (b) Are there no other reasonable mechanisms for
> doing so?
> Here is what I've heard so far:
>    1. Remove (or discourage) symbols and (most) punctuation.
>       - This appears to be mostly uncontroversial. While the vast
>       majority of symbols and punctuation do not cause spoofing problems (I♥
>       is not a problem, for example), there is not enough value in having them to
>       justify the effort.
>    2. Remove (or discourage) non-spacing marks.
>       - This is quite controversial. These marks are needed by many
>       languages; excluding them is like removing vowels from English: ""
>       becoming "".
>       - A very good case has to be made that (a) they cause
>       problems, and (b) those problems can't feasibly be handled with other
>       mechanisms.
>    3. Remove (or discourage) archaic / technical characters (characters
>    not in common modern use)
>       - Unicode supplies a proposed list of such characters, in
>       However, it is recognized that any such list will need refinement and
>       extension in the future.
>       - Certain scripts are quite clearly archaic, and could be
>       easily removed or discouraged.
>       - Judging whether a character in a modern script is archaic,
>       especially those in broad usage such as Latin, Arabic, and Cyrillic, can be
>       quite difficult -- often these characters are pressed into use in minority
>       languages.
> A major issue is the choice between removal and discouragement. Removal
> has the very significant cost of breaking backwards compatibility, so a
> clear case has to be made that there is no feasible alternative to handle
> spoofing problems that would otherwise occur.
> Mark
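
To make restrictions B1 and B2 above concrete, here is a rough sketch of how a label could be screened by Unicode general category. It is only an illustration under stated assumptions: the function name flag_label is hypothetical, treating "symbols and punctuation" as categories S* and P* and "non-spacing marks" as category Mn is my reading of the wording rather than an agreed rule, and the results depend on the Unicode version shipped with the Python interpreter.

```python
import unicodedata


def flag_label(label):
    """Return (char, category, reason) for each character that the B1 or B2
    restrictions might touch. Hypothetical helper; the category tests are an
    assumption, not a rule taken from the discussion."""
    flagged = []
    for ch in label:
        if ch == "-":
            # The hyphen is already allowed in host names, so skip it.
            continue
        cat = unicodedata.category(ch)
        if cat[0] in ("S", "P"):
            flagged.append((ch, cat, "B1: symbol or punctuation"))
        elif cat == "Mn":
            flagged.append((ch, cat, "B2: non-spacing mark"))
    return flagged


if __name__ == "__main__":
    # "I♥", "can't", and a letter followed by a combining acute accent.
    for sample in ("I\u2665", "can't", "e\u0301", "my-example"):
        print(repr(sample), flag_label(sample))
```

A screen like this only reports which characters fall into each class. Whether a flagged character is removed or merely discouraged is the separate policy question raised at the end of the message.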

More information about the Idna-update mailing list