IDNAbis Goals

Mark Davis markdavis at google.com
Mon Nov 27 18:53:48 CET 2006


One other restriction that I forgot.

B4. Disallow problematic sequences for normalization

   - These are listed at
   http://www.unicode.org/reports/tr15/#Corrigendum_5_Sequences
   - This is a relatively uncontroversial change, even as a hard
   restriction, since the sequences are degenerate: they are not meaningful in
   any language.

Mark

On 11/27/06, Mark Davis <markdavis at google.com> wrote:
>
> In order to assess the advantages and disadvantages of any approach, we
> need a good idea of the goals and the weights attached to them. Here is an
> initial take on some of the issues discussed so far, divided into
> categories.
>
> A. Loosen some restrictions on IDNA. The goal is to allow, *where
> feasible*, the same kind of expressive capability in other languages that
> is now provided for English. It should be recognized that not all
> reasonable words of every language will qualify; even in English, the lack of
> spaces and other punctuation forces compromises: words like "can't" are
> disallowed.
>
> Here is what I've heard so far:
>
>    1. Allow Unicode 5.0 characters
>    2. Provide some mechanism for updating more quickly to
>    successive Unicode versions.
>    3. Allow for combining marks at the end of bidi fields
>    4. Allow for ZWJ/ZWNJ in limited contexts (see a previous message).
>
> Except for #4, which most people probably haven't looked through yet,
> these appear to be mostly uncontroversial.
>
> B. Tighten some restrictions on IDNA. The purpose of this appears to be to
> reduce the opportunity for spoofing, so any proposed restriction should
> be assessed against that metric: (a) does the restriction reduce
> spoofing significantly? (b) are there no other reasonable mechanisms for
> doing so?
>
> Here is what I've heard so far:
>
>    1. Remove (or discourage) symbols and (most) punctuation.
>       - This appears to be mostly uncontroversial. While the vast
>       majority of symbols and punctuation do not cause spoofing problems (I♥NY.com
>       is not a problem, for example), there is not enough value in having them
>       to be worth the effort.
>    2. Remove (or discourage) non-spacing marks.
>       - This is quite controversial. These marks are needed by many
>       languages; excluding them is like removing vowels from English: "microsoft.com"
>       becoming "mcrsft.cm".
>       - A very good case has to be made that (a) they cause
>       problems, and (b) those problems can't feasibly be handled with other
>       mechanisms.
>    3. Remove (or discourage) archaic / technical characters (characters
>    not in common modern use).
>       - Unicode supplies a proposed list of such characters at
>       http://www.unicode.org/reports/tr39/#General_Security_Profile.
>       However, it is recognized that any such list will need refinement and
>       extension in the future.
>       - Certain scripts are quite clearly archaic, and could be
>       easily removed or discouraged.
>       - Judging whether a character in a modern script is archaic,
>       especially in scripts in broad usage such as Latin, Arabic, and Cyrillic,
>       can be quite difficult: such characters are often pressed into use in
>       minority languages.
>
> A major issue is the choice between removal and discouragement. Removal
> has the very significant cost of breaking backwards compatibility, so a
> clear case has to be made that there is no feasible alternative to handle
> spoofing problems that would otherwise occur.
>
> Mark




-- 
Mark