IDNAbis Goals
Mark Davis
markdavis at google.com
Mon Nov 27 18:53:48 CET 2006
One other restriction that I forgot.
B4. Disallow problematic sequences for normalization
- These are listed at
http://www.unicode.org/reports/tr15/#Corrigendum_5_Sequences
- This is a relatively uncontroversial change, even as a hard
restriction, since the sequences are degenerate -- not meaningful in any
language.
Mark
On 11/27/06, Mark Davis <markdavis at google.com> wrote:
>
> In order to assess the advantages and disadvantages of any approach, we
> need to have a good idea of the goals and the weights attached to them. Here
> is an initial take on some of the issues so far discussed, divided into
> categories.
>
> A. Loosen some restrictions on IDNA. The goal is to allow, *where
> feasible*, the same kind of expressive capability in other languages that
> is now provided for in English. It should be recognized that not all
> reasonable words of every language will qualify. Even in English, the lack
> of spaces and other punctuation forces compromises: words like "can't" are
> disallowed.
>
> Here is what I've heard so far:
>
> 1. Allow Unicode 5.0 characters
> 2. Provide for some mechanism for more quickly updating to
> successive Unicode versions.
> 3. Allow for combining marks at the end of bidi fields
> 4. Allow for ZWJ/ZWNJ in limited contexts (see a previous message).
>
> Except for #4, which most people probably haven't looked through yet,
> these appear to be mostly uncontroversial.
>
> B. Tighten some restrictions on IDNA. The purpose of this appears to be to
> reduce the opportunity for spoofing. Thus any proposed restrictions should
> be assessed against that metric. That is: (a) does the restriction reduce
> spoofing significantly? (b) Are there no other reasonable mechanisms for
> doing so?
>
> Here is what I've heard so far:
>
> 1. Remove (or discourage) symbols and (most) punctuation.
> - This appears to be mostly uncontroversial. While the vast
> majority of symbols and punctuation do not cause spoofing problems
> (I♥NY.com is not a problem, for example), there is not enough value in
> having them to be worth the effort.
> 2. Remove (or discourage) non-spacing marks.
> - This is quite controversial. These marks are needed by many
> languages; excluding them is like removing vowels from English, with
> "microsoft.com" becoming "mcrsft.cm".
> - A very good case has to be made that (a) they cause
> problems, and (b) those problems can't feasibly be handled with other
> mechanisms.
> 3. Remove (or discourage) archaic / technical characters (characters
> not in common modern use)
> - Unicode supplies a proposed list of such characters, in
> http://www.unicode.org/reports/tr39/#General_Security_Profile.
> However, it is recognized that any such list will need refinement and
> extension in the future.
> - Certain scripts are quite clearly archaic, and could be
> easily removed or discouraged.
> - Judging whether a character in a modern script is archaic,
> especially in scripts in broad usage such as Latin, Arabic, and Cyrillic,
> can be quite difficult -- often these characters are pressed into use in
> minority languages.
>
> A major issue is the choice between removal and discouragement. Removal
> has the very significant cost of breaking backwards compatibility, so a
> clear case has to be made that there is no feasible alternative to handle
> spoofing problems that would otherwise occur.
>
> Mark
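
To make restrictions B1 and B2 above concrete, here is a minimal Python sketch of how a label could be screened by Unicode general category. It is an illustration only, not anything proposed in the thread: the function name screen_label and the ALLOWED_PUNCTUATION set are hypothetical, and exempting HYPHEN-MINUS is an assumption based on ordinary hostname rules ("most" punctuation, not all). The sketch only reports which characters would be affected; whether to remove or merely discourage them is the policy question discussed above.

```python
import unicodedata

# Assumption: HYPHEN-MINUS stays allowed, as in ordinary hostnames.
ALLOWED_PUNCTUATION = {"-"}


def screen_label(label):
    """Return the characters in `label` that B1 or B2 would flag.

    B1: symbols (general categories S*) and punctuation (P*).
    B2: non-spacing marks (general category Mn).
    """
    flagged = {"symbol_or_punctuation": [], "nonspacing_mark": []}
    for ch in label:
        category = unicodedata.category(ch)  # e.g. "So", "Po", "Mn", "Ll"
        if category[0] in ("S", "P") and ch not in ALLOWED_PUNCTUATION:
            flagged["symbol_or_punctuation"].append(ch)
        elif category == "Mn":
            flagged["nonspacing_mark"].append(ch)
    return flagged


if __name__ == "__main__":
    # Labels taken from, or modeled on, the examples in the message.
    for label in ("i\u2665ny", "can't", "cafe\u0301", "my-site"):
        print(repr(label), screen_label(label))
```

Under a "discourage" policy the result could be logged or shown as a warning; under a "remove" policy a non-empty list would cause the label to be rejected.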