IDNAbis Goals
Mark Davis
markdavis at google.com
Mon Nov 27 18:19:17 CET 2006
To assess the advantages and disadvantages of any approach, we need
to have a good idea of the goals and the weights attached to them. Here is
an initial take on some of the issues discussed so far, divided into
categories.
A. Loosen some restrictions on IDNA. The goal is to allow, *where feasible*,
the same kind of expressive capability in other languages that is now
provided for in English. It should be recognized that not all reasonable
words of every language will qualify. Even in English, the lack of spaces and
other punctuation forces compromises: words like "can't" are disallowed.
Here is what I've heard so far:
1. Allow Unicode 5.0 characters.
2. Provide some mechanism for updating more quickly to successive
Unicode versions.
3. Allow combining marks at the end of bidi fields.
4. Allow ZWJ/ZWNJ in limited contexts (see a previous message).
Except for #4, which most people probably haven't looked through yet, these
appear to be mostly uncontroversial.
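To make item 3 concrete: under the current bidi rule (RFC 3454, section 6), a label that contains right-to-left characters must begin and end with a character of bidi class R or AL. A combining mark has class NSM, so a right-to-left label that ends in one is rejected. The following is a minimal sketch of that check, not the IDNA algorithm itself; the function names are hypothetical, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

MARK_CATEGORIES = {"Mn", "Mc", "Me"}  # non-spacing, spacing, enclosing marks
RTL_CLASSES = {"R", "AL"}             # right-to-left bidi classes


def has_rtl(label: str) -> bool:
    """True if any character in the label has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def ends_with_combining_mark(label: str) -> bool:
    """True if the final character of the label is a combining mark."""
    return bool(label) and unicodedata.category(label[-1]) in MARK_CATEGORIES


def blocked_by_trailing_mark(label: str) -> bool:
    """Hypothetical helper: the label is right-to-left and its last
    character is a combining mark rather than an R or AL character."""
    return has_rtl(label) and ends_with_combining_mark(label)


# HEBREW LETTER BET followed by HEBREW POINT QAMATS (a non-spacing mark).
print(blocked_by_trailing_mark("\u05D1\u05B8"))
# Two Hebrew letters (BET, ALEF) with no trailing mark.
print(blocked_by_trailing_mark("\u05D1\u05D0"))
```

Only the first label is caught by the check; loosening the rule as proposed in item 3 would let such labels through.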
B. Tighten some restrictions on IDNA. The purpose of this appears to be to
reduce the opportunity for spoofing. Thus any proposed restrictions should
be assessed against that metric. That is: (a) does the restriction reduce
spoofing significantly? (b) Are there no other reasonable mechanisms for
doing so?
Here is what I've heard so far:
1. Remove (or discourage) symbols and (most) punctuation.
- This appears to be mostly uncontroversial. While the vast
majority of symbols and punctuation do not cause spoofing
problems (I♥NY.com is not a problem, for example), there is not
enough value in having them to be worth the effort.
2. Remove (or discourage) non-spacing marks.
- This is quite controversial. These marks are needed by many
languages; excluding them is like removing vowels from English:
"microsoft.com" becoming "mcrsft.cm".
- A very good case has to be made that (a) they cause problems,
and (b) those problems can't feasibly be handled with other mechanisms.
3. Remove (or discourage) archaic / technical characters (characters
not in common modern use)
- Unicode supplies a proposed list of such characters, in
http://www.unicode.org/reports/tr39/#General_Security_Profile.
However, it is recognized that any such list will need refinement and
extension in the future.
- Certain scripts are quite clearly archaic, and could be easily
removed or discouraged.
- Judging whether a character in a modern script is archaic,
especially those in broad usage such as Latin, Arabic, and
Cyrillic, can be quite difficult -- often these characters are
pressed into use in minority languages.
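To show which characters proposals 1 and 2 would touch, here is a rough sketch that sorts the characters of a label by Unicode general category. It is an illustration only, not a proposed rule: the function name and the hyphen exemption are assumptions, and archaic characters (proposal 3) are left out because they cannot be identified from the general category alone.

```python
import unicodedata

# Assumption: the ASCII hyphen stays allowed, as in ordinary hostnames.
ALWAYS_ALLOWED = {"-"}


def flag_characters(label: str) -> dict:
    """Hypothetical helper: collect the characters of a label that
    proposals 1 and 2 would remove or discourage."""
    flagged = {"symbol_or_punctuation": [], "non_spacing_mark": []}
    for ch in label:
        if ch in ALWAYS_ALLOWED:
            continue
        category = unicodedata.category(ch)
        if category[0] in ("S", "P"):   # any symbol or punctuation category
            flagged["symbol_or_punctuation"].append(ch)
        elif category == "Mn":          # non-spacing mark
            flagged["non_spacing_mark"].append(ch)
    return flagged


print(flag_characters("I\u2665NY"))     # BLACK HEART SUIT is a symbol
print(flag_characters("can't"))         # the apostrophe is punctuation
print(flag_characters("cafe\u0301"))    # COMBINING ACUTE ACCENT is a mark
```

A check like this says nothing about whether a flagged character actually enables spoofing; it only shows how broad each category-based restriction is.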
A major issue is the choice between removal and discouragement. Removal has
the very significant cost of breaking backwards compatibility, so a clear
case has to be made that there is no feasible alternative to handle spoofing
problems that would otherwise occur.
Mark