IDNAbis Goals

Vint Cerf vint at google.com
Mon Nov 27 19:05:38 CET 2006


mark,
 
taking this from the other direction, one might start with a pretty limited
set (or sets) of characters, though far more than the present LDH
repertoire, that are believed to be "safe", and then try to find ways to
expand the set(s) within the tolerance of safety risk. Plainly there will be
differences of opinion as to what is "safe enough". The characters
permitted in IDNs should not, in my opinion, be required to have the same
degree of expressiveness as one would expect in natural written languages.
These are, after all, computer-based identifiers, technically speaking.
Plainly we want them to have some linguistic value in the sense that they
are memorable, but the presence of search, cut/paste, and directories
suggests that perfect memorability is less critical than, say, global
interoperability.
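
As a rough illustration of the "start small and expand" idea, here is a
minimal Python sketch. The character sets and the function names
(build_allowed, disallowed_chars) are hypothetical and chosen only for
illustration; they are not a proposed IDN repertoire.

```python
# Sketch only: the sets below are illustrative assumptions, not a
# proposed list of "safe" characters.
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")

# A few precomposed Latin letters (e-acute, n-tilde, u-diaeresis).
LATIN_EXTRA = set("\u00e9\u00f1\u00fc")

# Greek lowercase alpha..omega (U+03B1..U+03C9).
GREEK_LOWER = {chr(cp) for cp in range(0x03B1, 0x03CA)}


def build_allowed(*char_sets):
    """Union the given sets into one permitted repertoire."""
    allowed = set()
    for char_set in char_sets:
        allowed |= char_set
    return allowed


def disallowed_chars(label, allowed):
    """Return the characters of label that are not in the allowed set."""
    return [ch for ch in label if ch not in allowed]


# Start with LDH only, then expand the repertoire one set at a time.
stage1 = build_allowed(LDH)
stage2 = build_allowed(LDH, LATIN_EXTRA)

print(disallowed_chars("caf\u00e9", stage1))
print(disallowed_chars("caf\u00e9", stage2))
```

Each expansion step is then an explicit decision about one more set,
which is where the "safe enough" argument would have to be made.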
 
I hope no one reads this and thinks I am deliberately short-changing the
expressiveness side of the equation, but I am deeply concerned that we
appreciate the intended utility of IDNs as compared to general multilingual
discourse.
 
vint
 
 
 
Vinton G Cerf
Chief Internet Evangelist
Google
Regus Suite 384
13800 Coppermine Road
Herndon, VA 20171
 
+1 703 234-1823
+1 703-234-5822 (f)
 
vint at google.com
www.google.com
 
 

  _____  

From: idna-update-bounces at alvestrand.no
[mailto:idna-update-bounces at alvestrand.no] On Behalf Of Mark Davis
Sent: Monday, November 27, 2006 12:19 PM
To: idna-update at alvestrand.no
Subject: IDNAbis Goals


In order to assess the advantages and disadvantages of any approach, we
need to have a good idea of the goals and the weights attached to them.
Here is an initial take on some of the issues so far discussed, divided
into categories. 

A. Loosen some restrictions on IDNA. The goal is to allow, *where
feasible*, the same kind of expressive capability in other languages that
is now provided for in English. It should be recognized that not all
reasonable words of every language will qualify. Even in English the lack
of spaces and other punctuation forces compromises: words like "can't" are
disallowed.

Here is what I've heard so far:


1.	Allow Unicode 5.0 characters.

2.	Provide some mechanism for more quickly updating to successive
Unicode versions.

3.	Allow combining marks at the end of bidi fields.

4.	Allow ZWJ/ZWNJ in limited contexts (see a previous message).


Except for #4, which probably most people haven't looked through yet, it
appears that these are mostly uncontroversial.

B. Tighten some restrictions on IDNA. The purpose of this appears to be to
reduce the opportunity for spoofing. Thus any proposed restrictions should
be assessed against that metric. That is: (a) does the restriction reduce
spoofing significantly, and (b) are there no other reasonable mechanisms
for doing so?

Here is what I've heard so far:


1.	Remove (or discourage) symbols and (most) punctuation. 


*	This appears to be mostly uncontroversial. While the vast majority
of symbols and punctuation do not cause spoofing problems (I♥NY.com is not
a problem, for example), there is not enough value to having them to be
worth the effort. 

2.	Remove (or discourage) non-spacing marks. 


*	This is quite controversial. These marks are needed by many
languages; excluding them is like removing vowels from English:
"microsoft.com" becoming "mcrsft.cm".

*	A very good case has to be made that they (a) cause problems, and
(b) those problems can't feasibly be handled with other mechanisms. 

3.	Remove (or discourage) archaic / technical characters (characters
not in common modern use)



*	Unicode supplies a proposed list of such characters, in
http://www.unicode.org/reports/tr39/#General_Security_Profile. However, it
is recognized that any such list will need refinement and extension in the
future.


*	Certain scripts are quite clearly archaic, and could be easily
removed or discouraged. 

*	Judging whether a character in a modern script is archaic,
especially those in broad usage such as Latin, Arabic, and Cyrillic, can be
quite difficult -- often these characters are pressed into use in minority
languages. 
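
To make the first two categories above concrete, here is a minimal Python
sketch that flags symbols, punctuation, and non-spacing marks using the
Unicode general category from the standard unicodedata module. The function
names (classify, flag_label) are hypothetical. Whether a character is
archaic cannot be read off the general category, so item 3 is not covered
here; that would need a separate list such as the one in UTS #39.

```python
import unicodedata


def classify(ch):
    """Map one character to a coarse class via its general category."""
    cat = unicodedata.category(ch)
    if cat.startswith("S"):      # Sm, Sc, Sk, So
        return "symbol"
    if cat.startswith("P"):      # Pd, Po, Ps, Pe, ...
        return "punctuation"
    if cat == "Mn":              # non-spacing (combining) mark
        return "non-spacing mark"
    return "other"


def flag_label(label):
    """Return (character, class) pairs for characters that are flagged."""
    flagged = []
    for ch in label:
        kind = classify(ch)
        if kind != "other":
            flagged.append((ch, kind))
    return flagged


# The heart in "I♥NY" is a symbol; the apostrophe in "can't" is punctuation.
print(flag_label("I\u2665NY"))
print(flag_label("can't"))
```

Note that the hyphen is itself punctuation by general category, so a rule
of this kind would need an explicit exception for it, which is one reason
the item reads "(most) punctuation".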


A major issue is the choice between removal and discouragement. Removal has
the very significant cost of breaking backwards compatibility, so a clear
case has to be made that there is no feasible alternative to handle
spoofing problems that would otherwise occur.

Mark 