consensus call tranche 6 results (character conversion)

Sun Nov 9 18:48:55 CET 2008

this call got only a few responses - mostly positive, but with some
general observations that will need to be resolved in Minneapolis.

YES: 4
NO: 1

however, see comments below of a more general nature.

CONSENSUS STATEMENT:

(6) Conversion, validation, and related issues.

(6.a) The discussion of Unicode conversions in Section 5.2 of
Protocol-05 is satisfactory.   (P.7)

(6.b) The discussion of A-label validation in Section 5.4 of
Protocol is satisfactory, even though it leaves considerable
flexibility to implementation decisions.  (P.8)

(6.c) Labels are not permitted to start with combining marks.
(P.13)
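
As an illustration of (6.c) only, here is a minimal sketch of a leading combining mark check. It assumes that "combining mark" means a code point whose Unicode General_Category is Mn, Mc, or Me; the function name is hypothetical and is not taken from Protocol.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of the label is a combining mark.

    Assumption: "combining mark" is read as General_Category M* (Mn, Mc, Me).
    """
    return bool(label) and unicodedata.category(label[0]).startswith("M")


# U+0301 COMBINING ACUTE ACCENT in first position would be rejected.
print(starts_with_combining_mark("\u0301abc"))
print(starts_with_combining_mark("abc"))
```

A lookup or registration routine could call such a check on each label before any further validation.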

Comments:

1. Section 4.2 of Protocol could be interpreted as only requiring  
Unicode Strings to be in NFC *if* they resulted from conversion from  
a legacy character encoding, rather than requiring it also of Unicode  
strings that did not result from such an encoding. The text needs to  
be fixed so that it is very clear that the NFC requirement is also  
true of strings that did not require conversion, as is the intent. I  
don't think this part is controversial -- it just makes it clearer  
and more consistent with 5.5.
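
To make the point in comment 1 concrete, here is a minimal sketch of an NFC check applied to every Unicode string, whether or not it came from a legacy encoding. The function name is hypothetical; the check relies only on Python's standard unicodedata module.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", s) == s


composed = "sch\u00e4ffer"      # a-umlaut as one precomposed code point
decomposed = "scha\u0308ffer"   # 'a' followed by U+0308 COMBINING DIAERESIS

# The same test applies to both strings, regardless of how they were obtained.
print(is_nfc(composed))
print(is_nfc(decomposed))
print(unicodedata.normalize("NFC", decomposed) == composed)
```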

2. In terms of validation (the subject of this tranche), the second
paragraph of 4.2 and Section 5.3 open up an unpleasant
interoperability and security hole, since they place no limits on the
mappings that can be applied to forbidden characters.

Take the following 3 strings:
HTTP://SCHAFFER.DE
HTTP://SCHÄFFER.DE
HTTP://SCHÆFFER.DE
This section allows any implementation to map *any* of these to *any*  
of the following:
http://schaeffer.de
http://schäffer.de
http://schaffer.de
That is, one implementation could map the second input to the third
output, while another implementation could map the second input to
the first output. Or, for that matter, to many other
variants. It allows YENİKAPI.TR to be mapped to any of yenikapı.tr,
yenıkapi.tr, or other dotted vs dotless i variants. As a matter of
fact, a conformant implementation could map РУССКИЙ.RU to  
sarapalin.ru, since no limits are placed on the kinds of mappings  
that can be done.
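
The divergence described above can be sketched with three hypothetical preprocessing functions. None of them comes from Protocol or from any real implementation; they are assumptions chosen only to show that unconstrained mappings send the same input to three different names.

```python
import unicodedata


def map_lowercase_only(name: str) -> str:
    """Hypothetical mapper A: case-fold to lower case and nothing else."""
    return name.lower()


def map_transliterate(name: str) -> str:
    """Hypothetical mapper B: lower-case, then rewrite a-umlaut as 'ae'."""
    return name.lower().replace("\u00e4", "ae")


def map_strip_marks(name: str) -> str:
    """Hypothetical mapper C: lower-case, decompose, drop nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


source = "SCH\u00c4FFER.DE"
mappers = (map_lowercase_only, map_transliterate, map_strip_marks)
results = {f.__name__: f(source) for f in mappers}

for mapper_name, mapped in results.items():
    print(mapper_name, "->", mapped)
```

Each function is a plausible user interface choice on its own, which is the interoperability problem: two clients given the same typed string may look up different domain names.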

============
Strictly speaking I am fine with the 3 questions asked (6.a, 6.b and  
6.c). I share some of the issues mentioned by Mark, but could live  
with the current text. So, if this is strictly the question, this  
would be a YES.

If the consensus call also concerns 5.3 "Character Changes in  
Preprocessing or the User Interface", then I have a more serious  
issue that would require a fix imo (thus a NO position). At this  
point clause 5.3 contains a 'MUST NOT' declaration in its 3rd  
paragraph while saying in its 4th paragraph "This step is not  
standardized as part of IDNA, and is not further specified here". I  
see this as contradictory, because I don't see how you can impose a  
requirement on a step that you are explicitly not 'specifying'. In  
other words, the 'MUST NOT' statement must be removed and replaced by
a statement saying that the pre-processing should not (lower case
intended) map PROTOCOL-VALID characters. Again, it is up to the
pre-processing spec to mandate that, not the protocol spec.

NOTE NEW BUSINESS ADDRESS AND PHONE
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com
