consensus call tranche 6 results (character conversion)
Vint Cerf
vint at google.com
Sun Nov 9 18:48:55 CET 2008
This call got only a few responses, mostly positive, but with some
general observations that will need to be resolved in Minneapolis.
YES: 4
NO: 1
However, see the comments below, which are of a more general nature.
CONSENSUS STATEMENT:
(6) Conversion, validation, and related issues.
(6.a) The discussion of Unicode conversions in Section 5.2 of
Protocol-05 is satisfactory. (P.7)
(6.b) The discussion of A-label validation in Section 5.4 of
Protocol is satisfactory, even though it leaves considerable
flexibility to implementation decisions. (P.8)
(6.c) Labels are not permitted to start with combining marks.
(P.13)
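
For illustration only, rule (6.c) can be checked mechanically. The
following is a minimal Python sketch, assuming that "combining mark"
means a code point whose Unicode General_Category is Mn, Mc, or Me. The
function name is hypothetical and is not taken from Protocol.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of the label is a combining mark.

    Assumption: "combining mark" is read as General_Category M* (Mn, Mc, Me).
    """
    return bool(label) and unicodedata.category(label[0]).startswith("M")


# Leading U+0308 COMBINING DIAERESIS: such a label would be rejected under (6.c).
print(starts_with_combining_mark("\u0308abc"))
# The same mark after a base letter is not a leading combining mark.
print(starts_with_combining_mark("a\u0308bc"))
```

An empty label is reported as not starting with a combining mark; whether
empty labels are acceptable is a separate question that this sketch does
not address.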
Comments:
1. Section 4.2 of Protocol could be interpreted as only requiring
Unicode Strings to be in NFC *if* they resulted from conversion from
a legacy character encoding, rather than requiring it also of Unicode
strings that did not result from such an encoding. The text needs to
be fixed so that it is very clear that the NFC requirement is also
true of strings that did not require conversion, as is the intent. I
don't think this part is controversial -- it just makes it clearer
and more consistent with 5.5.
2. In terms of validation (the subject of this tranche), the second
paragraph of 4.2 and Section 5.3 open up an unpleasant
interoperability and security hole, since they place no limits on the
mappings that can be applied to forbidden characters.
Take the following 3 strings:
HTTP://SCHAFFER.DE
HTTP://SCHÄFFER.DE
HTTP://SCHÆFFER.DE
This section allows any implementation to map *any* of these to *any*
of the following:
http://schaeffer.de
http://schäffer.de
http://schaffer.de
That is, one implementation could map #2 to #3, while another
implementation could map #2 to #1. Or, for that matter, many other
variants. It allows YENİKAPI.TR to be mapped to any of yenikapı.tr,
yenıkapi.tr, or other dotted vs dotless i variants. As a matter of
fact, a conformant implementation could map РУССКИЙ.RU to
sarapalin.ru, since no limits are placed on the kinds of mappings
that can be done.
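
The divergence described above can be sketched in a few lines of
Python. The three mapping functions below are hypothetical
preprocessing choices invented for this illustration; none of them is
specified by Protocol, and real implementations may do something else
entirely. The point is only that each is a plausible local mapping and
that they disagree on the same input.

```python
import unicodedata


def map_lowercase_nfc(s: str) -> str:
    """Hypothetical implementation A: lowercase, then normalize to NFC."""
    return unicodedata.normalize("NFC", s.lower())


def map_strip_marks(s: str) -> str:
    """Hypothetical implementation B: lowercase, decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def map_transliterate(s: str) -> str:
    """Hypothetical implementation C: lowercase, then spell a-umlaut as 'ae'."""
    return s.lower().replace("\u00e4", "ae")


user_input = "SCH\u00c4FFER.DE"  # string #2 above, without the scheme prefix

for mapper in (map_lowercase_nfc, map_strip_marks, map_transliterate):
    print(mapper.__name__, "->", mapper(user_input))
```

Under these assumptions the same user input yields three different
domain names (with a-umlaut, with plain "a", and with "ae"), which is
the interoperability problem in miniature.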
============
Strictly speaking I am fine with the 3 questions asked (6.a, 6.b and
6.c). I share some of the issues mentioned by Mark, but could live
with the current text. So, if this is strictly the question, this
would be a YES.
If the consensus call also concerns 5.3 "Character Changes in
Preprocessing or the User Interface", then I have a more serious
issue that would require a fix imo (thus a NO position). At this
point clause 5.3 contains a 'MUST NOT' declaration in its 3rd
paragraph while saying in its 4th paragraph "This step is not
standardized as part of IDNA, and is not further specified here". I
see this as contradictory, because I don't see how you can impose a
requirement on a step that you are explicitly not 'specifying'. In
other words, the 'MUST NOT' statement must be removed and replaced by
a statement saying that the pre-processing should not (lower case
intended) map PROTOCOL-VALID characters. Again, it is up to the
pre-processing spec to mandate that, not the protocol spec.
NOTE NEW BUSINESS ADDRESS AND PHONE
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com