Letter to Unicode Technical Committee on IDNA2008
Vint Cerf
vint at google.com
Sat Nov 28 22:57:17 CET 2009
Note to the IDNABIS Working Group:
During processing of the Last Call responses, the Area Director
expressed discomfort with the state of consensus on the use of Latin
Small Letter Sharp S and Greek Small Letter Final Sigma. To gain
additional input on whether these should remain PVALID, I am sending
the letter below to the Unicode Technical Committee for its opinion.
vint cerf
------------------
Ms. Lisa Moore
Chairman, Unicode Technical Committee
via email: lisam at us.ibm.com
CC:
Eric Muller
Vice Chairman, Unicode Technical Committee
via email: emuller at adobe.com
Mark Davis
President, Unicode Consortium
via email: markdavis at google.com
28 November 2009
Dear Ms. Moore:
I am writing to you in my role as chairman of the IDNABIS working
group, addressing this request to you as chairman of the Unicode
Technical Committee. As you know, treatment of two characters, Greek
Small Letter Final Sigma (U+03C2) and Latin Small Letter Sharp S
(U+00DF), has been the source of considerable discussion during the
IDNABIS Working Group effort on specifying IDNA2008, the proposed
replacement of the IDNA2003 standard for the use of Unicode in
Internationalized Domain Names. Latin Capital Letter Sharp S (U+1E9E)
was added in Unicode version 5.1.0, and recommended rules for its use
were provided as shown below:
Begin quote from Unicode Version 5.1.0
Tailored Casing Operations
The Unicode Standard provides default casing operations. There are
circumstances in which the default operations need to be tailored for
specific locales or environments. Some of these tailorings have data
that is in the standard, in the SpecialCasing.txt file, notably for
the Turkish dotted capital I and dotless small i. In other cases, more
specialized tailored casing operations may be appropriate. These
include:
Titlecasing of IJ at the start of words in Dutch
Removal of accents when uppercasing letters in Greek
Uppercasing U+00DF ( ß ) LATIN SMALL LETTER SHARP S to the new U+1E9E
LATIN CAPITAL LETTER SHARP S
However, these tailorings may or may not be desired, depending on the
implementation in question.
In particular, capital sharp s is intended for typographical
representations of signage and uppercase titles, and other
environments where users require the sharp s to be preserved in
uppercase. Overall, such usage is rare. In contrast, standard German
orthography uses the string "SS" as uppercase mapping for small sharp
s. Thus, with the default Unicode casing operations, capital sharp s
will lowercase to small sharp s, but not the reverse: small sharp s
uppercases to "SS". In those instances where the reverse casing
operation is needed, a tailored operation would be required.
End quote from Unicode Version 5.1.0
In IDNA2003, Sharp S was mapped to "ss" by means of a casing operation
that mapped lowercase Sharp S to uppercase "SS" and then down to
lowercase "ss". Registrations and lookups using the IDNA2003 rules
applied this mechanism.
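
The casing path described above can be reproduced with Python's default
Unicode casing operations. This is only an illustrative sketch: IDNA2003
itself performs the mapping through its Nameprep tables rather than by
calling casing functions. The last line relies on Python's built-in
"idna" codec, which follows the IDNA2003 rules.

```python
# Sketch: the casing route by which Sharp S ends up as "ss".
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

upper = sharp_s.upper()  # default Unicode uppercasing yields "SS"
lower = upper.lower()    # lowercasing "SS" yields "ss"

# Python's built-in "idna" codec implements IDNA2003 (RFC 3490), so the
# Sharp S is mapped away before the label is looked up.
ascii_label = ("stra" + sharp_s + "e").encode("idna")
```

After this mapping the original Sharp S cannot be recovered from the
ASCII form of the label.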
During the discussions in the IDNABIS Working Group on IDNA2008, a
strong consensus developed around not mapping for registration
purposes and around preserving the property that the IDNA2008-defined
A-Label and U-Label forms be fully symmetric (i.e., convertible into
one another without change or loss).
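
The symmetry property can be illustrated with a minimal round trip
through Punycode when no mapping step is applied. The helper names
to_a_label and to_u_label are hypothetical, and the sketch deliberately
omits every IDNA2008 validity check (code point categories, contextual
rules, normalization, bidi), so it should not be read as a conforming
implementation.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Hypothetical helper: Punycode-encode a U-label with no mapping."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Hypothetical helper: decode an A-label back to its U-label."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

# Labels containing Sharp S (U+00DF) and Final Sigma (U+03C2).
labels = ["stra\u00dfe", "\u03bf\u03b4\u03bf\u03c2"]
round_tripped = [to_u_label(to_a_label(label)) for label in labels]
```

Because nothing is mapped on the way in, each label comes back
unchanged, which is the property the Working Group wished to preserve.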
During these same discussions, a consensus seemed to develop to permit
(i.e., make "PVALID" in IDNA2008 parlance) Latin Small Letter Sharp S
(U+00DF) and Greek Small Letter Final Sigma (U+03C2). The recommended
casing actions of Unicode (i.e., toCaseFold) on Sharp S and Final Sigma
produce "ss" in the case of Sharp S and Greek Small Letter Sigma
(U+03C3) in the case of Final Sigma.
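
As a small check of the case folding results just described, the sketch
below uses Python's str.casefold() as a stand-in for the Unicode
toCaseFold operation.

```python
sharp_s = "\u00df"      # LATIN SMALL LETTER SHARP S
final_sigma = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA

folded_sharp_s = sharp_s.casefold()          # two ASCII letters: "ss"
folded_final_sigma = final_sigma.casefold()  # GREEK SMALL LETTER SIGMA
```

In both cases the folded result differs from the input character, which
is why a rule based purely on case folding would not leave these two
characters valid as themselves.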
To make the lowercase forms PVALID using the functional rules of
IDNA2008, exceptions were required to overcome the recommended casing
mechanics of Unicode (i.e., application of CaseFolding).
Note that IDNA2008 explicitly permits mapping for User Interface
purposes:
a) draft-ietf-idnabis-protocol-17#section-5.2
c) draft-ietf-idnabis-rationale-14#section-4.4
d) draft-ietf-idnabis-rationale-14#section-6
e) draft-ietf-idnabis-rationale-14#section-7.3
f) draft-ietf-idnabis-mappings-05
If Small Letter Sharp S and Small Letter Final Sigma were to be made
DISALLOWED, these mapping provisions would permit these characters to
be handled as a User Interface matter prior to lookup.
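
One way such a User Interface step might look is sketched below. The
function name ui_premap and the two-entry table LOCAL_MAP are
assumptions made for illustration; they are not taken from the drafts
listed above, which leave the choice of mapping to the application.

```python
import unicodedata

# Hypothetical local mapping table, applied only before lookup.
LOCAL_MAP = {
    "\u00df": "ss",      # Sharp S -> "ss"
    "\u03c2": "\u03c3",  # Final Sigma -> Sigma
}

def ui_premap(label: str) -> str:
    """Hypothetical UI-level mapping of user input prior to lookup."""
    lowered = label.lower()
    mapped = "".join(LOCAL_MAP.get(ch, ch) for ch in lowered)
    return unicodedata.normalize("NFC", mapped)
```

A mapping of this kind would run only on user input at lookup time; it
would play no part in registration, where mapping is not performed.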
Because the practices of IDNA2003 are in conflict with the proposed
practices of IDNA2008, and because the Last Call discussions have
surfaced controversy over the incorporation of the two lowercase forms
in question, I request an organizational recommendation from UTC as to
the treatment of these characters. Taking into account the prohibition
of mapping on registration, which I take to be firm, and the
requirement that A-Label and U-Label forms must be unambiguously
convertible into each other, would the UTC recommend excluding Small
Letter Sharp S and Small Letter Final Sigma from IDNA2008 by removing
their exceptions and making each DISALLOWED?
A prompt response would be much appreciated, considering that we have
delayed reporting the results of the IETF Last Call to the Internet
Engineering Steering Group while this matter is debated.
Sincerely,
Vinton Cerf
Chairman, IDNABIS Working Group of the Internet Engineering Task Force