Letter to Unicode Technical Committee on IDNA2008

Vint Cerf vint at google.com
Sat Nov 28 22:57:17 CET 2009


Note to the IDNABIS Working Group:
During processing of the Last Call responses, the Area Director  
expressed discomfort with the state of consensus on the use of Latin  
Small Letter Sharp S and Greek Small Letter Final Sigma. To gain  
additional input on whether these should remain PVALID, I am sending  
the letter below to the Unicode Technical Committee for its opinion.

vint cerf

------------------


Ms. Lisa Moore
Chairman, Unicode Technical Committee
via email: lisam at us.ibm.com

CC:
Eric Muller
Vice Chairman, Unicode Technical Committee
via email: emuller at adobe.com

Mark Davis
President, Unicode Consortium
via email: markdavis at google.com

28 November 2009

Dear Ms. Moore:

I am writing to you in my role as chairman of the IDNABIS working
group, addressing this request to you as chairman of the Unicode
Technical Committee. As you know, the treatment of two characters,
Greek Small Letter Final Sigma (U+03C2) and Latin Small Letter Sharp S
(U+00DF), has been the source of considerable discussion during the
IDNABIS Working Group effort on specifying IDNA2008, the proposed
replacement for the IDNA2003 standard for the use of Unicode in
Internationalized Domain Names. Latin Capital Letter Sharp S (U+1E9E)
was added in Unicode version 5.1.0, but with recommended rules for its
use, as shown below:

Begin quote from Unicode Version 5.1.0
Tailored Casing Operations

The Unicode Standard provides default casing operations. There are  
circumstances in which the default operations need to be tailored for  
specific locales or environments. Some of these tailorings have data  
that is in the standard, in the SpecialCasing.txt file, notably for  
the Turkish dotted capital I and dotless small i. In other cases, more  
specialized tailored casing operations may be appropriate. These  
include:

* Titlecasing of IJ at the start of words in Dutch
* Removal of accents when uppercasing letters in Greek
* Uppercasing U+00DF ( ß ) LATIN SMALL LETTER SHARP S to the new U+1E9E
  LATIN CAPITAL LETTER SHARP S

However, these tailorings may or may not be desired, depending on the  
implementation in question.

In particular, capital sharp s is intended for typographical  
representations of signage and uppercase titles, and other  
environments where users require the sharp s to be preserved in  
uppercase. Overall, such usage is rare. In contrast, standard German  
orthography uses the string "SS" as uppercase mapping for small sharp  
s. Thus, with the default Unicode casing operations, capital sharp s  
will lowercase to small sharp s, but not the reverse: small sharp s  
uppercases to "SS". In those instances where the reverse casing  
operation is needed, a tailored operation would be required.

End quote from Unicode Version 5.1.0

In IDNA2003, Sharp S was mapped to "ss" by means of a casing operation  
that mapped lower case Sharp S to uppercase "SS" and then down to  
lowercase "ss". Registrations and lookups using the IDNA2003 rules  
applied this mechanism.
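
As an illustration only, the casing behavior described above can be observed with the default (untailored) case operations built into Python's str type. This is a sketch that assumes those built-in operations stand in for the casing step; the helper name upper_then_lower is hypothetical and is not taken from any IDNA specification.

```python
SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def upper_then_lower(label: str) -> str:
    """Uppercase, then lowercase, using Python's default Unicode casing."""
    return label.upper().lower()


# Small sharp s uppercases to the two-letter string "SS" ...
print(SHARP_S.upper())                      # SS
# ... so going up and back down yields "ss", not the original character.
print(upper_then_lower(SHARP_S))            # ss
# Capital sharp s lowercases to small sharp s, but not the reverse.
print(CAPITAL_SHARP_S.lower() == SHARP_S)   # True
print(SHARP_S.upper() == CAPITAL_SHARP_S)   # False
```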

During the discussions in the IDNABIS Working Group on IDNA2008, a
strong consensus developed around not mapping, for example for
registration purposes, and also around preserving the property that
the IDNA2008-defined A-Label and U-Label forms be fully symmetric
(i.e., convertible into one another without change or loss).
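
The symmetry property can be sketched as follows, for illustration only. The helpers to_a_label and to_u_label are hypothetical: they apply Python's raw "punycode" codec with no mapping and no validity checks, so they are not an IDNA2008 implementation. Python's built-in "idna" codec, which follows the IDNA2003 rules, is used for contrast.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a label with Punycode only, with no mapping step (hypothetical helper)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode the Punycode part of an A-label (hypothetical helper)."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


label = "stra\u00dfe"

# Without mapping, the conversion is symmetric: the sharp s survives.
round_trip = to_u_label(to_a_label(label))
print(round_trip == label)            # True

# With IDNA2003-style mapping, the sharp s is replaced before encoding,
# so the original label cannot be recovered from the encoded form.
mapped = label.encode("idna")
print(mapped)                         # b'strasse'
print(mapped.decode("idna") == label) # False
```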

During these same discussions, a consensus seemed to develop to permit
(i.e., make "PVALID" in IDNA2008 parlance) Latin Small Letter Sharp S
(U+00DF) and Greek Small Letter Final Sigma (U+03C2). The recommended
casing actions of Unicode (i.e., toCaseFold) on Sharp S and Final Sigma
produce "ss" in the case of Sharp S and Greek Small Letter Sigma
(U+03C3) in the case of Final Sigma.

To make the lowercase forms PVALID using the functional rules of  
IDNA2008, exceptions were required to overcome the recommended casing  
mechanics of Unicode (i.e. application of CaseFolding).
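
A rough sketch of why the exceptions are needed, assuming a stability test of the form toNFKC(toCaseFold(toNFKC(cp))) compared against cp. The function names are hypothetical, the exception set lists only the two code points discussed in this letter, and the sketch covers just this one rule rather than the full derivation of the property.

```python
import unicodedata

# Illustrative exception set: only the two code points discussed here.
PVALID_EXCEPTIONS = {0x00DF, 0x03C2}


def is_unstable(cp: int) -> bool:
    """True if the code point changes under NFKC, case folding, then NFKC."""
    ch = chr(cp)
    nfkc = unicodedata.normalize("NFKC", ch)
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return folded != ch


def sketch_status(cp: int) -> str:
    """Exceptions are consulted first; otherwise unstable code points are disallowed."""
    if cp in PVALID_EXCEPTIONS:
        return "PVALID"
    if is_unstable(cp):
        return "DISALLOWED"
    return "not decided by this sketch"


for cp in (0x00DF, 0x03C2, 0x03C3):
    print(f"U+{cp:04X} unstable={is_unstable(cp)} status={sketch_status(cp)}")
```

In this sketch, emptying PVALID_EXCEPTIONS makes both U+00DF and U+03C2 come out as DISALLOWED, which is the alternative raised in the question to the UTC.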

Note that IDNA2008 explicitly permits mapping for User Interface  
purposes:
a) draft-ietf-idnabis-protocol-17#section-5.2
c) draft-ietf-idnabis-rationale-14#section-4.4
d) draft-ietf-idnabis-rationale-14#section-6
e) draft-ietf-idnabis-rationale-14#section-7.3
f) draft-ietf-idnabis-mappings-05

If Small Letter Sharp S and Small Letter Final Sigma were to be made  
DISALLOWED, these mapping provisions would permit these characters to  
be handled as a User Interface matter prior to lookup.

Because the practices of IDNA2003 are in conflict with the proposed  
practices of IDNA2008, and because the Last Call discussions have  
surfaced controversy over the incorporation of the two lowercase forms  
in question, I request an organizational recommendation from UTC as to  
the treatment of these characters. Taking into account the prohibition  
of mapping on registration, which I take to be firm, and the  
requirement that A-Label and U-Label forms must be unambiguously  
convertible into each other, would the UTC recommend excluding the
use of Small Letter Sharp S and Small Letter Final Sigma in IDNA2008
by removing their exceptions and making each DISALLOWED?

A prompt response would be much appreciated considering we have  
delayed reporting the results of the IETF LAST CALL to the Internet  
Engineering Steering Group while this matter is debated.

Sincerely,

Vinton Cerf
Chairman, IDNABIS Working Group of the Internet Engineering Task Force