AW: AW: sharp s (Eszett)

Felix Sasaki fsasaki at w3.org
Wed Mar 12 00:36:50 CET 2008


John C Klensin wrote:
> --On Tuesday, 11 March, 2008 12:12 -0700 Mark Davis
> <mark.davis at icu-project.org> wrote:
>
>   
>> It is not just a matter of "typographic convenience": the
>> recognized standard German uppercase of "ß" *is* "SS".
>> Unicode did not invent this relationship -- it is just
>> following recognized German standards. In German orthography
>> ß is not just an ordinary letter like any other. If in normal
>> use ß were caseless, or if ß had a unique uppercase, we
>> wouldn't be having this discussion. But it is not normal. And
>> the previous behavior in IDNA2003 can't be simply discarded.
>> There are two main issues:
>>
>> *1. IDNA compatibility.* Right now, all of the following point
>> to the same website. If we make this exception for ß, then
>> they won't.
>>
>> http://FASS.de
>> http://Faß.de
>> http://fass.de
>>
>> This is not just a UI issue, since the URLs above can be in
>> all sorts of data (email, webpages, etc). And even if IDNA200x
>> comes out soon, data and programs exist, that will only slowly
>> be updated. So for an extended, perhaps indefinite, amount of
>> time browsers and search engines (like ours at Google) will
>> need to handle both IDNA2003 and IDNA200x URLs. When the
>> results under each system point to different places, that is a
>> significant problem and possible security issue.
>>
>> *2. Case insensitivity.* If we make this exception, then
>> uppercasing a domain name causes it to go to a different
>> place. Even if there were no compatibility issue, there is
>> still the issue of whether it is more important to have ß or
>> to have case-insensitivity.
>>
>>
>> While it would be possible to have an exception for ß, both
>> of these issues need to be considered very carefully, and we
>> should not make any decision lightly. Any proposal for an
>> exception for ß really should get consensus from a broad set
>> of stakeholders, including DENIC, NIC.AT, and SWITCH, as well
>> as the standards bodies DIN, ÖN, and SNV.
>>     
>
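To make the two issues above concrete, here is a minimal Python sketch.
It is only an approximation: it uses Python's built-in str.casefold()
as a stand-in for the IDNA2003 mapping step (real Nameprep also applies
NFKC normalization and prohibited-character checks, omitted here), and
str.lower() as a stand-in for a hypothetical rule that leaves ß alone.
The helper names idna2003_like_key and keep_eszett_key are made up for
this illustration.

```python
def idna2003_like_key(host: str) -> str:
    """Approximate IDNA2003-style mapping: full Unicode case folding (ß -> ss)."""
    return host.casefold()


def keep_eszett_key(host: str) -> str:
    """Hypothetical alternative: simple lowercasing, which leaves ß unchanged."""
    return host.lower()


hosts = ["FASS.de", "Faß.de", "fass.de"]

# Issue 1: under folding, all three spellings collapse to a single name.
folded = {idna2003_like_key(h) for h in hosts}
print(folded)

# With an exception for ß, the same three spellings yield two distinct names.
excepted = {keep_eszett_key(h) for h in hosts}
print(sorted(excepted))

# Issue 2: the standard uppercase of ß is SS, so uppercasing and then
# lowercasing does not bring the ß back.
print("faß".upper())
print("faß".upper().lower())
```

Under the folding variant the three hosts map to one key; under the
ß-preserving variant "Faß.de" is separated from the other two, which is
the compatibility break described in point 1.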
> Mark,
>
> I think I understand all of these issues.   I tried to write my
> note very carefully, but obviously it was not careful enough.
> The bottom line, regardless of what terminology and
> classifications we use, there is a belief, backed up by various
> authorities, language (orthography) reform documents, etc., that
> Eszett is a real character that should not be mapped, converted,
> or folded into something else because doing so leads to loss of
> information.
>
> While it is clear that it would be easy to say "we have to stick
> with Unicode casefolding rules" or "compatibility with IDNA2003
> is more important than anything else", I'm not comfortable
> telling the users that we've decided that they don't get to use
> this character because it is inconvenient.  

The difficulty with this process is determining who represents "the
user". Does Georg represent the German-language user community, or is
that community's input to Unicode a better representation, or ...?

About "we've decided that they don't get to use this character because
it is inconvenient": Georg said "In Germany it is mandatory to
maintain the small sharp s in uppercase names in official documents like
IDs, passports or tax declarations etc.". In Japan it is mandatory to
use, in such documents, Kanji characters that are not part of the
Unicode repertoire or that can only be regarded as variants of Unicode
code points. That does not prevent the use of Unicode code points in
other areas, e.g. domain names. To put it differently: there are many
characters that are not convenient for IDNs and are hence disallowed.
If we adopt the requirement "IDNs need to be able to represent IDs of
users", we open a big can of worms.

Felix (a German speaker living in Japan)


