Tonus

Simon Josefsson simon at josefsson.org
Thu Jan 31 13:55:17 CET 2008


What isn't clear in this thread is that the _reason_ IDNA works the way
it does is because it chose to use Unicode NFKC for normalization.  That
isn't something that the Unicode specifications required IDNA to do.  I
recall discussions of which Unicode normalization form to use in the IDN
WG, and the eventual choice of NFKC was deliberate.  That may or may not
have been the right choice, but that's water under the bridge.  So if I
understand correctly, to fix this issue, we would need to replace NFKC
with something else in IDNAbis.
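In IDNA2003, the Nameprep profile (RFC 3491) applies the case-folding map from stringprep table B.2 together with NFKC. The Python sketch below is only an illustration (the language and the helper names `nfkc` and `ace` are my choices, not anything from this thread). It checks which step merges the two sigmas and shows that the stored ACE form no longer records which sigma was typed. It relies on Python's built-in "idna" codec, which implements IDNA2003 and not IDNAbis.

```python
import unicodedata

FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA

def nfkc(s: str) -> str:
    # Hypothetical helper: plain Unicode NFKC normalization.
    return unicodedata.normalize("NFKC", s)

def ace(label: str) -> bytes:
    # Hypothetical helper: Python's "idna" codec applies IDNA2003 ToASCII,
    # which runs Nameprep (case-fold mapping + NFKC) before Punycode.
    return label.encode("idna")

if __name__ == "__main__":
    # NFKC alone leaves final sigma unchanged.
    print(nfkc(FINAL_SIGMA) == FINAL_SIGMA)
    # Unicode case folding maps final sigma to ordinary sigma.
    print(FINAL_SIGMA.casefold() == SIGMA)
    # Both spellings produce the same ACE label under IDNA2003...
    print(ace("λόγος") == ace("λόγοσ"))
    # ...and decoding that label yields only the ordinary-sigma form.
    print(ace("λόγος").decode("idna"))
```

On this reading, the case-folding mapping that Nameprep applies alongside NFKC is what merges the two letters. Once that merge has happened, nothing in the encoded label can restore the final sigma.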

(Fwiw, for non-DNS purposes of string preparation, the choice of NFKC is
not so clearly the best choice.)

/Simon

Vint Cerf <vint at google.com> writes:

> Patrik is correct, Michael. The matching process IS destructive
> because of the Unicode normalization rules that are applied to allow
> for matching of the two kinds of sigma. If it were not for the fact
> that the two kinds of sigma are supposed to match in the domain name
> context, I suppose the difference could have been preserved.
>
> Patrik, in the LDH world, the upper and lower case forms are kept in
> the DNS database and are casefolded at matching time. In the IDN
> world, in part because of the complexity of the normalization
> process, is it correct that the design does a lot of the normalizing
> at registration, storing the normalized form in the database rather
> than the unnormalized form?
>
> vint
>
>
> On Jan 31, 2008, at 6:57 AM, Patrik Fältström wrote:
>
>> On 31 jan 2008, at 12.37, Michael Everson wrote:
>>
>>> Aren't we all in this together, Patrik?
>>
>> Of course we are!
>>
>>> Your answer here seems dismissive and aggressive. Whatever may be
>>> stored in the DNS, an implementation of IDN may well need to
>>> display Greek correctly and in a manner which meets reasonable
>>> user expectations. None of us may put our heads in the sand over
>>> this issue.
>>
>> And it is displayed correctly. IDNA2003 makes it perfectly possible
>> to have a URI to a webpage that has the final sigma as part of the
>> domain name, and a preprocessing document, which I expect we will
>> see one day (based on the last couple of days of discussion), will
>> make the same solution possible in IDNA200x as well.
>>
>> What is NOT possible is to take a string that contains final sigma,
>> normalize and casefold it according to the Unicode Consortium spec,
>> and still keep the final sigma. This is because the normalization
>> and casefolding, which are needed for matching on the server side,
>> form a destructive transformation.
>>
>>    Patrik
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update

