Tonus (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

Vaggelis Segredakis segred at ics.forth.gr
Fri Feb 1 18:34:54 CET 2008


Patrik, Vint,

Thank you both for your attempt to clarify this issue.

Let me present you with some questions to help me clarify it further:

Zone file first: (requested example name βαγγέλης.gr -> registered as βαγγέλησ.gr equivalent to xn--ixahcfaz1a9d.gr)

*IDNA2003: we have xn--ixahcfaz1a9d.gr IN NS....

*IDNA200X: we still put xn--ixahcfaz1a9d.gr IN NS...?


Browser line:

IDNA2003:
*We start by typing xn--ixahcfaz1a9d.gr - it gets translated as βαγγέλησ.gr
*We start by typing βαγγέλης.gr (we are allowed to do that) - it gets translated as xn--ixahcfaz1a9d.gr

IDNA200X:
*We start by typing xn--ixahcfaz1a9d.gr - it gets translated as βαγγέλησ.gr
*We start by typing βαγγέλης.gr. Are we allowed to do that? Does it correspond to any Punycode translation? Which domain name will be sent to the resolver?

My main concern is the user experience. Can the user type βαγγέλης.gr in the *browser* line or *email client* and still get xn--ixahcfaz1a9d.gr? If so, it should be OK. Will the browser/email client still allow uppercase characters in IDNs as well?

Kind Regards,

Vaggelis

-----Original Message-----
From: Patrik Fältström [mailto:patrik at frobbit.se] 
Sent: Thursday, January 31, 2008 1:27 PM
To: Vaggelis Segredakis
Cc: 'Harald Alvestrand'; 'John C Klensin'; idna-update at alvestrand.no
Subject: Re: Tonus (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

On 31 jan 2008, at 10.55, Vaggelis Segredakis wrote:

> Should I start spelling my name as Βαγγέλησ instead of Βαγγέλης,
> which is the correct spelling, because some people had problems
> designing a multilingual protocol for computers?

The Unicode Consortium has decided that, at the time of matching, there is
an equivalence between the codepoints U+03C2 (GREEK SMALL LETTER FINAL
SIGMA) and U+03C3 (GREEK SMALL LETTER SIGMA). This implies that only one of
these codepoints can be stored in the DNS. Further, the casefolding
algorithm provided together with normalization states that U+03C3 is the
stable codepoint of the two, and because of that U+03C2 cannot be
stored in the core of a database used for matching.
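
As a rough illustration of the equivalence described above, the following
Python sketch folds both sigmas to one matching key. The helper name
match_key is hypothetical, and combining str.casefold() with NFKC is an
assumption about what "casefolding together with normalization" means here,
not a statement of what any DNS implementation does.

```python
import unicodedata

FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA


def match_key(label: str) -> str:
    """Hypothetical helper: casefold, then NFKC-normalize, for comparison only."""
    return unicodedata.normalize("NFKC", label.casefold())


# Unicode case folding sends the final sigma to the regular sigma.
print(FINAL_SIGMA.casefold() == SIGMA)

# Both spellings of the name, and the uppercase form, share one matching key.
print(match_key("βαγγέλης") == match_key("βαγγέλησ"))
print(match_key("ΒΑΓΓΈΛΗΣ") == match_key("βαγγέλησ"))
```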

The way IDNA2003 is designed, both U+03C2 and U+03C3 are mapped to
U+03C3, which implies either of the two codepoints will match with
U+03C3 that is stored in the DNS. And, because of this, you can
according to IDNA2003 include U+03C2 in a domain name that you ask
people to use (although U+03C3 is stored in the DNS).
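
The IDNA2003 mapping described above can be tried with Python's built-in
"idna" codec, which implements RFC 3490 (IDNA2003) with Nameprep. This is
only a sketch: the helper name to_ace is hypothetical, and the codec stands
in for whatever a given browser or resolver library actually does.

```python
def to_ace(name: str) -> str:
    """Hypothetical helper: IDNA2003 ToASCII via Python's built-in codec."""
    return name.encode("idna").decode("ascii")


# The spelling with the final sigma (U+03C2), as a user would type it.
print(to_ace("βαγγέλης.gr"))  # xn--ixahcfaz1a9d.gr

# The spelling with the regular sigma (U+03C3), as stored in the DNS.
print(to_ace("βαγγέλησ.gr"))  # xn--ixahcfaz1a9d.gr
```

Both inputs yield the same ASCII name, the one quoted earlier in this
thread, because Nameprep maps U+03C2 to U+03C3 before Punycode is applied.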

IDNA200x, precisely because of the confusion behind this discussion,
only talks about what can be stored in the DNS, and that is U+03C3 in
both IDNA2003 and IDNA200x, all according to the design of the Unicode
Character Set.

So your issues have nothing to do with IDN or the implementation of IDN,
but with the design of the Unicode Character Set, and because of that I
ask you to direct your issues to the Unicode Consortium.

    Patrik

-----Original Message 2-----
From: Vint Cerf [mailto:vint at google.com] 
Sent: Thursday, January 31, 2008 2:38 PM
To: Vaggelis Segredakis
Cc: 'Harald Alvestrand'; 'John C Klensin'; patrik at frobbit.se; idna-update at alvestrand.no
Subject: Re: Tonus (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

Inputting them in the browser is fine. They get casefolded and
normalized for DNS lookup. Your problem is that the IDN design, for
valid reasons related to Unicode normalization, has not been able to
preserve the original strings but only the normalized ones.
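
The loss of the original string can be seen in a short round trip. This
sketch again assumes Python's built-in "idna" codec (IDNA2003) as a
stand-in for a browser's conversion; round_trip is a hypothetical helper
name, not part of any standard API.

```python
def round_trip(name: str) -> str:
    """Hypothetical helper: ToASCII followed by ToUnicode (IDNA2003 codec)."""
    ace = name.encode("idna")   # what would be sent to the resolver
    return ace.decode("idna")   # what can be recovered from that form


typed = "βαγγέλης.gr"           # ends in the final sigma, U+03C2
shown = round_trip(typed)

print(shown == typed)            # False: the final sigma is not preserved
print(shown == "βαγγέλησ.gr")   # True: only the normalized form survives
```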

vint


