Final Sigma (was: RE: Esszett, Final Sigma, ZWJ and ZWNJ)

Vint Cerf vint at google.com
Thu Feb 26 05:32:43 CET 2009


Before we go down the path of introducing a collection of prefixes, I  
think we have a lot to get done with the xn-- version first.

vint


Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com




On Feb 25, 2009, at 12:48 PM, Erik van der Poel wrote:

> Hello Vaggelis,
>
> Thank you for staying in touch with this WG.
>
> Are you saying that you are seeing fewer registrations or actual use
> of xn-- names under .gr?
>
> One idea might be to reserve xn-- for the "global" mappings that are
> based on Unicode specs. Then you and other Greek script users could
> come up with a spec for the Greek script and/or language, and the
> prefix to use in that case could be gr-- or if that is politically
> sensitive, just take the "next" prefix after xn-- which would be xo--
> since 'o' comes after 'n'.
>
> Under this scenario, non-ASCII host names in HTML hrefs would be
> transformed using the global mappings, and then converted to Punycode,
> and xn-- would be prepended.
>
> The only way to get xo-- names into HTML would be to use the xo-- form
> (not non-ASCII, not a numeric character reference such as &#xABCD;).
>
> Then there could be an extra field in the DNS that indicates how to
> display those names in Unicode form. I.e. it would tell you which
> sigmas are supposed to be final, which characters should have a tonos,
> and so on.
>
> Erik
>
> On Wed, Feb 25, 2009 at 3:02 AM, Vaggelis Segredakis
> <segred at ics.forth.gr> wrote:
>> Dear Mark and Tina,
>>
>>
>>
>> The original IDNA2003 mapping has made life easier for us on the final
>> sigma -> sigma issue, but the example Mark presented brings forth another
>> very big problem we have faced with that version: in Greek you never put
>> an accent mark (tonos) on a word consisting only of capital letters. The
>> correct uppercase for χρήσης.gr (xn--jxas2ajbt.gr) is ΧΡΗΣΗΣ.gr
>> (xn--sxaa2ajbt.gr) and not ΧΡΉΣΗΣ.gr, which was accepted by IDNA2003 as
>> the only equivalent.
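
The uppercase point above can be illustrated with a minimal Python sketch.
It assumes that the tonos is represented, after NFD decomposition, by the
combining acute accent U+0301; the helper names strip_tonos and greek_upper
are hypothetical and not part of any IDNA specification.

```python
import unicodedata

TONOS = "\u0301"  # combining acute accent; NFD exposes the Greek tonos as this mark


def strip_tonos(text: str) -> str:
    """Remove the tonos from every letter, leaving other marks untouched."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(TONOS, ""))


def greek_upper(text: str) -> str:
    """Uppercase as all-capitals Greek is normally written: without the tonos."""
    return strip_tonos(text).upper()


word = unicodedata.normalize("NFC", "χρήσης")
plain_upper = word.upper()         # generic case mapping keeps the accent
written_upper = greek_upper(word)  # accent dropped before uppercasing
print(plain_upper, written_upper)
```

A generic, language-neutral uppercase keeps the accent, which is the form
IDNA2003 treats as equivalent; the accent-free form is a different string
unless something removes the tonos first.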
>>
>>
>>
>> There and then we started using bundling options to bundle DNS tags, to
>> make them work the way our language is normally used, when it should have
>> been the other way round: IDNA tags should be able to represent languages
>> as they are used. That already happens for languages written in Latin
>> characters.
>>
>>
>>
>> I would welcome a solution that takes this second issue into account as
>> well and further simplifies life for Greek users, who get a poor
>> experience of IDNs. We have already had a meeting with our
>> telecommunications regulator, our government and the .CY registry, and
>> we tried to reach a common position on this new solution of representing
>> the final sigma as a separate character. The results of this meeting are
>> pending, but from my understanding a more global solution to these issues
>> that haunt Greek IDNs would be more welcome than patches on a
>> problematic protocol.
>>
>>
>>
>> My belief is that if a broader solution would be welcomed by this
>> working group, our LIC would be interested in participating in a broad
>> public discussion to reach consensus on how we wish our IDNs to operate.
>> The question is whether this WG is ready to bend some rules and change
>> some former decisions, because it looks as if xn-- might soon be a thing
>> of the past.
>>
>>
>>
>> Vaggelis
>>
>>
>>
>> ________________________________
>>
>> From: mark.edward.davis at gmail.com
>> [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
>> Sent: Wednesday, February 25, 2009 1:18 AM
>> To: Tina Dam
>> Cc: Vaggelis Segredakis; idna-update at alvestrand.no; Vint Cerf;
>> Sotiris Panaretou; Panagiotis Papaspiliopoulos; Euripides Zervanos
>> Subject: Re: Final Sigma (was: RE: Esszett, Final Sigma, ZWJ and ZWNJ)
>>
>>
>>
>> The original IDNA2003 mapping was chosen for a purpose: it allows
>> χρήσης.gr and ΧΡΉΣΗΣ.gr to both go to the same page, without requiring
>> bundling. (Note the two different kinds of lowercase sigmas.)
>>
>> I still think a better approach would be to retain the mapping for
>> compatibility, but specify that when converting back from punycode,
>> trailing sigmas be transformed into final sigmas. For example, in the
>> address bar you could type ΧΡΉΣΗΣ.gr, and when you went to the page
>> you'd see χρήσης.gr in the address bar.
>>
>> The only downside I can see is that it would encourage Greek domain
>> names to use interior hyphens where necessary to get the sigma right. So
>> you would want to register
>>
>> ευρείας-χρήσης.gr
>>  instead of
>> ευρείασχρήσης.gr
>>
>> But that's not a big downside compared with the alternatives.
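
The display-side transformation Mark describes could be sketched roughly as
follows. This is only an illustration under stated assumptions: it treats a
sigma as "trailing" when it follows a letter and sits at the end of a label
or directly before a hyphen, and the names display_label and display_name
are hypothetical.

```python
import re

# Assumption: "trailing" means a sigma that follows a letter and is either
# the last character of the label or immediately followed by a hyphen.
_TRAILING_SIGMA = re.compile(r"(?<=\w)\u03c3(?=-|\Z)")


def display_label(label: str) -> str:
    """Turn trailing ordinary sigmas (U+03C3) into final sigmas (U+03C2)."""
    return _TRAILING_SIGMA.sub("\u03c2", label)


def display_name(name: str) -> str:
    """Apply the display rule to every dot-separated label of a name."""
    return ".".join(display_label(label) for label in name.split("."))


# Names as they would look after the IDNA2003 mapping (all sigmas ordinary).
print(display_name("χρήσησ.gr"))
print(display_name("ευρείασ-χρήσησ.gr"))
print(display_name("ευρείασχρήσησ.gr"))
```

In the last name the sigma at the end of the first word stays an ordinary
sigma, because without the hyphen nothing marks the word boundary. That is
the downside mentioned above.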
>>
>> Mark
>>
>> On Tue, Feb 24, 2009 at 14:34, Tina Dam <tina.dam at icann.org> wrote:
>>
>> Vaggelis,
>>
>> I totally understand the frustration and concern that you are expressing.
>> I am wondering, though, whether it is not better to get this corrected
>> now, so that the Greek script/language functions correctly on the
>> Internet and in domain names, than to keep this half solution, which
>> only gets worse as the volume of registered domain names grows. That
>> applies under .GR, but also under other TLDs that might introduce Greek
>> characters (.CY is the most natural existing TLD that comes to mind in
>> addition to .GR, but of course also gTLDs, and even more importantly as
>> we move to IDN TLDs).
>>
>>
>>
>> As far as I see things, this is not a matter of mappings or no mappings.
>> In the case of the final sigma it is a matter of a wrong decision being
>> made in 2003, making
>>
>>
>>
>> U+03A3 GREEK CAPITAL LETTER SIGMA - always map into:
>>
>>
>>
>> U+03C3 GREEK SMALL LETTER SIGMA - when in fact (as you and your
>> colleagues are well aware and as you express below) it often should be
>> mapped into:
>>
>>
>>
>> U+03C2 GREEK SMALL LETTER FINAL SIGMA
>>
>>
>>
>> In other words, the mapping of the capital sigma is neither one-to-one
>> nor a global solution in the way that, for example, the mapping of
>> capital "A" to lower-case "a" is, and hence this sigma mapping should
>> never have been introduced in the protocol in the first place.
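
The asymmetry described above is easy to see with Python's built-in case
mappings. This is only an illustration, not the IDNA2003 mapping tables
themselves; lower_per_char is a hypothetical helper that lowercases one
character at a time, with no knowledge of word position.

```python
SIGMA = "\u03c3"          # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"    # GREEK SMALL LETTER FINAL SIGMA
CAPITAL_SIGMA = "\u03a3"  # GREEK CAPITAL LETTER SIGMA

# Both lowercase sigmas share a single capital, so uppercasing loses
# the distinction between them.
capitals = {SIGMA.upper(), FINAL_SIGMA.upper()}


def lower_per_char(text: str) -> str:
    """Lowercase each character in isolation, ignoring its position."""
    return "".join(ch.lower() for ch in text)


original = "χρήσης"
round_trip = lower_per_char(original.upper())
print(original, round_trip, original == round_trip)
```

Going to capitals and back one character at a time does not return the
original word: the last letter comes back as an ordinary sigma, whereas
"A" to "a" round-trips without loss.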
>>
>>
>>
>> About solutions: I am wondering if you are going to be at the Mexico
>> meeting this coming week and, if so, perhaps we can find a good time to
>> chat further about it? (That would be with my IDN hat on and my ICANN hat
>> off, since ICANN of course has nothing to do with your policies.)
>>
>>
>>
>> Tina
>>
>>
>>
>>
>>
>>
>>
>> From: idna-update-bounces at alvestrand.no
>> [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Vaggelis
>> Segredakis
>> Sent: Tuesday, February 24, 2009 2:41 AM
>> To: idna-update at alvestrand.no; 'Vint Cerf'
>> Cc: 'Euripides Zervanos'; 'Panagiotis Papaspiliopoulos';
>> 'Sotiris Panaretou'
>> Subject: Re: Esszett, Final Sigma, ZWJ and ZWNJ
>>
>>
>>
>> Dear Vint,
>>
>>
>>
>> I would love to say that we as the .gr Registry are enthusiastic about
>> the proposed solution (PVALID Final Sigma), but in reality we are quite
>> skeptical. I can clearly see the advantages of a distinct final sigma.
>> The reality, however, is that the change is significant and the registry
>> will have to take measures to reduce the impact.
>>
>>
>>
>> It will be necessary for us (and, I believe, for anyone who uses Esszett
>> as well) to "map" the two versions of the domain names ourselves, to
>> overcome the fact that browsers and software do not change overnight and
>> that IDNA2003 and IDNA2008 are incompatible.
>>
>>
>>
>> In Greek, a word that ends with a final sigma in small letters gets a
>> normal capital sigma in place of that final sigma when typed in capital
>> letters. Although you have prohibited capital letters in IDNA2008, any
>> browser programmer will try to translate a URL typed in capitals letter
>> by letter. Most probably he will then translate a capital sigma to sigma
>> and not final sigma, regardless of its position in the word. Why would a
>> programmer try to learn Greek grammar?
>>
>>
>>
>> For each final sigma in a domain name, the registrant will have to
>> register a variant with an ordinary sigma in that position as well, plus
>> every variant that arises if the domain name contains more than one
>> final sigma. For 2 final sigmas you will have 4 variants. If you add to
>> this the tonos issue (the accent is not used in capital letters, which
>> gives us two variants for each accented word), you end up with sixteen
>> variants for a single domain name with two final sigmas (two words)!
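
The variant count above can be checked with a small enumeration. This is a
sketch under assumptions that are mine, not the registry's bundling rules:
every final sigma may independently appear as an ordinary sigma, and every
accented letter may independently appear with or without its tonos. The
function names are hypothetical.

```python
import itertools
import unicodedata

TONOS = "\u0301"        # combining acute accent, as exposed by NFD
FINAL_SIGMA = "\u03c2"
SIGMA = "\u03c3"


def strip_tonos(ch: str) -> str:
    """Return the character without its tonos (unchanged if it has none)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return unicodedata.normalize("NFC", decomposed.replace(TONOS, ""))


def variants(label: str) -> set:
    """Enumerate the spellings a registrant might have to bundle."""
    choices = []
    for ch in unicodedata.normalize("NFC", label):
        if ch == FINAL_SIGMA:
            choices.append((FINAL_SIGMA, SIGMA))     # final or ordinary sigma
        elif strip_tonos(ch) != ch:
            choices.append((ch, strip_tonos(ch)))    # with or without tonos
        else:
            choices.append((ch,))                    # no alternative spelling
    return {"".join(combo) for combo in itertools.product(*choices)}


two_words = variants("ευρείας-χρήσης")
print(len(two_words))
```

With two final sigmas and one accented letter in each of the two words,
there are four independent binary choices, hence 2 * 2 * 2 * 2 = 16
spellings, which matches the figure quoted above.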
>>
>>
>>
>> We already do bundling of the domain names. We will probably keep doing
>> it in the future, especially if this proposed solution moves forward. If
>> you have any other alternatives that could shed new light on these
>> issues, though, this might be a good time to start discussing them. Even
>> if this means a best practice document or IDNAv2_2009, anything should be
>> open to discussion.
>>
>>
>>
>> Best Regards,
>>
>>
>>
>> Vaggelis Segredakis
>>
>> Administrator of the .GR Top Level Domain
>>
>> Institute of Computer Science
>>
>> Foundation for Research and Technology - Hellas
>>
>> Tel. +30-281-0391450
>>
>> Fax +30-281-0391451
>>
>> Email segred at ics.forth.gr
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Message: 3
>>
>> Date: Mon, 23 Feb 2009 20:14:04 -0500
>>
>> From: Vint Cerf <vint at google.com>
>>
>> Subject: Re: Esszett, Final Sigma, ZWJ and ZWNJ
>>
>> To: Mark Davis <mark at macchiato.com>
>>
>> Cc: Paul Hoffman <phoffman at imc.org>, Andrew Sullivan
>>            <ajs at shinkuro.com>, idna-update at alvestrand.no,
>>            John C Klensin <klensin at jck.com>
>>
>> Message-ID: <2C4BC1C5-3B45-46FA-AA6D-9A60D3C72B35 at google.com>
>>
>> Content-Type: text/plain; charset="utf-8"
>>
>>
>>
>> Mark,
>>
>>
>>
>> thanks - I think what left me in an ambiguous state was the term "bits
>> on the wire".  In your example, under the IDNA2003 mapping process, the
>> final sigma is mapped into ordinary sigma and THEN the resulting string
>> is looked up (after conversion to xn-- format using the punycode
>> algorithm). The two forms become identical prior to lookup.
>>
>> Under the proposed IDNA2008 rules, the two strings remain distinct in
>> both the U-label and A-label format and thus look "different" on the
>> wire and, unless other measures are taken (bundling, restricted
>> registration, etc.), it is possible for the two domains to yield distinct
>> results on lookup.
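
The two lookup paths can be compared with Python's standard codecs. The
built-in "idna" codec implements the IDNA2003 procedure (Nameprep mapping,
then Punycode with the xn-- prefix). The second function is only a rough
stand-in for the proposed IDNA2008 behaviour, assuming no mapping at all
and a final sigma that is allowed in labels; it is not a full IDNA2008
implementation, and both function names are hypothetical.

```python
def alabel_idna2003(label: str) -> str:
    """IDNA2003: map the label with Nameprep, then Punycode-encode it."""
    return label.encode("idna").decode("ascii")


def alabel_unmapped(label: str) -> str:
    """Stand-in for the proposal: Punycode-encode the label as typed."""
    return "xn--" + label.encode("punycode").decode("ascii")


with_final_sigma = "χρήσης"
with_plain_sigma = "χρήσησ"

# Under IDNA2003 the two spellings collapse to one name before lookup.
same_2003 = alabel_idna2003(with_final_sigma) == alabel_idna2003(with_plain_sigma)

# Without the mapping they stay distinct "on the wire".
same_unmapped = alabel_unmapped(with_final_sigma) == alabel_unmapped(with_plain_sigma)

print(same_2003, same_unmapped)
```

The first comparison is expected to hold and the second to fail, which is
why bundling or restricted registration comes up under the new rules.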
>>
>>
>>
>> Paul - is that the picture you wanted to paint?
>>
>>
>>
>> sorry to be slow to see which bits you were comparing.
>>
>>
>>
>> v
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>>
