Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Vaggelis Segredakis segred at ics.forth.gr
Mon Jan 28 16:12:35 CET 2008


Dear Patrik,

I would like to comment on the emails on this list regarding the
casefolding of sigma.

In the initial protocol for IDN domain names some of your colleagues chose
to implement solutions that go against the normal use of the Greek language
in domain name identifiers. In Greek, a word in lower case letters carries
the sign "tonos" on the accented letter of the word (e.g. δοκιμή).
However, when the word is written in capital letters we do not use the tonos
anymore (δοκιμή becomes ΔΟΚΙΜΗ). Let's put this example in the xn-- form of
a domain name: you get "xn--jxalpdlp.gr" for the domain name "δοκιμή.gr" but
you get "xn--pxagfdlp.gr" for the domain name "ΔΟΚΙΜΗ.gr".
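
The divergence in the example above can be reproduced with a short sketch.
It assumes Python's built-in "idna" codec, which follows the original
IDNA2003 rules (RFC 3490); the helper name to_ascii is only illustrative and
not part of any registry software.

```python
# Sketch only: relies on Python's built-in "idna" codec (IDNA2003 rules).
def to_ascii(domain: str) -> str:
    """Return the xn-- (ACE) form of a domain name (illustrative helper)."""
    return domain.encode("idna").decode("ascii")

lower = to_ascii("δοκιμή.gr")  # lower case, with tonos on the last letter
upper = to_ascii("ΔΟΚΙΜΗ.gr")  # capitals, written without tonos

print(lower)
print(upper)
print("same name in the DNS?", lower == upper)
```

The capital form is lowercased without restoring the tonos, so the two
spellings end up as two different names in the DNS.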

As a consequence, in order to support the Greek language in the domain name
space we had to resort to bundles and, in many cases, DNAME. You are welcome
to check my presentation "IDNs in Greece"
(http://www.icann.org/meetings/lisbon/presentation-idns-greece-27mar07.pdf)
for our solution.

Now let's come to this newly raised issue of the final sigma. Before this
discussion we knew that the final sigma was bundled with the small letter
sigma. If you took a Greek IDN and put a final sigma in each and every
position where it had a small letter sigma, the result was equivalent to the
same domain with a small sigma in every position. If you started from the
xn-- form in a browser, you would never get a final sigma back; all the
positions would show the small sigma instead. Not the best solution, but an
acceptable one, since it allows the use of the final sigma as it is used in
the Greek language and still does not create any phishing problem if you use
it instead of the normal small sigma - they are interchangeable.
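
The behaviour described above can be illustrated with a small sketch,
assuming Python's built-in "idna" codec as a stand-in for an IDNA2003
client. The word used as a label (Greek for "street") is just an example.

```python
# Sketch only: Python's built-in "idna" codec applies the IDNA2003 mapping,
# under which final sigma (U+03C2) is mapped to small sigma (U+03C3).
FINAL_SIGMA = "\u03c2"
SMALL_SIGMA = "\u03c3"

with_final = "οδό" + FINAL_SIGMA + ".gr"  # written as in normal Greek
with_small = "οδό" + SMALL_SIGMA + ".gr"  # small sigma in every position

ace_final = with_final.encode("idna")
ace_small = with_small.encode("idna")
print("same xn-- form?", ace_final == ace_small)

# Going back from the xn-- form never yields a final sigma.
round_trip = ace_final.decode("idna")
print(round_trip)
print("final sigma present?", FINAL_SIGMA in round_trip)
```

Both spellings resolve to one name, which is why the two letters behave as
interchangeable under the old rules.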

Some people on this list propose that this should change. Can you please
clarify your proposal on this issue and be so kind as to explain to us Greeks
why the previous solution creates problems for your protocol?

I thought that IDNs would be implemented to solve the language barrier
problems in the use of the Internet. Instead I find that the correct use of
a language is not a priority if it happens to create exceptions in the
protocol you are trying to propose. I am afraid that we are facing a problem
there, as science should make life easier for people instead of requiring
people to adapt their language to protocols, which has been the case for
many users of non-Latin alphabets.

If there is something that needs straightening out, it is this tonos
(accent) problem, which is very serious for us, and not this final sigma
issue. I would welcome your proposals on this serious issue.


Kind Regards,

 

Vaggelis Segredakis
Administrator of the .GR Top Level Domain
Institute of Computer Science
Foundation for Research and Technology - Hellas
Tel. +30-281-0391450
Fax +30-281-0391451
Email segred at ics.forth.gr

----------------------------------------------------------------------

Message: 1
Date: Mon, 28 Jan 2008 08:52:32 +0100
From: Patrik Fältström <patrik at frobbit.se>
Subject: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)
To: idna-update at alvestrand.no
Message-ID: <C2E6E72F-0370-43D6-86F0-12914B9C05C8 at frobbit.se>
Content-Type: text/plain; charset="us-ascii"

The tables document explains what codepoints can be in a U-label. After
reading what all of you have written, I see three different suggestions:

(1) Keep final sigma as it is today, NEVER, as casefold(final sigma) !=
final sigma
(2) Have final sigma as an exception, CONTEXT
(3) Have final sigma as an exception, MAYBE NO

I have read the email on this list, and my proposed conclusion of
consensus is the following:

Given that some people (and the Unicode Standard) say final sigma in some
contexts might be mapped to sigma (casefolding, context dependent
etc) it would be pretty bad if someone actually registers a domain name with
final sigma. This is because people who use clients that "happen"
to (based on context or whatever else) map this to sigma will not get a
match when looking up the domain name.
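
A rough sketch of that mismatch, under stated assumptions: the "punycode"
codec stands in for an encoding step that performs no mapping, str.casefold()
stands in for a client that maps final sigma to sigma, and the names ace,
registered, client_a and client_b are purely hypothetical.

```python
# Sketch only: "punycode" encodes code points as-is (no IDNA mapping), so it
# models a label registered with an unmapped final sigma.
FINAL_SIGMA = "\u03c2"
SMALL_SIGMA = "\u03c3"

def ace(label: str) -> str:
    """Encode one label to an xn-- form without any mapping (illustrative)."""
    return "xn--" + label.encode("punycode").decode("ascii")

registered = ace("οδό" + FINAL_SIGMA)   # hypothetical registration
typed = "οδό" + FINAL_SIGMA             # what the user types

client_a = ace(typed)             # client that leaves final sigma alone
client_b = ace(typed.casefold())  # client that case-folds before lookup

print("client A finds the name:", client_a == registered)
print("client B finds the name:", client_b == registered)
```

The two clients start from identical user input but query different names,
which is the lookup failure described above.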

Because of this, and the fact that I really want to minimize the number of
exceptions, my conclusion is that final sigma should stay as a
non-exception, i.e. alternative (1) above, which implies it will be in NEVER
and because of that not allowed to be registered in DNS.

That said, any preprocessing, user interface etc, can of course allow final
sigma and map it to something that is appropriate according to whatever
application, context, locale or such. Those are rules that are impossible to
implement in the global DNS.

The next version of the tables document will because of that NOT say
anything special about final sigma.

     Patrik

P.S. I have, though, found some more bugs in my script(s) that generate the
non-normative tables in the tables document. Because of that, I have now
fallen back from using my own code to using the Unicode libraries in Perl. If
people know about any problems with that, let me know.
