Standardizing on IDNA 2003 in the URL Standard

John Cowan cowan at
Wed Aug 21 20:07:21 CEST 2013

Shawn Steele scripsit:

> A non-final sigma isn't (my understanding) a valid form of the word,

Alas, things are not so simple.  φιλος would be appropriate if the
meaning is 'friend', but φιλοσ, with a non-final sigma, would
be appropriate as an abbreviation of φιλοσοφία 'philosophy'.
The Unicode rule is to downcase capital sigma to a non-final form if
a letter follows and to a final form otherwise, but this is just a
convention that dumb computers can follow rather than the whole truth.
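
As a rough illustration of that convention (a sketch, not part of the
original message): Python's built-in str.lower() applies the Unicode
final-sigma rule, so it can show both what the rule does and where it
falls short.  The variable names are made up for this example.

```python
# Greek capitals are written as escapes to avoid look-alike Latin letters.
PHILOS_UPPER = "\u03a6\u0399\u039b\u039f\u03a3"                     # ΦΙΛΟΣ
PHILOSOPHIA_UPPER = PHILOS_UPPER + "\u039f\u03a6\u0399\u0391"       # ΦΙΛΟΣΟΦΙΑ

FINAL_SIGMA = "\u03c2"      # ς
NONFINAL_SIGMA = "\u03c3"   # σ

# Capital sigma with no letter after it is downcased to the final form.
philos_lower = PHILOS_UPPER.lower()

# Capital sigma followed by another letter is downcased to the non-final form.
philosophia_lower = PHILOSOPHIA_UPPER.lower()

print(philos_lower, philos_lower.endswith(FINAL_SIGMA))
print(philosophia_lower, philosophia_lower[4] == NONFINAL_SIGMA)

# The rule only looks at neighbouring letters.  It cannot know whether
# ΦΙΛΟΣ was meant as a whole word or as an abbreviation of ΦΙΛΟΣΟΦΙΑ,
# so the abbreviation also comes out with a final sigma.
abbreviation_lower = PHILOS_UPPER.lower()
print(abbreviation_lower.endswith(NONFINAL_SIGMA))
```

The last line prints False: the mechanical rule yields the final form even
where a writer would want the non-final one.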

> Eszett is less clear, because using eszett or ss influences the
> pronunciation (at least in Germany, in Switzerland that can be
> different).  I imagine it's rather worse if you're Turkish and prefer
> different i's.

Actually, missing diacritics aren't a big problem in Turkish for native
speakers, because of the vowel-harmony rules, which mean that most
words contain either the front vowels e, i, ö, and ü, or else the back
vowels a, ı (dotless i), o, and u, but not both in the same word.

> For German, nobody is ever going to expect fuß and
> to go to different places.  And nobody's going to be surprised if
> fuß and end up at the same site.

Well, there are minimal pairs like Buße 'fine' vs. Busse 'buses', but
that's livable, particularly because in Switzerland and Liechtenstein
they are both spelled "Busse" anyway.

"But I am the real Strider, fortunately,"       John Cowan
he said, looking down at them with his face     cowan at
softened by a sudden smile.  "I am Aragorn son
of Arathorn, and if by life or death I can
save you, I will."  --LotR Book I Chapter 10
