IDNAbis compatibility

Cary Karp ck at
Wed Apr 4 21:35:29 CEST 2007

Quoting and responding to Ken:

> Bezillions of Turkish domains already exist -- and they already
> have established practices of folding "i"'s  

How can you know what is and what is not a Turkish domain name, or
that there are more of them than there are names in .com and .museum
combined, especially in light of your own comments on the uncertainty
with which such a determination can be made?

> If, on the other hand, I decide that this is a Turkish domain
> name (by whatever means I don't know -- inferring from "tr",
> I suppose, since the label doesn't have a language tag), and
> needs a Turkish-language-specific casefolding, then I'm going
> to be looking for "tub{i-dotless}", which probably
> doesn't exist, and which *could* be different from what I
> am expecting, if the .tr domain registry wasn't careful about
> what got into it belonging to whom.  

Is tubıtak really the lower case form of TÜBİTAK?
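
A minimal sketch of why the answer depends on which casing rules are applied, assuming Python 3. The helper turkish_lower is hypothetical (it is not part of any library, nor something proposed in this thread); it layers the Turkish dotted/dotless rule on top of the locale-independent Unicode mapping that str.lower() uses:

```python
# Hypothetical helper: handle the two Turkish special cases first,
# then fall back to the default (locale-independent) Unicode lowercasing.
_TR_SPECIAL = {ord("I"): "ı", ord("İ"): "i"}

def turkish_lower(s: str) -> str:
    return s.translate(_TR_SPECIAL).lower()

# Default mapping: U+0130 becomes "i" followed by U+0307 COMBINING DOT ABOVE.
default_lower = "TÜBİTAK".lower()

# Turkish rule applied to the native spelling and to the ASCII spelling.
turkish_full = turkish_lower("TÜBİTAK")
turkish_ascii = turkish_lower("TUBITAK")

print([hex(ord(c)) for c in default_lower])
print(turkish_full, turkish_ascii)
```

Under these assumptions, tubıtak comes out as the Turkish lowercasing of the ASCII spelling TUBITAK, whereas TÜBİTAK itself lowercases to tübitak.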

Regardless of the answer to that question, all conceivable dottings of
the lower case string are supported by IDNA2003 and could easily be
bundled in the registry. At risk of being the target of a UTC snicker,
what would be broken if IDNAbis were to permit the xn-- prefixing of
what presumably would be the Punycode output TBTAK-2pa30d, thus making
it available for inclusion in the registered bundle?
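
For readers who want to reproduce the encoding step, here is a rough sketch using Python's built-in "punycode" codec, which implements only the RFC 3492 transformation: it applies no nameprep mapping and adds no prefix. The variable names are illustrative, and the xn-- form built here is exactly the kind of unmapped label that IDNA2003 would not produce on its own:

```python
# Raw Punycode of the upper case label, with no case mapping applied first.
label = "TÜBİTAK"
encoded = label.encode("punycode")

# Prepending the ACE prefix by hand gives the candidate label discussed above.
ace_candidate = b"xn--" + encoded

# The encoding is reversible, so the original spelling can be recovered.
decoded = encoded.decode("punycode")

print(encoded, ace_candidate, decoded)
```

The basic (ASCII) code points are copied through unchanged ahead of the delimiter, which is why the upper case letters survive in the encoded form.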

> This is already established practice for Turkish domain names,
> as best as I can tell, given the constraints that they have
> had to work already for years in an ASCII-only context.  

If everybody who has been living with the constraints of ASCII-only
domain naming can reasonably be expected to continue doing so, why are
we bothering with IDN at all?

And although it may be a refreshing change to approach this via a
consideration of the idiosyncrasies of Turkish orthography as projected
into the IDN space, it may be worth a reminder about the more
frequently cited problem with the final-form lower-case sigma, which
currently precludes the correct local representation of as central an
entity in the development of IDN as the name of a country (Κύπρος).
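
The sigma problem can be seen with a short sketch, assuming Python 3, whose built-in "idna" codec implements IDNA2003 (RFC 3490 with nameprep). The variable names are illustrative only:

```python
# The same name spelled with a final sigma (U+03C2) and a medial sigma (U+03C3).
final = "Κύπρος"
medial = "Κύπροσ"

# IDNA2003 nameprep folds both sigmas together before Punycode encoding.
ace_final = final.encode("idna")
ace_medial = medial.encode("idna")

# Converting the ACE form back cannot restore the final sigma.
round_trip = ace_final.decode("idna")

print(ace_final, ace_medial, round_trip)
```

Both spellings yield the same ACE label, so the final-form sigma is lost once the name has passed through the protocol.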

I also note that the post-reform Duden volume, "Die deutsche
Rechtschreibung", makes explicit reference to the use of the Eszett as
an upper case character in names ("In Dokumenten kann bei Namen aus
Gründen der Eindeutigkeit auch bei Großbuchstaben das ß verwendet
werden -- HEINZ GROßE"). If nothing else, this erodes the justification
for the current non-availability of IDNs such as smartaß.de (even
though I have a pretty good idea of what the response to this
observation is going to be).
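
A brief sketch of the current behaviour, again assuming Python 3 and its IDNA2003-based "idna" codec (variable names are illustrative):

```python
# IDNA2003 nameprep maps U+00DF (ß) to "ss" before any encoding takes place.
name = "smartaß.de"
ace = name.encode("idna")

print(ace)
```

Because the mapped label is pure ASCII, no xn-- form is generated at all, and the name becomes indistinguishable from smartass.de.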


More information about the Idna-update mailing list