Re: Turkish i/ı (Re: Eszett)

John C Klensin klensin at jck.com
Sun Jul 12 18:19:49 CEST 2009



--On Sunday, July 12, 2009 19:04 +0900 "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:

>...
> I guess Turkish schoolchildren know the difference between i
> and ı. I guess we can't make I map to both i and ı at the
> same time, but my guess is that Turks would be glad to be
> able to use both i and ı even if I mapped to i rather than
> to ı (as they would feel would be the right thing).

Martin,

If I correctly understand your guesses, I agree.  While we haven't
heard from anyone with a claim to speak authoritatively for the
Turkish language community, I think there is a basic principle
here that we should violate only if there are much more
important considerations.  If the community that uses particular
characters considers them different, we should not make
decisions that globally turn one into the other.  Either they
are treated as distinct or one is DISALLOWED (with the
possibility of reversing _that_ decision later if we discover
that it was painfully and obviously wrong).  If we keep them
separate, then a local registry can make provisions to unify
them or treat them as variants if that makes sense to them.  If
we unify them, that is a global decision that eliminates all
possibility of local registry or regulatory choice.
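
To make the registry option concrete, here is a minimal Python sketch, assuming a hypothetical registry that keeps "i" and "ı" distinct at the protocol level but declares them variants in its own policy. The names VARIANTS, variant_key and Registry are invented for illustration; a real registry's variant rules would be considerably richer.

```python
# Hypothetical registry policy: the protocol keeps "i" and "ı" distinct,
# and this registry chooses (locally) to treat them as variants.
VARIANTS = {"ı": "i"}  # assumed registry-defined variant table


def variant_key(label: str) -> str:
    """Collapse registry-declared variants to a single comparison key."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


class Registry:
    def __init__(self) -> None:
        # variant key -> label exactly as it was registered
        self._by_key: dict[str, str] = {}

    def register(self, label: str) -> bool:
        """Register label unless a variant of it is already taken."""
        key = variant_key(label)
        if key in self._by_key:
            return False  # blocked: a variant is already registered
        self._by_key[key] = label  # the original spelling is preserved
        return True


registry = Registry()
print(registry.register("kırmızı"))  # True
print(registry.register("kirmizi"))  # False, blocked as a variant
```

Because the unification lives only in this registry's table, another registry (or this one, later) could drop or change it without any change to the protocol.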

I think there are better ways to think about this, but that
principle is basically one of keeping options open for the
future when we cannot be certain that our decisions are right
for all time.

One of those more important considerations is that, because of
historical DNS decisions, we can't do much about ASCII
case-matching.  We can treat upper-case undecorated Latin
characters as DISALLOWED if they appear in non-ASCII labels, but
we can't have them as PVALID characters separate from their
lower-case equivalents.
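
For reference, a rough Python sketch of the ASCII-only case matching the DNS already performs (RFC 4343 describes DNS name comparison as case-insensitive for the ASCII letters only), together with the "DISALLOWED when mixed with non-ASCII" option. The function names are hypothetical, and real DNS software compares octets rather than Python strings.

```python
def ascii_fold(label: str) -> str:
    """Lower-case only ASCII A-Z; every other code point is left alone."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in label)


def dns_labels_match(a: str, b: str) -> bool:
    """ASCII-only case-insensitive comparison, roughly as DNS does it."""
    return ascii_fold(a) == ascii_fold(b)


def uppercase_disallowed(label: str) -> bool:
    """Flag ASCII upper-case letters in a label that also has non-ASCII."""
    has_non_ascii = any(ord(c) > 0x7F for c in label)
    return has_non_ascii and any("A" <= c <= "Z" for c in label)


print(dns_labels_match("Example", "EXAMPLE"))    # True
print(dns_labels_match("İstanbul", "istanbul"))  # False: İ is not ASCII
print(uppercase_disallowed("Kırmızı"))           # True
```

The point of the sketch is that "E" and "e" can never be separate PVALID characters: the DNS would match them anyway, so the only protocol-level choices are to fold or to reject.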

That principle is made more important because, for better or
worse, we don't have the ability to apply a matching function
without actually substituting one character for another
("mapping") nor, independent of what registries might do, any
capability of applying locality-dependent functions... both of
which The Unicode Standard anticipates for cases like this.
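
A small Python sketch of the distinction between mapping and matching, and of why a single global fold cannot send I to both i and ı. Python's str.casefold() implements the default, locale-independent Unicode case folding; turkic_fold is a hand-written assumption modeled on the entries CaseFolding.txt marks with status T (I -> ı, İ -> i), not a standard-library function. All helper names are illustrative.

```python
def default_fold(s: str) -> str:
    """Unicode default (locale-independent) full case folding."""
    return s.casefold()


def turkic_fold(s: str) -> str:
    """Assumed Turkic folding: apply the status-T mappings from
    CaseFolding.txt (I -> ı, İ -> i) before the default folding."""
    return s.replace("I", "ı").replace("İ", "i").casefold()


def matches(a: str, b: str, fold=default_fold) -> bool:
    """Matching: compare folded forms but keep both labels unchanged."""
    return fold(a) == fold(b)


label = "ISPARTA"
mapped = default_fold(label)  # mapping: the stored label is replaced
print(mapped)                                      # isparta
print(matches("ISPARTA", "ısparta"))               # False
print(matches("ISPARTA", "ısparta", turkic_fold))  # True
```

Under mapping, only one fold can be baked into the stored label; matching with a locale-selected fold would allow either answer, which is exactly the capability the protocol lacks.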

From my point of view, the arguments for Eszett, for a distinct
dotless-i, for avoiding toCaseFold, and for being very
conservative about application of NFKC in situations in which it
is possible to have reasonable debates about whether the
unmapped and mapped forms are really the same character (not
just "equivalent" under some rule that may not work as well for
IDNs as it does for running text) are all applications of that
principle.
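
The character-level effects behind those arguments show up in a quick Python sketch. str.casefold() is Python's implementation of Unicode default full case folding (the mapping toCaseFold applies), and unicodedata.normalize("NFKC", ...) is the standard library's NFKC; the helper names and the fold-then-NFKC ordering are assumptions for illustration only.

```python
import unicodedata


def nfkc(s: str) -> str:
    """NFKC normalization from the standard library."""
    return unicodedata.normalize("NFKC", s)


def fold_then_nfkc(s: str) -> str:
    """Illustrative pipeline: default case folding, then NFKC."""
    return nfkc(s.casefold())


def collapses(a: str, b: str, fn) -> bool:
    """True if two different strings become identical under fn."""
    return a != b and fn(a) == fn(b)


print(collapses("straße", "strasse", str.casefold))  # True: Eszett is gone
print(collapses("straße", "strasse", nfkc))          # False: NFKC keeps ß
print(collapses("x²", "x2", nfkc))                   # True: superscript lost
print(collapses("ﬁle", "file", fold_then_nfkc))      # True: ligature split
```

Each True line is a distinction that, once mapped away, no registry or local policy can recover.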

    john






More information about the Idna-update mailing list