Turkish dotless i (Re: AW: AW: sharp s (Eszett))
Harald Tveit Alvestrand
harald at alvestrand.no
Tue Mar 18 23:41:17 CET 2008
Martin Duerst wrote:
> At 19:10 08/03/17, Harald Tveit Alvestrand wrote:
>> Martin Duerst wrote:
>>> The next case on which to test this approach would be the issue
>>> of the (Turkish,...) dotless i. My guess is that things would
>>> work out fine (i.e. the concept of information loss would show
>>> the desirability for having both dot-ful and dot-less 'i').
>> We all know the set by heart by now - the sharp S, the capital letter I with dot above, the Greek small letter final sigma.
>> Now, we CAN'T make uppercase I (without a dot) fold to lowercase dotless I, since that would break ASCII compatibility, so my guess for the dotless-i case is that we can't make that one work. YMMV.
> My understanding was that the Turkish,... I/i was one of the main cases
> that led to the separation of input mappings such as case foldings from
> the protocol itself, to allow these input mappings to depend on locale.
> I.e. a Turkish,... system would map 'I' to lowercase i without dot,
> whereas on other systems, it would map simply to 'i'. If we can't get
> that to work somehow, then I don't see the point of locale-dependency.
Hmm... in draft-faltstrom-idnabis-tables-05, dotless i (U+0131) is PVALID.
In RFC 3454, it is not listed in appendix B.2 or B.3, so it seems that
it did not take part in any case mappings.
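For anyone who wants to check this without reading the tables by hand, here is a rough sketch using Python's standard stringprep module, which ships the RFC 3454 tables. The helper name nameprep_case_maps is mine and purely illustrative; the only assumption is that the module's tables match the RFC.

```python
import stringprep

DOTLESS_I = "\u0131"       # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

def nameprep_case_maps(ch):
    """Return (B.2 result, B.3 result) for a single character.

    Hypothetical helper: it only wraps the stdlib lookups for the
    RFC 3454 case-mapping tables B.2 and B.3.
    """
    return stringprep.map_table_b2(ch), stringprep.map_table_b3(ch)

# Compare dotless i with characters that the tables do map.
for ch in (DOTLESS_I, "I", CAPITAL_I_DOT, "\u00df"):
    b2, b3 = nameprep_case_maps(ch)
    print(f"U+{ord(ch):04X}: B.2 -> {b2!r}, B.3 -> {b3!r}")
```

If dotless i comes back unchanged from both lookups while the other three characters are rewritten, that matches the reading above.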
So it may be backwards compatible to just leave it as PVALID, and let
the applications figure out, independently, and possibly depending on
context, what to do with an uppercase I (dotless) in user input. Unlike
the case with eszett, no conformant IDNA2003 application is going to
hurt a lowercase dotless i.
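To make "let the applications figure it out" a bit more concrete, here is a minimal sketch of a locale-dependent input mapping done before the protocol sees the label. Everything in it is an assumption for illustration: the function name lower_for_locale, the use of two-letter language tags, and the choice of "tr" and "az" as the languages that get the dotless treatment. It is not taken from any draft.

```python
# Languages assumed (for this sketch only) to use the dotted/dotless i pair.
TURKIC_LANGS = {"tr", "az"}

# Turkic lowercasing: I -> dotless i (U+0131), I with dot above (U+0130) -> i.
TURKIC_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})

def lower_for_locale(text, lang):
    """Lowercase user input, handling I/i according to the user's language.

    Hypothetical application-side mapping; the protocol layer would then
    receive a string that already contains the intended lowercase letters.
    """
    if lang in TURKIC_LANGS:
        text = text.translate(TURKIC_LOWER)
    return text.lower()

print(lower_for_locale("ISTANBUL", "tr"))
print(lower_for_locale("ISTANBUL", "en"))
```

The point of doing it this way is that plain ASCII 'I' still lowers to 'i' everywhere else, so ASCII compatibility is untouched, and the dotless i only shows up when the application has a reason to believe the user meant it.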