Erik van der Poel
erikv at google.com
Wed Dec 2 07:04:51 CET 2009
Dotless i (U+0131) is already supported by IDNA2003.
On Tue, Dec 1, 2009 at 2:49 PM, Kent Karlsson <kent.karlsson14 at comhem.se> wrote:
> I agree with Shawn. And I find all the talk about mapping as a
> user interface issue disconcerting. The user interface is an
> inappropriate place to put the (now optional, and optional as
> to which) domain name mapping. The mapping must be done at a
> level closer to turning a given domain name into a punycoded
> domain name. And the mapping, which involves not only character
> by character mapping but also normalisation, must be the same
> for all (as Shawn says), as best we can get that.
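Kent's point that the mapping and normalisation step must be identical everywhere can be made concrete with a tiny pipeline. The sketch below is a hypothetical illustration, not the UTS 46 or IDNA2008 mapping. It assumes the shared mapping is simply language-independent lowercasing followed by NFC, and then uses Python's built-in `punycode` codec to produce the A-label. The names `map_label` and `to_alabel` are made up for this example.

```python
import unicodedata


def map_label(label: str) -> str:
    """Hypothetical shared mapping: default (locale-independent) lowercase, then NFC."""
    return unicodedata.normalize("NFC", label.lower())


def to_alabel(label: str) -> str:
    """Map a single label, then Punycode-encode it if it is not plain ASCII."""
    mapped = map_label(label)
    if mapped.isascii():
        return mapped
    return "xn--" + mapped.encode("punycode").decode("ascii")


# Precomposed and decomposed spellings of the same label converge on one A-label.
print(to_alabel("Bücher"))
print(to_alabel("Bu\u0308cher"))
```

If every implementation ran the same `map_label`, both spellings would reach the same server no matter who typed them. A second implementation with a slightly different mapping would break that guarantee.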
> But I have always found the decision for IDNA 2003 to use "case
> folding" instead of "language independent to-lower" mapping
> a mistake. I like Mark's transition suggestion,
> in the expectation that UTR 46 will be changed to match,
> thus, eventually (5-10 years) allowing a handful of letters
> (including ı (dotless i) and ß (sharp s)) to be used unmapped
> in domain names (in the version consisting of U-labels).
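The difference between the two case-mapping options is easy to see with Python's built-ins, assuming `str.casefold()` stands in for Unicode full case folding (the basis of IDNA2003's Nameprep mapping) and `str.lower()` for the default, language-independent lowercase mapping. Python's `idna` codec implements IDNA2003, so it also shows ß being folded away while dotless i (U+0131) passes through unchanged, consistent with the note at the top of this message. This is an illustration only, not the actual UTS 46 tables.

```python
# Unicode full case folding vs. default lowercasing for the letters discussed.
letters = {
    "sharp s": "\u00df",      # ß
    "final sigma": "\u03c2",  # ς
    "dotless i": "\u0131",    # ı
}

for name, ch in letters.items():
    print(f"{name}: casefold={ch.casefold()!r} lower={ch.lower()!r}")

# Python's built-in "idna" codec follows IDNA2003 (Nameprep + Punycode).
print("stra\u00dfe.de".encode("idna"))          # ß is folded to "ss"
print("k\u0131rm\u0131z\u0131.com".encode("idna"))  # ı is kept, so the label becomes xn--...
```

Case folding maps ß to "ss" and ς to σ, while to-lower leaves both as they are. For ı the two mappings agree.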
> As a little extra datapoint, ß is used not only in German,
> but also in at least one related language, Kölsch, see
> http://en.wikipedia.org/wiki/K%C3%B6lsch_language. ß seems
> to be used like any other letter in Kölsch, in particular it
> can be doubled: see e.g.
> http://ksh.wikipedia.org/wiki/Auju%C3%9F%C3%9F, though
> I have no idea how widespread these spellings are. Still, this
> is further support for Michael's conjecture that ß is
> increasingly being treated as a letter in its own right.
> And ı and i are certainly not equivalent in any way in Turkish.
> In summary I support:
> 1) a certain handful of letters should be TRANSITIONAL,
> to become PVALID in a few years, per Mark's suggestion,
> 2) there should be just one mapping specification
> (modulo compatible updates for new versions of Unicode),
> and this mapping must be used by all implementations of
> IDNA 2008, and
> 3) the case-mapping part of the mapping specification
> should be "language independent to-lower", and NOT
> "case-fold, now with exceptions".
> /kent k
> On 2009-12-01 21:45, "Shawn Steele" <Shawn.Steele at microsoft.com> wrote:
>> One question I discussed with Patrik yesterday was whether locale
>> might affect mapping. I'd like to get better insight into the general
>> understanding of that.
>>> 1. Could locale determine whether a PVALID character should be mapped
>>> into another PVALID character prior to following the rules to turn it
>>> into an ALABEL? I believe the consensus answer is probably SHOULD NOT
>>> or MUST NOT because that would make domains with that valid character
>>> unreachable by software using those locale rules.
>> I agree.
>>> 2. Could locale determine whether, or how, a DISALLOWED character is
>>> mapped into a PVALID character prior to getting an ALABEL?
>> No, for several reasons:
>> A) If I email you a link that contains a DISALLOWED character, your
>> machine/environment MUST map it to the same thing my machine did. Otherwise,
>> suppose I tell you "you have funny charges from travelling, visit Bank.org to
>> correct them." You are trying to pay for your flight home, so you type
>> "Bank.org" into the computer at a kiosk in a foreign airport, and if it uses
>> different mapping rules you could end up at a phishing site. You don't want
>> VISA.com to go to vısa.com just because you're using a browser in a Turkish
>> airport.
>> B) If I travel myself, I need consistent behavior regardless of the machine
>> I'm using.
>> C) If I see an international advertisement, the domains need to go to the same
>> server, regardless of who is typing in the link, how, or where.
>> D) A server or relay wouldn't necessarily know the context the user expected
>> when interpreting a forwarded request.
>> E) It'd be a support nightmare.
>> F) I'm not sure it is practical to create APIs that enable this
>> distinction. (We, the software community and not just my company, already
>> have problems selecting the correct locale-specific behavior for sorting,
>> formatting, etc., so we'd be bound to get it wrong at least some of the time.)
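Point A is easy to reproduce. Python's `str.lower()` is locale-independent, so the Turkish behaviour has to be simulated here. `turkish_lower` is a hypothetical stand-in for a locale-tailored mapping (capital I to dotless ı, capital İ to i), not any real library's API. The two mappings turn the same typed name into different labels, and therefore into different ASCII names on the wire.

```python
def default_lower(s: str) -> str:
    """Language-independent lowercasing (Python's str.lower)."""
    return s.lower()


def turkish_lower(s: str) -> str:
    """Hypothetical Turkish-tailored lowercasing: I -> ı and İ -> i."""
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


typed = "VISA.com"
for name, mapping in (("default", default_lower), ("turkish", turkish_lower)):
    label = mapping(typed)
    # Python's "idna" codec (IDNA2003) turns the mapped name into its ASCII form.
    print(name, label, label.encode("idna"))
```

The default mapping yields `visa.com`, while the tailored one yields `vısa.com`, whose ASCII form is an unrelated `xn--` name that could be registered by someone else.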
> Idna-update mailing list
> Idna-update at alvestrand.no