kent.karlsson14 at comhem.se
Tue Dec 1 23:49:29 CET 2009
I agree with Shawn. And I find all the talk about mapping as a
user interface issue disconcerting. The user interface is an
inappropriate place to put the (now optional, and optional as
to which) domain name mapping. The mapping must be done at a
level closer to turning a given domain name into a Punycoded
domain name. And the mapping, which involves not only
character-by-character mapping but also normalisation, must be
the same for all (as Shawn says), as best we can get that.
But I always found the decision for IDNA 2003 to use "case
folding" instead of a "language-independent to-lower" mapping
a mistake. I like Mark's suggested transition scheme,
in the expectation that UTR 46 will be changed to match,
thus eventually (in 5-10 years) allowing a handful of letters
(including ı (dotless i) and ß (sharp s)) to be used unmapped
in domain names (in the version consisting of U-labels).
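The distinction is directly observable in Python: str.casefold() applies full Unicode case folding (the operation IDNA 2003's Nameprep is built on), str.lower() applies the language-independent to-lower mapping, and the stdlib "idna" codec implements IDNA 2003 ToASCII. A small sketch (nothing here is part of IDNA 2008; it only illustrates the 2003 behaviour being discussed):

```python
# Full case folding erases ß, while language-independent
# lowercasing preserves it:
sharp_s = "ß"
print(sharp_s.casefold())  # "ss" -- full Unicode case folding
print(sharp_s.lower())     # "ß"  -- to-lower leaves it alone

# Python's stdlib idna codec implements IDNA 2003, so the label
# is folded before encoding and ß is gone from the wire form:
print("faß.de".encode("idna"))  # b'fass.de'
```

Under IDNA 2008 (with ß PVALID) the last line would instead have to yield a distinct A-label, which is exactly why the choice of mapping matters.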
As a little extra datapoint, ß is used not only in German,
but also in at least one related language, Kölsch, see
http://en.wikipedia.org/wiki/K%C3%B6lsch_language. ß seems
to be used like any other letter in Kölsch, in particular it
can be doubled: see e.g.
I have no idea how widespread these spellings are. Still, this
is yet more support for Michael's conjecture that ß is
getting treated more and more as a letter in its own right.
And ı and i are certainly not equivalent in any way in Turkish.
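The Turkish problem can be checked against Unicode's default (language-independent) case mappings, which is what Python's built-in casing uses — the point being that the default mappings do not round-trip the Turkish letters (a sketch; real Turkish tailoring would map I to ı instead):

```python
# Default Unicode case mappings, as used by str.upper()/str.lower():
print("ı".upper())  # "I" -- dotless ı uppercases to plain I
print("I".lower())  # "i" -- but plain I lowercases to dotted i,
                    #        so the round trip ı -> I -> i fails
print("İ".lower())  # "i" + U+0307 -- dotted capital İ lowers to
                    #        i plus a combining dot above
```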
In summary I support:
1) a certain handful of letters should be TRANSITIONAL,
to become PVALID in a few years, per Mark's suggestion,
2) there should be just one mapping specification
(modulo compatible updates for new versions of Unicode),
and this mapping must be used by all implementations of
IDNA 2008, and
3) the case-mapping part of the mapping specification
should be "language independent to-lower", and NOT
"case-fold, now with exceptions".
On 2009-12-01 at 21.45, "Shawn Steele" <Shawn.Steele at microsoft.com> wrote:
> One example I discussed with Patrik yesterday, was whether locale
> might affect mapping. I'd like to get better insight into the general
> understanding of that.
>> 1. Could locale determine whether a PVALID character should be mapped
>> into another PVALID character prior to following the rules to turn
>> into an ALABEL? I believe the consensus answer is probably SHOULD NOT
>> or MUST NOT because that would make domains with that valid character
>> unreachable by software using those locale rules.
> I agree.
>> 2. Could locale determine whether, or how, a DISALLOWED character is
>> mapped into a PVALID character prior to getting an ALABEL?
> No, for several reasons:
> A) If I email you a link that contains a DISALLOWED character, your
> machine/environment MUST map it to the same thing my machine did. Otherwise I
> say "you have funny charges from travelling, visit Bank.org to correct it."
> You are trying to pay for your flight home so you type "Bank.org" into the
> computer in the kiosk in the foreign airport, and if it uses different mapping
> rules you could end up at a phishing site. You don't want VISA.com to go to a
> vsa.com just because you're using a Turkish airport browser.
> B) If I travel myself, I need consistent behavior regardless of the machine
> I'm using.
> C) If I see an international advertisement, the domains need to go to the same
> server, regardless of who and how and where the person is typing in the link.
> D) A server or relay wouldn't necessarily know the context the user expected
> when interpreting a forwarded request.
> E) It'd be a support nightmare.
> F) I'm not sure if it is practical to create APIs that enable this
> distinction. (We (software community, not just my company) already have
> problems selecting the correct locale specific behavior for sorting and
> formatting, etc., so we'd be bound to get it wrong at least some of the time.)
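Shawn's airport scenario can be made concrete. The sketch below uses a hand-written translation table as a stand-in for Turkish locale-tailored lowercasing (it is an illustration, not a real locale API) together with Python's stdlib idna codec (IDNA 2003):

```python
# Hypothetical stand-in for Turkish locale-tailored lowercasing:
# in Turkish, capital I lowers to dotless ı (and İ lowers to i).
TURKISH_LOWER = str.maketrans({"I": "ı", "İ": "i"})

typed = "VISA.com"
standard = typed.lower()                          # "visa.com"
turkish = typed.translate(TURKISH_LOWER).lower()  # "vısa.com"

# Locale-dependent mapping yields a different label, hence a
# different A-label, hence (potentially) a different server:
print(standard.encode("idna"))  # b'visa.com'
print(turkish.encode("idna"))   # an xn-- label whose ASCII letters
                                # are just v, s, a -- Shawn's "vsa"
assert standard.encode("idna") != turkish.encode("idna")
```

This is the whole argument for point 2 above: one mapping, used identically by every implementation, regardless of locale.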
Idna-update at alvestrand.no
More information about the Idna-update mailing list