Tonus (was: Re: Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft))

William Tan dready at gmail.com
Sat Feb 2 07:01:56 CET 2008


Hi Vaggelis,

> Zone file first: (requested example name βαγγέλης.gr -> registered as βαγγέλησ.gr equivalent to xn--ixahcfaz1a9d.gr)
>
> *IDNA2003: we have xn--ixahcfaz1a9d.gr IN NS....
>
> *IDNA200X: we still put xn--ixahcfaz1a9d.gr IN NS...?
>

According to my understanding of the current direction of IDNA200X
(I'm only lurking and following certain threads occasionally), here is
what could happen, depending on what this list decides:

1. "ς" and "σ" are no longer mapped; and
2a. Both "ς" and "σ" are allowed; or
2b. "σ" is still allowed, but "ς" can't be used.

In the case of 2a, it suddenly becomes legal to have xn--ixahcfaz1a1d
("xn--" prepended to the straight Punycode encoding of βαγγέλης) as
an A-label. IDNA2003 software will continue to produce
"xn--ixahcfaz1a9d.gr" for both versions. IDNA200X software will
produce different labels depending on which sigma is used. Problems
can be minimized if domain name registries take action (e.g. bundle
the domain names) before IDNA200X software becomes available.
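To make the difference concrete, here is a small Python sketch. Python's built-in "idna" codec implements IDNA2003 ToASCII (including nameprep, which folds "ς" into "σ"), while its "punycode" codec is raw RFC 3492 Punycode with no mapping at all. The function unmapped_label is a hypothetical stand-in for an IDNA200X that leaves the final sigma unmapped as in case 2a; the real protocol would also apply validity checks that are not modelled here.

```python
FINAL = "βαγγέλης"   # ends in U+03C2 GREEK SMALL LETTER FINAL SIGMA
MEDIAL = "βαγγέλησ"  # ends in U+03C3 GREEK SMALL LETTER SIGMA


def idna2003_label(label: str) -> str:
    # Built-in "idna" codec = IDNA2003 ToASCII; nameprep maps U+03C2 -> U+03C3.
    return label.encode("idna").decode("ascii")


def unmapped_label(label: str) -> str:
    # Hypothetical case-2a behaviour: no mapping, just "xn--" + raw Punycode.
    return "xn--" + label.encode("punycode").decode("ascii")


for name in (FINAL, MEDIAL):
    print(name, idna2003_label(name), unmapped_label(name))
# βαγγέλης xn--ixahcfaz1a9d xn--ixahcfaz1a1d
# βαγγέλησ xn--ixahcfaz1a9d xn--ixahcfaz1a9d
```

IDNA2003 yields a single A-label for both spellings, whereas without the mapping the final-sigma spelling gets a label of its own, which is what a registry bundle would have to cover.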

In the case of 2b, "βαγγέλης" becomes invalid input to the IDNA200X
protocol. In order to avoid error messages popping up all over the
Internet, we need to convince software vendors who implement IDNA200X
to preprocess the input by mapping "ς" to "σ" before feeding it to the
IDNA200X algorithm. I'm not sure whether the intention is for a
document within the IDNA200X standard to describe these
context-dependent or backward-compatibility mappings.
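A minimal sketch of the case-2b preprocessing step, in Python. Everything here is hypothetical: preprocess stands in for a vendor-supplied backward-compatibility mapping, and to_a_label_2b is a toy IDNA200X that merely rejects "ς" and Punycode-encodes the rest. It assumes a non-ASCII label, and real IDNA200X validity rules would be far more involved.

```python
FINAL_SIGMA = "\u03c2"  # ς
SIGMA = "\u03c3"        # σ


def preprocess(label: str) -> str:
    # Hypothetical backward-compatibility mapping applied *before* IDNA200X.
    return label.replace(FINAL_SIGMA, SIGMA)


def to_a_label_2b(label: str) -> str:
    # Toy IDNA200X under option 2b: "ς" is not a valid code point.
    if FINAL_SIGMA in label:
        raise ValueError(f"{label!r}: U+03C2 is not allowed")
    return "xn--" + label.encode("punycode").decode("ascii")


typed = "βαγγέλης"
try:
    to_a_label_2b(typed)  # raw user input is rejected
except ValueError as err:
    print("error:", err)
print(to_a_label_2b(preprocess(typed)))  # xn--ixahcfaz1a9d
```

With the mapping in place, the user can still type βαγγέλης and end up at the same xn--ixahcfaz1a9d label that IDNA2003 produced; without it, the input simply fails.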


> Browser line:
>
> IDNA2003:
> *We start by typing xn--ixahcfaz1a9d.gr - it gets translated as βαγγέλησ.gr

The intention of IDNA2003 was to keep the ACE form hidden from the
user as much as possible. Even today, I don't think any sane user will
type xn--ixahcfaz1a9d.gr into the browser. The behavior you describe
probably arises more often from HTML anchor references, where the author
of the document has no way of knowing whether the reader has an
IDNA-capable browser. So instead of the IRI form "βαγγέλης.gr", the
ACE form is used to maximize compatibility. Since IDNA2003 is lossy,
what gets displayed is ToUnicode(ACE), i.e. "βαγγέλησ.gr".

In other words, this is just an artifact of IDN deployment woes.
Applications should be able to handle "βαγγέλης.gr" without the user
ever being exposed to "xn--ixahcfaz1a9d.gr".
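The lossy ToUnicode step can be reproduced with Python's built-in "idna" codec, which implements IDNA2003 ToASCII/ToUnicode per label. This is only a sketch; the function name idna2003_round_trip is illustrative, and real browsers may add their own display policies on top.

```python
def idna2003_round_trip(name: str) -> str:
    # ToASCII (nameprep + Punycode) on each label, then ToUnicode back.
    ace = name.encode("idna")
    return ace.decode("idna")


# Roughly what an IDNA2003 browser displays for an ACE link in an anchor:
print(b"xn--ixahcfaz1a9d.gr".decode("idna"))  # βαγγέλησ.gr

# The final sigma the author typed does not survive the round trip:
typed = "βαγγέλης.gr"
print(idna2003_round_trip(typed))           # βαγγέλησ.gr
print(idna2003_round_trip(typed) == typed)  # False
```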

> *We start by typing βαγγέλης.gr (we are allowed to do that) - it gets translated as xn--ixahcfaz1a9d.gr
>
> IDNA200X:
> *We start by typing xn--ixahcfaz1a9d.gr - it gets translated as βαγγέλησ.gr
> *We start by typing βαγγέλης.gr. Are we allowed to do that? Is it corresponding to any PUNYCODE translation? Which domain name will be sent to the resolver?
>
> My main concern is the user experience. Can the user type in βαγγέλης.gr in the *browser* line or *email client* and still get xn--ixahcfaz1a9d.gr? Then it should be OK. Will the browser/email-client still allow Upper case characters in IDN as well?

See my comments on your "zone file" section above.

-- 
http://xri.net/=wil


More information about the Idna-update mailing list