Label separators (was: Re: Urdu and SPACE, FULL STOP (Re: comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic))

John C Klensin klensin at jck.com
Sat Feb 23 18:15:11 CET 2008


Dr. Hussain (and others),

I've been distracted by other work for a few days, but want to
address the FULL STOP problem, which, as Harald pointed out, is
associated with a label separator issue and not an issue with
"tables" at all.

The problem we face here is that the single most critical
consideration in looking at IDNA is that the DNS, and DNS
applications that are not IDNA-aware, must continue to work well
and predictably when confronted with IDN labels in either native
Unicode or ACE form.

Personally, I frequently wish that constraint did not exist
because one can imagine many interesting things that could be
done without it.  But the price of eliminating the constraint is
modifications to the DNS that would take us considerable effort
and probably many years to deploy.  No one wants to wait that
long so we are stuck with the constraint.

For label separators, the constraint has even stronger
implications than it does for matching rules (I've discussed the
latter in another note) because applications and systems that
are otherwise unaware of the DNS itself (not just unaware of
IDNA) have to be able to parse full domain names into labels in
order to map back and forth between the "labels separated by
full stops" format that we usually see and the DNS internal
format (a list of labels with explicit length information).
Even the language of IDNA2003 about mapping of period-like
characters isn't sufficient to prevent those characters from
showing up in contexts in which they would interfere with domain
name parsing.  However, the intent is clear: by the time a domain
name makes it into a file or out onto the Internet, anything
that looks like a full stop must already have been replaced by
an ASCII period.
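To make the parsing step concrete, here is a rough sketch of the
mapping between the dotted presentation form and the
length-prefixed internal form.  The function names to_wire and
from_wire are made up for this illustration, the input is assumed
to be ASCII already, and name compression is ignored entirely.

```python
MAX_LABEL_OCTETS = 63  # per-label length limit in the DNS


def to_wire(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed labels.

    Hypothetical helper: splits only on ASCII period (U+002E).
    """
    if name.endswith("."):
        name = name[:-1]  # drop the single trailing root dot, if any
    labels = name.split(".") if name else []
    wire = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # non-ASCII input raises here
        if not 1 <= len(raw) <= MAX_LABEL_OCTETS:
            raise ValueError(f"bad label length in {name!r}")
        wire.append(len(raw))  # one length octet, then the label
        wire += raw
    wire.append(0)  # zero-length root label ends the name
    return bytes(wire)


def from_wire(data: bytes) -> str:
    """Decode length-prefixed labels back to dotted form.

    Hypothetical helper: does not handle compression pointers.
    """
    labels = []
    i = 0
    while True:
        length = data[i]
        i += 1
        if length == 0:
            break
        labels.append(data[i:i + length].decode("ascii"))
        i += length
    return ".".join(labels) + "."
```

Because the split is on U+002E only, a name typed with some other
period-like character is never divided into labels; in this
sketch the ASCII encoding step simply rejects it.  That is the
failure a non-IDNA-aware parser would run into.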

Oddly, this is where the "no mapping in the protocol" principle
of the IDNA200X proposals becomes very helpful.  The IDNA2003
version says, in essence, "these characters (and no others) are
considered appropriate alternative forms of label separators,
but you have to map them to ASCII period when you see them".
The IDNA200X version is equivalent to "the only valid label
separator on the wire or in interchange is ASCII period.
However, since we have prohibited all other punctuation
characters (other than hyphen) from ever actually appearing in a
domain name, if you need to use a convention locally to permit
easier typing of that character, you can substitute any
convenient punctuation (or other disallowed) character for it...
as long as it is mapped to ASCII period before you store it in a
file or transmit it on the wire".
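As a rough illustration of that local convention, an input
routine might replace locally accepted separator characters with
ASCII period before anything is stored or sent.  The function
name and the choice of extra separator are assumptions made for
this sketch: the first three characters are the period-like
forms that IDNA2003 recognizes, while U+06D4 (ARABIC FULL STOP)
stands in for whatever a particular local environment might
choose to accept.

```python
# Period-like characters that IDNA2003 treats as label separators.
IDNA2003_DOTS = "\u3002\uff0e\uff61"

# Hypothetical local choice, not taken from any specification.
LOCAL_EXTRA_SEPARATORS = "\u06d4"  # ARABIC FULL STOP


def normalize_separators(
    name: str,
    separators: str = IDNA2003_DOTS + LOCAL_EXTRA_SEPARATORS,
) -> str:
    """Map each locally accepted separator to ASCII period (U+002E).

    Intended to run at the input boundary, before the name is
    written to a file or put on the wire.
    """
    table = {ord(ch): "." for ch in separators}
    return name.translate(table)
```

For example, normalize_separators("example\u06d4com") yields
"example.com".  The substitution is unambiguous only because the
characters being replaced can never appear inside a valid label.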

That is clearly not a perfect solution, but it gives you the
flexibility you need while preserving both global
interoperability and the ability for non-IDNA applications to
unambiguously parse domain names into labels.

    john



More information about the Idna-update mailing list