Dot-mapping

Fri Dec 7 03:36:28 CET 2007

--On Friday, 07 December, 2007 06:31 +0900 fujiwara at jprs.co.jp
wrote:

> Dear IDNAbis authors,
> 
> I found that the first requirement of RFC 3490 section 3.1 has
> been removed in the new protocol document
> draft-klensin-idnabis-protocol-02.
> 
> |  1) Whenever dots are used as label separators, the following
> |     characters MUST be recognized as dots: U+002E (full stop),
> |     U+3002 (ideographic full stop), U+FF0E (fullwidth full stop),
> |     U+FF61 (halfwidth ideographic full stop).
> 
> And John described the reason for this.
> 
> Is removing the dot-mapping already decided?

_Nothing_ is already firmly decided.  The purpose of the
documents is to provide a very concrete proposal, one about
which you can (and should) make comments and raise objections.
We hope the documents can focus discussions and help us move
quickly forward together, but I am sure that there are tradeoffs
that we have made incorrectly.   I can only hope they are minor
ones.

> The dot-mapping is useful in some language environments.

Yes, we know that.

> The dot-mapping is already implemented in many applications.
> Removing it causes many problems.

Removing it _from those applications_ would be a bad idea, IMO.

> I'm afraid that other languages may have the same problem
> and the characters which need to be treated as a dot may
> increase.

Yes. And the risk of more dot-characters being added is one of
the reasons for removing dot-mapping from the protocol.  

Let me try again to explain:

In your applications, both legacy and new, you should certainly
map the dots that make sense to you to map.   For your case,
that means you should almost certainly map Japanese-related
dots, but should not make an attempt to map any character
(worldwide and in any script) that looks to you like a dot.   If
you start mapping anything that looks like a dot to you or your
users, you might end up, e.g., treating the numeral 5 as a dot.
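
A minimal sketch of the kind of application-level mapping described
above, assuming Python. The names LOCAL_DOTS and map_local_dots are
hypothetical and do not come from any of the drafts. The table holds
only the three non-ASCII separators listed in RFC 3490 section 3.1; an
application would choose its own table for its own users rather than
mapping everything that looks like a dot.

```python
# Hypothetical UI-layer helper: the names are illustrative only.
# The table lists the dots this particular application chooses to accept.
# Here it is the three non-ASCII separators from RFC 3490 section 3.1.
LOCAL_DOTS = {
    "\u3002": ".",  # ideographic full stop
    "\uff0e": ".",  # fullwidth full stop
    "\uff61": ".",  # halfwidth ideographic full stop
}


def map_local_dots(user_input, dot_table=LOCAL_DOTS):
    """Replace locally accepted dot characters with U+002E.

    Characters that are not in dot_table are left untouched, so
    nothing is mapped merely because it resembles a dot.
    """
    return user_input.translate(str.maketrans(dot_table))
```

Because the table lives in the application and not in the protocol, it
can differ between locales and can change without touching the
protocol.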

But the real question is not whether or not dots should be
mapped but which ones and where that should be specified.   If
the protocol specifies the mapping, then it has to have a list
of things that are considered dots (as IDNA2003 does).  But
dot-like characters might be added later, as you point out,
which means that all of IDNA becomes dependent on one version of
Unicode, which had better not change (unless a character property of
"dot" were created).

And these dots create a parsing problem because, for example, in
IDNA2003, if one had a string containing one or more Japanese
middle dots and at least one A-label, it is an IDN.  If the same
string doesn't contain any A-labels, it is a single label.  And,
the fundamental assumption of IDNA --that DNS resolvers and
applications that don't know anything about IDNs can pass the
domain names back and forth, and work normally-- is violated
because DNS resolvers that are conformant to RFC 1034/1035
(only) can't parse an FQDN into labels.
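
The parsing difference can be illustrated with a short sketch, assuming
Python. The function names and the sample string are hypothetical; the
sample uses U+3002, one of the separators IDNA2003 lists, together with
a string shaped like an A-label. The first function splits only on
U+002E, as software that knows only RFC 1034/1035 would; the second
splits on all four IDNA2003 separators.

```python
import re

# The four label separators listed in RFC 3490 section 3.1.
IDNA2003_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def split_rfc1035(name):
    """Split into labels the way IDN-unaware software would: U+002E only."""
    return name.split(".")


def split_idna2003(name):
    """Split into labels on any of the four IDNA2003 separators."""
    return IDNA2003_DOTS.split(name)


# Hypothetical input: an A-label-shaped string, U+3002, then an ASCII label.
name = "xn--wgv71a119e\u3002example"

labels_plain = split_rfc1035(name)   # one element: the whole string
labels_idna = split_idna2003(name)   # two elements
```

The two functions disagree about how many labels the same string
contains, which is the disagreement the paragraph above describes.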

Independent of the "what is a dot" issue, I believe that parsing
problem identifies a fundamental error in IDNA2003 that would
need to be fixed, somehow, even if we abandoned the revision
project.

But, again, nothing prevents you from displaying the dots in
domain names --especially domain names containing Japanese
characters-- in a Japanese-friendly way, accepting
Japanese-friendly dots on keyboards and mapping them to
ASCII/DNS dots.  And, in my opinion, you should continue doing
that.

Is that helpful?

    john

>> Subject: IDNAbis discussion style, mappings, and (incidentally) Eszett
>> From: John C Klensin <klensin at jck.com>
>> To: idna-update at alvestrand.no
>> Date: Thu, 29 Nov 2007 18:14:12 -0500
> ---snip---
>> So the draft IDNA200X documents take the dot-mapping provision
>> out, turning the parsing of all domain names, including those
>> that contain A-labels, back over to the rules of RFC 1034 and
>> 1035 and the acceptance of special dots into a UI issue. To
>> me, the arguments for that choice are overwhelming.  But it
>> is a tradeoff against user-predictable behavior with scripts
>> that use non-ASCII dots and compatibility with existing
>> non-protocol text that represents IDNs using those dots: if
>> applications that map between such text and the IDNA protocol
>> don't do the right UI things with dots other than U+002E, bad
>> things will happen.  And, if we work the tradeoffs so that
>> types of compatibility issues overwhelm the reasons why
>> special dot mapping was a bad idea, then we are stuck with
>> the special dots forever.