Dot-mapping

Tue Dec 11 08:56:37 CET 2007

> From: John C Klensin <klensin at jck.com>
> > The dot-mapping is already implemented in many applications.
> > Removing it causes many problems.
> 
> Removing it _from those applications_ would be a bad idea, IMO.

If dot-mapping is removed from the standards, implementors
will remove it from their applications as well.

> > I'm afraid that another languages may have the same problem
> > and the characters which need to be treated as a dot may
> > increase.
> 
> Yes. And the risk of more dot-characters being added is one of
> the reasons for removing dot-mapping from the protocol.  
> 
> Let me try again to explain:
> 
> In your applications, both legacy and new, you should certainly
> map the dots that make sense to you to map.   For your case,
> that means you should almost certainly map Japanese-related
> dots, but should not make an attempt to map any character
> (worldwide and in any script) that looks to you like a dot.   If
> you start mapping anything that looks like a dot to you or your
> users, you might end up, e.g., treating the numeral 5 as a dot.

Representative implementations (Firefox, IE, Safari) are
internationalized. Language-specific implementations are rare.

If dot-mapping is allowed for some languages, a common definition
is necessary, and that definition is section 3.1 of RFC 3490.
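
To illustrate what that common definition amounts to, here is a minimal
sketch in Python.  The four code points are the ones RFC 3490 section 3.1
requires to be recognized as dots; the names IDNA2003_DOTS and map_dots
are my own, chosen only for this example.

```python
import re

# The four characters RFC 3490 section 3.1 requires to be recognized as dots:
# U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP,
# U+FF0E FULLWIDTH FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def map_dots(name):
    """Replace every IDNA2003 dot with the ASCII dot (hypothetical helper)."""
    return IDNA2003_DOTS.sub(".", name)


# U+30FB KATAKANA MIDDLE DOT is not in the list, so it is left untouched.
print(map_dots("example\u3002jp"))
print(map_dots("a\u30FBb"))
```

An application that supports many languages can apply such a fixed table
without knowing which language the user is typing.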

Moreover, the candidate dot-like characters are already listed
in the Unicode 5.0 standard.  ( grep "FULL STOP" UnicodeData.txt )

They are all marked as "NEVER" in draft-faltstrom-idnabis-tables-03.txt,
so there is no collision or conflict.

002E; # FULL STOP
0589; # ARMENIAN FULL STOP
06D4; # ARABIC FULL STOP
0701; # SYRIAC SUPRALINEAR FULL STOP
0702; # SYRIAC SUBLINEAR FULL STOP
1362; # ETHIOPIC FULL STOP
166E; # CANADIAN SYLLABICS FULL STOP
1803; # MONGOLIAN FULL STOP
1809; # MONGOLIAN MANCHU FULL STOP
2CF9; # COPTIC OLD NUBIAN FULL STOP
2CFE; # COPTIC FULL STOP
3002; # IDEOGRAPHIC FULL STOP
FE12; # PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
FE52; # SMALL FULL STOP
FF0E; # FULLWIDTH FULL STOP
FF61; # HALFWIDTH IDEOGRAPHIC FULL STOP
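
A list of this kind can also be produced without UnicodeData.txt.  The
following is a rough sketch using Python's unicodedata module.  Two
assumptions of mine: it keeps only characters whose name ends in
"FULL STOP" and whose general category is Po, which excludes entries
such as "DIGIT ONE FULL STOP"; and it reports whatever Unicode version
the Python build ships, so it may print more characters than the
Unicode 5.0 list above.

```python
import unicodedata


def full_stop_punctuation():
    """Return (code point, name) pairs for punctuation named '... FULL STOP'."""
    found = []
    for cp in range(0x110000):
        ch = chr(cp)
        # Unassigned code points and surrogates have no name; use "" for them.
        name = unicodedata.name(ch, "")
        # Assumption: restrict to general category Po (other punctuation).
        if name.endswith("FULL STOP") and unicodedata.category(ch) == "Po":
            found.append((cp, name))
    return found


for cp, name in full_stop_punctuation():
    print("%04X; # %s" % (cp, name))
```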

Newly added dot-like characters MUST be marked as "NEVER".
A new IANA registry for dot-like characters may be needed.

Alternatively, treating all "FULL STOP" characters as dots is one solution.

> But the real question is not whether or not dots should be
> mapped but which ones and where that should be specified.   If
> the protocol specifies the mapping, then it has to have a list
> of things that are considered dots (as IDNA2003 does).  But
> dot-like characters might be added later, as you point out,
> which means that all of IDNA becomes dependent of one version of
> Unicode which better not change (unless a character property of
> "dot" were created).  

That is true.

> And these dots create a parsing problem because, for example, in
> IDNA2003, if one had a string containing one or more Japanese
> middle dots and at least one A-label, it is an IDN.  If the same
> string doesn't contain any A-labels, it is a single label.  And,
> the fundamental assumption of IDNA --that DNS resolvers and
> applications that don't know anything about IDNs can pass the
> domain names back and forth, and work normally-- is violated
> because DNS resolvers that are conformant to RFC 1034/1035
> (only) can't parse an FQDN into labels.

"Japanese middle dot" is not a "FULL STOP" character.
I think you mean A-labels separated by "IDEOGRAPHIC FULL STOP".

A name consisting of A-labels separated by "IDEOGRAPHIC FULL STOP" is a
valid IDN.  But because it contains non-ASCII characters, it is not a
valid traditional (ASCII) domain name.  I do not think it is a problem
that resolvers which do not know IDNA cannot parse it.
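
The difference can be sketched as follows (Python; the function names
and the sample name are only illustrative, and the A-label is treated as
an opaque string).  A parser that knows only the ASCII dot sees one
label, a parser that follows RFC 3490 section 3.1 sees two, and a simple
check shows the name is not pure ASCII in the first place.

```python
import re

# Dots recognized by RFC 3490 section 3.1.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def labels_ascii_only(name):
    """Split the way a parser that knows only the ASCII dot would."""
    return name.split(".")


def labels_idna2003(name):
    """Split on every character IDNA2003 treats as a dot."""
    return IDNA2003_DOTS.split(name)


def is_ascii(name):
    """True if the name contains only ASCII characters."""
    return all(ord(c) < 128 for c in name)


# An A-label and "jp" separated by U+3002 IDEOGRAPHIC FULL STOP.
name = "xn--r8jz45g\u3002jp"

print(labels_ascii_only(name))  # one label; the separator is not seen
print(labels_idna2003(name))    # two labels
print(is_ascii(name))           # False; not a traditional ASCII domain name
```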

> Independent of the "what is a dot" issue, I believe that parsing
> problem identifies a fundamental error in IDNA2003 that would
> need to be fixed, somehow, even if we abandoned the revision
> project.
> 
> But, again, nothing prevents you from displaying the dots in
> domain names --especially domain names containing Japanese
> characters-- in a Japanese-friendly way, accepting
> Japanese-friendly dots on keyboards and mapping them to
> ASCII/DNS dots.  And, in my opinion, you should continue doing
> that.

Again, many implementors develop implementations that support many
languages. Applications cannot predict which language the user intends
to enter. A standard way is needed.

Regards,

--
Kazunori Fujiwara, JPRS