fujiwara at jprs.co.jp
Tue Dec 11 08:56:37 CET 2007
> From: John C Klensin <klensin at jck.com>
> > The dot-mapping is already implemented in many applications.
> > Removing it causes many problems.
> Removing it _from those applications_ would be a bad idea, IMO.
If dot-mapping is removed from the standard, implementors will
remove it from their applications.
> > I'm afraid that another languages may have the same problem
> > and the characters which need to be treated as a dot may
> > increase.
> Yes. And the risk of more dot-characters being added is one of
> the reasons for removing dot-mapping from the protocol.
> Let me try again to explain:
> In your applications, both legacy and new, you should certainly
> map the dots that make sense to you to map. For your case,
> that means you should almost certainly map Japanese-related
> dots, but should not make an attempt to map any character
> (worldwide and in any script) that looks to you like a dot. If
> you start mapping anything that looks like a dot to you or your
> users, you might end up, e.g., treating the numeral 5 as a dot.
Major implementations (Firefox, IE, Safari) are internationalized;
language-specific implementations are rare.
If dot-mapping is allowed for some languages, a common definition
is necessary, and that definition is Section 3.1 of RFC 3490.
Moreover, the candidate dot-like characters are already listed
in the Unicode 5.0 standard (grep "FULL STOP" UnicodeData.txt).
They are all marked as "NEVER" in draft-faltstrom-idnabis-tables-03.txt,
so there is no collision or conflict.
002E; # FULL STOP
0589; # ARMENIAN FULL STOP
06D4; # ARABIC FULL STOP
0701; # SYRIAC SUPRALINEAR FULL STOP
0702; # SYRIAC SUBLINEAR FULL STOP
1362; # ETHIOPIC FULL STOP
166E; # CANADIAN SYLLABICS FULL STOP
1803; # MONGOLIAN FULL STOP
1809; # MONGOLIAN MANCHU FULL STOP
2CF9; # COPTIC OLD NUBIAN FULL STOP
2CFE; # COPTIC FULL STOP
3002; # IDEOGRAPHIC FULL STOP
FE12; # PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
FE52; # SMALL FULL STOP
FF0E; # FULLWIDTH FULL STOP
FF61; # HALFWIDTH IDEOGRAPHIC FULL STOP
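For reference, the name search above can be approximated without a copy
of UnicodeData.txt. The following is only a sketch, using Python's
standard unicodedata module; the function name full_stop_characters is
hypothetical. The result depends on the Unicode version bundled with the
Python build, so it may list characters added after Unicode 5.0. A plain
substring match also returns compatibility characters such as
U+2488 DIGIT ONE FULL STOP, which the list above does not include.

```python
import sys
import unicodedata


def full_stop_characters():
    """Return (code point, name) pairs whose Unicode name contains 'FULL STOP'."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # Unassigned code points and surrogates have no name; use "" for them.
        name = unicodedata.name(chr(cp), "")
        if "FULL STOP" in name:
            found.append((cp, name))
    return found


if __name__ == "__main__":
    # Print in the same "XXXX; # NAME" layout as the list above.
    for cp, name in full_stop_characters():
        print(f"{cp:04X}; # {name}")
```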
Newly added dot-like characters MUST be marked as "NEVER".
A new IANA registry for dot-like characters may be needed.
Alternatively, treating all "FULL STOP" characters as "dot" is one
solution.
> But the real question is not whether or not dots should be
> mapped but which ones and where that should be specified. If
> the protocol specifies the mapping, then it has to have a list
> of things that are considered dots (as IDNA2003 does). But
> dot-like characters might be added later, as you point out,
> which means that all of IDNA becomes dependent of one version of
> Unicode which better not change (unless a character property of
> "dot" were created).
That is true.
> And these dots create a parsing problem because, for example, in
> IDNA2003, if one had a string containing one or more Japanese
> middle dots and at least one A-label, it is an IDN. If the same
> string doesn't contain any A-labels, it is a single label. And,
> the fundamental assumption of IDNA --that DNS resolvers and
> applications that don't know anything about IDNs can pass the
> domain names back and forth, and work normally-- is violated
> because DNS resolvers that are conformant to RFC 1034/1035
> (only) can't parse an FQDN into labels.
The "Japanese middle dot" is not a "FULL STOP" character.
I think you mean A-labels separated by "IDEOGRAPHIC FULL STOP".
A string of A-labels separated by "IDEOGRAPHIC FULL STOP" is a valid
IDN, but because it contains non-ASCII characters, it is not a valid
traditional (ASCII) domain name. I do not think it is a problem that
resolvers which do not know IDNA cannot parse it.
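To make the two views concrete, here is a minimal sketch comparing an
IDNA2003-aware split with an ASCII-only split. The four separators are
the ones listed in Section 3.1 of RFC 3490 (U+002E, U+3002, U+FF0E,
U+FF61). The function names and the sample string are illustrative
assumptions, not taken from any real implementation.

```python
import re

# Label separators that RFC 3490, Section 3.1 requires to be recognized as dots.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_idna2003(name):
    """Split a name into labels on any of the four IDNA2003 dots."""
    return IDNA2003_DOTS.split(name)


def split_ascii_only(name):
    """Split the way software that only knows the ASCII dot would."""
    return name.split(".")


if __name__ == "__main__":
    # Illustrative string: two ASCII labels joined by IDEOGRAPHIC FULL STOP.
    sample = "xn--wgv71a119e\u3002jp"
    print(split_idna2003(sample))    # two labels
    print(split_ascii_only(sample))  # one label, still containing U+3002
```

The IDNA-aware split yields two labels, while the ASCII-only split sees
a single label that still contains a non-ASCII character.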
> Independent of the "what is a dot" issue, I believe that parsing
> problem identifies a fundamental error in IDNA2003 that would
> need to be fixed, somehow, even if we abandoned the revision
> But, again, nothing prevents you from displaying the dots in
> domain names --especially domain names containing Japanese
> characters-- in a Japanese-friendly way, accepting
> Japanese-friendly dots on keyboards and mapping them to
> ASCII/DNS dots. And, in my opinion, you should continue doing
Again, many implementors develop implementations that support many
languages. Applications cannot predict which language the user intends
to enter. A standard way is needed.
Kazunori Fujiwara, JPRS