Standards and localization (was Dot-mapping)

Yangwoo Ko newcat at icu.ac.kr
Fri Dec 7 21:33:52 CET 2007


With dot-mappings removed, the handling of dot-like characters is now
left to developers' discretion. Developers are encouraged to apply as
much local context as possible when they encounter dot-like characters.
In the several local environments I am familiar with, deciding which
characters are (and are not) dots is quite straightforward. Thus, as a
user, I don't care too much about this.

Still, I have at least two concerns.

(1) It is not clear what the right thing to do is if a dot-like
character is encountered in a situation where the local context is
vague. I don't have a concrete example of such a situation, but that
does not imply that none exists.

(2) Even in a very clear local context, there can exist multiple (and
hence incompatible) practices. (I think we have many examples of this.)
In such a situation, it may take quite a long time to converge on a
consensus, and in the meantime the user experience will suffer.

If these two concerns turn out to be real (though I hope they do not),
the removed dot-mappings might be resurrected in a separate guideline
document.
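
To make the "developers' discretion" point concrete, here is a minimal
sketch of application-level dot handling. It is only an illustration,
not anything taken from the drafts: the function names are hypothetical,
and the default dot set is simply the three non-ASCII dots listed in
RFC 3490 section 3.1 (quoted below). A real application would choose its
own set from its local context.

```python
# Assumption: this default set is the non-ASCII dots from RFC 3490
# section 3.1; an application should substitute whatever its own
# local context calls for.
LOCAL_DOTS = frozenset({
    "\u3002",  # ideographic full stop
    "\uff0e",  # fullwidth full stop
    "\uff61",  # halfwidth ideographic full stop
})


def normalize_dots(user_input, local_dots=LOCAL_DOTS):
    """Map locally meaningful dot-like characters to U+002E.

    Hypothetical UI-layer helper: it runs before the name is handed
    to any IDNA or DNS code, so that code only ever sees ASCII dots.
    """
    return "".join("." if ch in local_dots else ch for ch in user_input)


def split_labels(user_input, local_dots=LOCAL_DOTS):
    """Split user input into labels after local dot normalization."""
    return normalize_dots(user_input, local_dots).split(".")
```

Characters outside the chosen set are left untouched, so a dot-like
character that the application has not decided to treat as a dot stays
inside its label. That is exactly where concern (1) above would surface.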

regards

John C Klensin wrote:
> 
> 
> --On Friday, 07 December, 2007 06:31 +0900 fujiwara at jprs.co.jp
> wrote:
> 
>> Dear IDNAbis authors,
>>
>> I found that RFC 3490 section 3.1 the first requirement is
>> removed in the new protocol document
>> draft-klensin-idnabis-protocol-02.
>>
>> |  1) Whenever dots are used as label separators, the following
>> |     characters MUST be recognized as dots: U+002E (full stop),
>> |     U+3002 (ideographic full stop), U+FF0E (fullwidth full stop),
>> |     U+FF61 (halfwidth ideographic full stop).
>>
>> And John described this reason.
>>
>> Is removing the dot-mapping already decided?
> 
> _Nothing_ is already firmly decided.  The purpose of the
> documents is to provide a very concrete proposal, one about
> which you can (and should) make comments and raise objections.
> We hope the documents can focus discussions and help us move
> quickly forward together, but I am sure that there are tradeoffs
> that we have made incorrectly.   I can only hope they are minor
> ones.
> 
>> The dot-mapping is useful in some language environments.
> 
> Yes, we know that.
> 
>> The dot-mapping is already implemented in many applications.
>> Removing it causes many problems.
> 
> Removing it _from those applications_ would be a bad idea, IMO.
> 
>> I'm afraid that other languages may have the same problem
>> and the characters which need to be treated as a dot may
>> increase.
> 
> Yes. And the risk of more dot-characters being added is one of
> the reasons for removing dot-mapping from the protocol.  
> 
> Let me try again to explain:
> 
> In your applications, both legacy and new, you should certainly
> map the dots that make sense to you to map.   For your case,
> that means you should almost certainly map Japanese-related
> dots, but should not make an attempt to map any character
> (worldwide and in any script) that looks to you like a dot.   If
> you start mapping anything that looks like a dot to you or your
> users, you might end up, e.g., treating the numeral 5 as a dot.
> 
> But the real question is not whether or not dots should be
> mapped but which ones and where that should be specified.   If
> the protocol specifies the mapping, then it has to have a list
> of things that are considered dots (as IDNA2003 does).  But
> dot-like characters might be added later, as you point out,
> which means that all of IDNA becomes dependent on one version of
> Unicode which better not change (unless a character property of
> "dot" were created).  
> 
> And these dots create a parsing problem because, for example, in
> IDNA2003, if one had a string containing one or more Japanese
> middle dots and at least one A-label, it is an IDN.  If the same
> string doesn't contain any A-labels, it is a single label.  And,
> the fundamental assumption of IDNA --that DNS resolvers and
> applications that don't know anything about IDNs can pass the
> domain names back and forth, and work normally-- is violated
> because DNS resolvers that are conformant to RFC 1034/1035
> (only) can't parse an FQDN into labels.
> 
> Independent of the "what is a dot" issue, I believe that parsing
> problem identifies a fundamental error in IDNA2003 that would
> need to be fixed, somehow, even if we abandoned the revision
> project.
> 
> But, again, nothing prevents you from displaying the dots in
> domain names --especially domain names containing Japanese
> characters-- in a Japanese-friendly way, accepting
> Japanese-friendly dots on keyboards and mapping them to
> ASCII/DNS dots.  And, in my opinion, you should continue doing
> that.
> 
> Is that helpful?
> 
>     john
> b
> 
>>> Subject: IDNAbis discussion style, mappings, and
>>> (incidentally) Eszett From: John C Klensin <klensin at jck.com>
>>> To: idna-update at alvestrand.no
>>> Date: Thu, 29 Nov 2007 18:14:12 -0500
>> ---snip---
>>> So the draft IDNA200X documents take the dot-mapping provision
>>> out, turning the parsing of all domain names, including those
>>> that contain A-labels, back over to the rules of RFC 1034 and
>>> 1035 and the acceptance of special dots into a UI issue. To
>>> me, the arguments for that choice are overwhelming.  But it
>>> is a tradeoff against user-predictable behavior with scripts
>>> that use non-ASCII dots and compatibility with existing
>>> non-protocol text that represents IDNs using those dots: if
>>> applications that map between such text and the IDNA protocol
>>> don't do the right UI things with dots other than U+002E, bad
>>> things will happen.  And, if we work the tradeoffs so that
>>> types of compatibility issues overwhelm the reasons why
>>> special dot mapping was a bad idea, then we are stuck with
>>> the special dots forever.  
> 
> 
> 
> 
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
> 