New version of strawman for IDNAv2

Alireza Saleh saleh at nic.ir
Fri Feb 27 15:55:20 CET 2009


Patrik Fältström wrote:
> On 27 feb 2009, at 11.31, Alireza Saleh wrote:
>
>> John C Klensin wrote:
>>> --On Thursday, February 26, 2009 19:46 -0800 Paul Hoffman
>>> <phoffman at imc.org> wrote:
>>>
>>>
>>>> This tees into John's recent thread on parsing the issues and
>>>> finding a middle ground. I have included many of the
>>>> suggestions from the mailing list and off-line responses. Most
>>>> significantly, I have changed ZWNJ and ZWJ from "mapped to
>>>> nothing" to being allowed so that Arabic labels will be more
>>>> realistic.
>>>>
>>>
>>> Paul,
>>>
>>> With the understanding that I still don't believe this is the
>>> right way to go, one technical correction and one issue:
>>>
>>> (1) ZWJ and ZWNJ are not needed for Arabic language orthography.
>>> ZWNJ is needed for Persian languages and what are sometimes
>>> called Indo-Arabic ones (e.g., Urdu, but there are _many_
>>> others).  Both ZWJ and ZWNJ are needed for several of the Indic
>>> scripts and associated languages (although slightly fewer with
>>> Unicode 5.1 than with Unicode 3.2).
>>>
>> Having ZWNJ and ZWJ is mandatory, but allowing them without any
>> condition will lead to the creation of many confusable names. However, I think
>> they should be allowed without any condition within the protocol, and each
>> registry that wants to support Persian or Indo-Arabic languages should
>> take care of handling the confusion.
>
> Can you please look at (2) below, and say which one of the 
> alternatives (i) to (iv) you prefer?

I prefer (i). Please note that there are also other factors that cause
visual confusion even where rules are supposed to prevent it (e.g., an
inappropriate font). For non-Arabic-script users, I can't see much
difference between, for example, the .com registry permitting 1<ZWNJ>ong.com
and permitting 1ong.com, since both can pass for long.com.
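The comparison problem behind option (i) can be illustrated with a small Python sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with Nameprep), not IDNA2008; the function names are hypothetical. Under IDNA2003, ZWNJ and ZWJ are mapped to nothing, so each pair below yields the same A-label, whereas a protocol that treats the joiners as ordinary characters would register each pair as two distinct labels that usually render identically.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # U+200D ZERO WIDTH JOINER


def idna2003_label(label: str) -> bytes:
    """Return the ACE form of a label using Python's built-in "idna"
    codec, which implements IDNA2003 (RFC 3490) with Nameprep."""
    return label.encode("idna")


def compare(plain: str, joined: str) -> tuple[bool, bool]:
    """Return (distinct_as_code_points, same_under_idna2003)."""
    # Treating the joiner as an ordinary character (option (i)) means the
    # two labels are different registrations, even though most rendering
    # software displays them the same way.
    distinct = plain != joined
    # Nameprep maps U+200C and U+200D to nothing (RFC 3454, table B.1),
    # so under IDNA2003 both labels collapse to the same A-label.
    same_ace = idna2003_label(plain) == idna2003_label(joined)
    return distinct, same_ace


if __name__ == "__main__":
    examples = [
        ("1ong", "1" + ZWNJ + "ong"),         # the .com example above
        ("\u00e1bc", "\u00e1b" + ZWJ + "c"),  # the "ábc" example below
    ]
    for plain, joined in examples:
        print(repr(joined), compare(plain, joined))
```

Both pairs print `(True, True)`: distinct as code-point strings, identical once IDNA2003 mapping is applied.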

Best
Alireza


>
>>> (2) When one considers the number of registries/zones on the
>>> Internet or even those that exist only at the second level
>>> (i.e., maintaining registrations for third-level names), it is
>>> certain that some of them will be operated by people with bad
>>> intentions.  Given that, are you confident that ZWJ/ZWNJ can
>>> simply be treated as ordinary characters, relying on the
>>> registries to prevent those characters where they would be fully
>>> invisible?
>>>
>>> When faced with that question very early in the IDNA2008 design
>>> process, we concluded that there were four possible answers:
>>>
>>>     (i) Yes, we trust the registries and are willing to live
>>>     with labels like "ábc" failing to compare equal to
>>>     "áb<ZWJ>c" despite looking exactly the same when
>>>     displayed by normal rendering software.
>>>     
>>>     (ii) We don't quite trust the registries but are
>>>     confident that all rendering software, on all operating
>>>     systems, that encounter strings like "áb<ZWJ>c" will
>>>     get upset in sufficiently vivid ways to warn the user off.
>>>     We didn't think rendering ZWJ as a little box or
>>>     question mark would be adequate for that case because it
>>>     might be a legitimate character for which no font was
>>>     available even though it would at least not be confused
>>>     with "ábc".
>>>     
>>>     (iii) We either leave things as they are in IDNA2003
>>>     (map to nothing) or simply ban the character.  Either
>>>     one puts the scripts that need one or both of these
>>>     characters at an intolerable disadvantage.
>>>     
>>>     (iv) We adopt some sort of "contextual rule" model,
>>>     despite the complexity it adds.
>>>
>>> Obviously, we chose the fourth.  We did so because we didn't
>>> believe the assumptions that (i) or (ii) implied and did not
>>> consider (iii) to be acceptable given the number of people who
>>> use the relevant scripts.   As I read your document, you are
>>> proposing (i).   Is that correct and, if so, could you explain a
>>> bit better how you see the tradeoffs?
>>>
>>> Please also note that, if you permit ZWJ and/or ZWNJ as
>>> characters, we end up in exactly the same situation that you and
>>> others have objected to with Eszett and Final Sigma, i.e., an
>>> input string that converts to a different A-label in IDNA2003
>>> and IDNA2008.  I'm prepared to live with that but, to the degree
>>> to which you consider it a problem so serious as to require
>>> rechartering and a completely different document strategy, I'd
>>> like to better understand the exception and its implications.
>>> In particular, I don't see the section of your outline document
>>> that discusses the transition strategy that many people (I think
>>> including you, but could be wrong about that) have argued is
>>> absolutely essential if there are going to be any
>>> incompatibilities of that sort.
>
> Patrik
>
>>>
>>>
>>> best,
>>>   john
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>
>>
>


