Roozbeh's concerns on IAB statement (was Re: Fwd: IAB Statement on Identifiers and Unicode 7.0.0)

Wed Jan 28 21:31:23 CET 2015

Sorry, email was "sent" while still drafting. Will send complete email in a
few minutes.

On Wed, Jan 28, 2015 at 12:30 PM, Roozbeh Pournader <roozbeh at google.com>
wrote:

> On Wed, Jan 28, 2015 at 6:52 AM, Andrew Sullivan <ajs at anvilwalrusden.com>
> wrote:
>>
>> On Tue, Jan 27, 2015 at 09:51:34AM -0800, Roozbeh Pournader wrote:
>> > It's very unfortunate, and I'd say ill-informed. The conclusions are not
>> > supported by the claims
>>
>> It would be very helpful to me if you could point out which statements
>> are ill-informed or which conclusions are not supported by the
>> premises in that statement.
>>
>
> Sure. I thought I and others had done that. But let me try:
>
> There are three major things wrong with the final recommendations.
> As you were involved in the process, you can try to trace which failure
> came from which phase:
>
>    - Arabic letters that have absolutely no problem with normalization
>    in Unicode, such as U+0624 ARABIC LETTER WAW WITH HAMZA ABOVE, are
>    discouraged, although there is absolutely no difference between how
>    Unicode treats them and how it treats U+00E4 LATIN SMALL LETTER A WITH
>    DIAERESIS. Both characters canonically decompose into two proper pieces.
>    The reasoning for discouraging U+0624 appears to be that it has hamza
>    in it.
>    - Other Arabic letters that have the exact confusability issues that
>    U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE
>
>
>> Note also that the IAB can produce advice to others, but it doesn't
>> have a police force.  I would not be the least surprised if, on
>> reflection, someone creating an identifier using Arabic script decided
>> to ignore the general advice in one particular case.  Your network,
>> your rules, after all.
>>
>
> Yes, but the IAB is supposed to have done its homework. Others without
> enough knowledge of the matter may defer to the IAB's conclusions, as the
> document "appears" to be well-researched. So in practice, people may just block Yeh
> Hamza because they don't know better, and the poor Arabic users spread
> around the world would have a hard time finding.
>
>
>>
>> > Then, other characters in the Arabic script that have identical
>> > confusability issues (I'll leave finding them as an exercise to the
>> > reader, to minimize the damage to the script) are not listed.
>>
>> I think this is extremely unfortunate, and I urge you to reconsider.
>> The issue for the protocols is not going to go away on its own.  What
>> you appear to be saying is that you know a whole bunch more
>> Arabic-character cases like this, but you're not going to tell us what
>> they are.  That sounds like a reason to avoid Arabic-script
>> identifiers at all until a fuller evaluation is done, and I doubt very
>> much that either of us wants that sort of suggestion floating around.
>> And given that some people will be creating identifiers no matter
>> what, isn't it better that they be doing so with a full appreciation
>> of what risks might be involved?
>>
>> > On top of that, only Arabic script is excluded while such characters
>> > exist in several other scripts, especially Latin.
>>
>> Late in the process of drafting that statement, the
>> Internationalization program became aware of some cases in Latin that,
>> we _think_, may expose the same issues.  We have not done complete
>> analysis, and in the interests of providing timely advice we decided
>> to start with the cases we know.  But the statement does note that
>> there are probably other cases.  If the various internationalization
>> communities working in the IETF (on this mailing list, and in the
>> nearly-complete PRECIS WG) turn to this issue urgently, then I imagine
>> the IAB will think the IETF has it under control and will remain
>> silent.  On the other hand, if the IAB (and the Internationalization
>> program) completely grasps this issue for other characters and thinks
>> there is significant risk that the issue can't or won't be addressed
>> by the IETF community, then I'd hope the IAB would issue another
>> statement listing those other characters also.
>>
>> I should note that when we first started exploring the issue, the
>> informal response we got was, "_Hamza_ is special, read the Unicode
>> Standard more closely."  And indeed, the Standard does seem initially
>> to suggest that _Hamza_ is quite unlike other combining characters.
>> That's the reason for the focus on _Hamza_.  It's quite an accident
>> that it's only Arabic script; the issue is actually that Arabic is
>> where _Hamza_ is, and nothing to do with Arabic as such.
>>
>> > I'll refrain from commenting further on the threads:
>>
>> I urge you to reconsider.  We need greater participation and
>> understanding in this area, not less.  It is precisely because of the
>> low participation rates we've had in these i18n issues that we keep
>> discovering problems late.
>>
>> Best regards,
>>
>> A
>>
>> --
>> Andrew Sullivan
>> ajs at anvilwalrusden.com
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>
>
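
The normalization point in the first bullet of the quoted draft can be
checked with a minimal Python sketch using the standard unicodedata module.
The helper name nfd_codepoints is made up for illustration, and the results
assume a Python build whose Unicode data is at version 7.0 or later (the
version that added U+08A1).

```python
import unicodedata


def nfd_codepoints(text):
    """Return the code points of text after canonical decomposition (NFD).

    Hypothetical helper for illustration only; not part of any standard API.
    """
    return ["U+%04X" % ord(c) for c in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    samples = {
        "U+0624": "\u0624",  # ARABIC LETTER WAW WITH HAMZA ABOVE
        "U+00E4": "\u00E4",  # LATIN SMALL LETTER A WITH DIAERESIS
        "U+08A1": "\u08A1",  # ARABIC LETTER BEH WITH HAMZA ABOVE
    }
    for label, ch in samples.items():
        # Precomposed letters with a canonical decomposition split into
        # base letter + combining mark; atomic letters come back unchanged.
        print(label, "->", " ".join(nfd_codepoints(ch)))

    # BEH followed by combining HAMZA ABOVE is not composed into U+08A1
    # by NFC, because U+08A1 has no canonical decomposition.
    sequence = "\u0628\u0654"
    print(unicodedata.normalize("NFC", sequence) == "\u08A1")
```

Under those assumptions, U+0624 and U+00E4 both decompose into a base letter
plus a combining mark (U+0648 U+0654 and U+0061 U+0308 respectively), while
U+08A1 is left as a single code point and the final comparison prints False.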