Roozbeh's concerns on IAB statement (was Re: Fwd: IAB Statement on Identifiers and Unicode 7.0.0)

Wed Jan 28 21:30:33 CET 2015

On Wed, Jan 28, 2015 at 6:52 AM, Andrew Sullivan <ajs at anvilwalrusden.com>
wrote:
>
> On Tue, Jan 27, 2015 at 09:51:34AM -0800, Roozbeh Pournader wrote:
> > It's very unfortunate, and I'd say ill-informed. The conclusions are not
> > supported by the claims
>
> It would be very helpful to me if you could point out which statements
> are ill-informed or which conclusions are not supported by the
> premises in that statement.
>

Sure. I thought I and others had done that. But let me try:

There are three major things wrong with the final recommendations. As
you were involved in the process, you can try to trace which phase is
responsible for which failure:

   - Arabic letters which have absolutely no problem with normalization in
   Unicode, such as U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE, are
   discouraged, with absolutely no difference between how Unicode treats them
   and how it treats U+00E4 LATIN SMALL LETTER A WITH DIAERESIS. Both
   characters are canonically decomposed to two proper pieces. The reasoning
   for discouraging U+0623 appears to be that it has hamza in it.
   - Other Arabic letters that have the exact confusability issues that
   U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE
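
The normalization point in the first item can be checked against the Unicode
Character Database. What follows is only a minimal sketch, assuming Python 3.5
or later (whose unicodedata module carries Unicode 7.0 or newer data); the
helper name canonical_pieces is made up for illustration and is not part of
any library. It prints the canonical decomposition of two of the hamza-bearing
letters and of U+00E4, and shows that U+08A1 has none, so the sequence
U+0628 U+0654 does not compose to it under NFC.

```python
import unicodedata


def canonical_pieces(ch):
    """Return the canonical decomposition of ch as 'U+XXXX' strings.

    Hypothetical helper for illustration; returns [] when the character
    has no canonical decomposition.
    """
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>"; those are
    # not canonical decompositions, so they are skipped here.
    if not decomp or decomp.startswith("<"):
        return []
    return ["U+" + cp for cp in decomp.split()]


alef_hamza = "\u0623"   # ARABIC LETTER ALEF WITH HAMZA ABOVE
waw_hamza = "\u0624"    # ARABIC LETTER WAW WITH HAMZA ABOVE
a_diaeresis = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
beh_hamza = "\u08a1"    # ARABIC LETTER BEH WITH HAMZA ABOVE

for ch in (alef_hamza, waw_hamza, a_diaeresis, beh_hamza):
    name = unicodedata.name(ch, "<unnamed in this Unicode version>")
    print(f"U+{ord(ch):04X} {name}: {canonical_pieces(ch)}")

# ALEF + combining HAMZA ABOVE composes to U+0623 under NFC ...
alef_sequence_composes = unicodedata.normalize("NFC", "\u0627\u0654") == alef_hamza
# ... whereas BEH + combining HAMZA ABOVE stays a two-code-point sequence.
beh_sequence_composes = unicodedata.normalize("NFC", "\u0628\u0654") == beh_hamza
print(alef_sequence_composes, beh_sequence_composes)
```

The code only reports what the character database says; it takes no position
on which characters ought to be allowed in identifiers.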

> Note also that the IAB can produce advice to others, but it doesn't
> have a police force.  I would not be the least surprised if, on
> reflection, someone creating an identifier using Arabic script decided
> to ignore the general advice in one particular case.  Your network,
> your rules, after all.
>

Yes, but the IAB is supposed to have done its homework. Others without enough
knowledge of the matter may defer to the IAB's conclusions, as the document
"appears" to be well-researched. So in practice, people may just block Yeh
Hamza because they don't know better, and the poor Arabic users spread
around the world would have a hard time finding.

>
> > Then, other characters in the Arabic script that have identical
> > confusability issues (I'll leave finding them as an exercise to the
> > reader, to minimize the damage to the script) are not listed.
>
> I think this is extremely unfortunate, and I urge you to reconsider.
> The issue for the protocols is not going to go away on its own.  What
> you appear to be saying is that you know a whole bunch more
> Arabic-character cases like this, but you're not going to tell us what
> they are.  That sounds like a reason to avoid Arabic-script
> identifiers at all until a fuller evaluation is done, and I doubt very
> much that either of us wants that sort of suggestion floating around.
> And given that some people will be creating identifiers no matter
> what, isn't it better that they be doing so with a full appreciation
> of what risks might be involved?
>
> > On top of that, only Arabic script is excluded while such characters
> > exist in several other scripts, especially Latin.
>
> Late in the process of drafting that statement, the
> Internationalization program became aware of some cases in Latin that,
> we _think_, may expose the same issues.  We have not done complete
> analysis, and in the interests of providing timely advice we decided
> to start with the cases we know.  But the statement does note that
> there are probably other cases.  If the various internationalization
> communities working in the IETF (on this mailing list, and in the
> nearly-complete PRECIS WG) turn to this issue urgently, then I imagine
> the IAB will think the IETF has it under control and will remain
> silent.  On the other hand, if the IAB (and the Internationalization
> program) completely grasps this issue for other characters and thinks
> there is significant risk that the issue can't or won't be addressed
> by the IETF community, then I'd hope the IAB would issue another
> statement listing those other characters also.
>
> I should note that when we first started exploring the issue, the
> informal response we got was, "_Hamza_ is special, read the Unicode
> Standard more closely."  And indeed, the Standard does seem initially
> to suggest that _Hamza_ is quite unlike other combining characters.
> That's the reason for the focus on _Hamza_.  It's quite an accident
> that it's only Arabic script; the issue is actually that Arabic is
> where _Hamza_ is, and nothing to do with Arabic as such.
>
> > I'll refrain from commenting further on the threads.
>
> I urge you to reconsider.  We need greater participation and
> understanding in this area, not less.  It is precisely because of the
> low participation rates we've had in these i18n issues that we keep
> discovering problems late.
>
> Best regards,
>
> A
>
> --
> Andrew Sullivan
> ajs at anvilwalrusden.com