Roozbeh's concerns on IAB statement (was Re: Fwd: IAB Statement on Identifiers and Unicode 7.0.0)
roozbeh at google.com
Wed Jan 28 21:30:33 CET 2015
On Wed, Jan 28, 2015 at 6:52 AM, Andrew Sullivan <ajs at anvilwalrusden.com> wrote:
> On Tue, Jan 27, 2015 at 09:51:34AM -0800, Roozbeh Pournader wrote:
> > It's very unfortunate, and I'd say ill-informed. The conclusions are not
> > supported by the claims.
> It would be very helpful to me if you could point out which statements
> are ill-informed or which conclusions are not supported by the
> premises in that statement.
Sure. I thought I and others had done that. But let me try:
There are three major things wrong with the final recommendations. As
you were involved in the process, you can try to trace which is failed by
- Arabic letters which have absolutely no problem with normalization in
Unicode, such as U+0624 ARABIC LETTER WAW WITH HAMZA ABOVE, are
discouraged, with absolutely no difference in how Unicode treats them and
how it treats U+00E4 LATIN SMALL LETTER A WITH DIAERESIS. Both characters
are canonically decomposed to two proper pieces. The reasoning for
discouraging U+0624 appears to be that it has hamza in it.
- Other Arabic letters that have the exact confusability issues that
U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE
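
The normalization behaviour described above can be checked directly. What follows is only a minimal sketch, assuming Python 3 and its bundled unicodedata module (whose results depend on the Unicode version that Python ships with); the helper names canonical_pieces and composes_to are made up for illustration and come from neither the IAB statement nor Unicode:

```python
import unicodedata


def canonical_pieces(ch):
    """Return the canonical decomposition of ch as a list of 'U+XXXX' strings.

    Returns an empty list when the character has no decomposition, or only a
    compatibility decomposition (those start with a '<tag>' in the database).
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return []
    return ["U+" + cp for cp in decomp.split()]


def composes_to(sequence, target):
    """True if NFC normalization turns sequence into the single character target."""
    return unicodedata.normalize("NFC", sequence) == target


if __name__ == "__main__":
    # U+0624 and U+00E4 both decompose canonically; U+08A1 does not.
    for ch in ("\u0624", "\u00E4", "\u08A1"):
        print("U+%04X" % ord(ch), canonical_pieces(ch))

    # WAW + HAMZA ABOVE and a + DIAERESIS recompose under NFC;
    # BEH + HAMZA ABOVE stays a two-code-point sequence.
    print(composes_to("\u0648\u0654", "\u0624"))
    print(composes_to("a\u0308", "\u00E4"))
    print(composes_to("\u0628\u0654", "\u08A1"))
```

With a reasonably recent Unicode database, the first two characters should each report two pieces and U+08A1 should report none. That is the whole distinction: the normalization gap applies to U+08A1, not to U+0624.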
> Note also that the IAB can produce advice to others, but it doesn't
> have a police force. I would not be the least surprised if, on
> reflection, someone creating an identifier using Arabic script decided
> to ignore the general advice in one particular case. Your network,
> your rules, after all.
Yes, but the IAB is supposed to have done its homework. Others without enough
knowledge of the matter may defer to the IAB's conclusions, as the document
"appears" to be well-researched. So in practice, people may just block Yeh
Hamza because they don't know better, and the poor Arabic users spread
around the world would have a hard time finding.
> > Then, other characters in the Arabic script that have identical
> > confusability issues (I'll leave finding them as an exercise to the
> > to minimize the damage to the script) are not listed.
> I think this is extremely unfortunate, and I urge you to reconsider.
> The issue for the protocols is not going to go away on its own. What
> you appear to be saying is that you know a whole bunch more
> Arabic-character cases like this, but you're not going to tell us what
> they are. That sounds like a reason to avoid Arabic-script
> identifiers at all until a fuller evaluation is done, and I doubt very
> much that either of us wants that sort of suggestion floating around.
> And given that some people will be creating identifiers no matter
> what, isn't it better that they be doing so with a full appreciation
> of what risks might be involved?
> > On top of that, only Arabic script is excluded while such characters
> > exist in several other scripts, especially Latin.
> Late in the process of drafting that statement, the
> Internationalization program became aware of some cases in Latin that,
> we _think_, may expose the same issues. We have not done complete
> analysis, and in the interests of providing timely advice we decided
> to start with the cases we know. But the statement does note that
> there are probably other cases. If the various internationalization
> communities working in the IETF (on this mailing list, and in the
> nearly-complete PRECIS WG) turn to this issue urgently, then I imagine
> the IAB will think the IETF has it under control and will remain
> silent. On the other hand, if the IAB (and the Internationalization
> program) completely grasps this issue for other characters and thinks
> there is significant risk that the issue can't or won't be addressed
> by the IETF community, then I'd hope the IAB would issue another
> statement listing those other characters also.
> I should note that when we first started exploring the issue, the
> informal response we got was, "_Hamza_ is special, read the Unicode
> Standard more closely." And indeed, the Standard does seem initially
> to suggest that _Hamza_ is quite unlike other combining characters.
> That's the reason for the focus on _Hamza_. It's quite an accident
> that it's only Arabic script; the issue is actually that Arabic is
> where _Hamza_ is, and nothing to do with Arabic as such.
> > I'll refrain from commenting further on the threads:
> I urge you to reconsider. We need greater participation and
> understanding in this area, not less. It is precisely because of the
> low participation rates we've had in these i18n issues that we keep
> discovering problems late.
> Best regards,
> Andrew Sullivan
> ajs at anvilwalrusden.com