Fwd: IAB Statement on Identifiers and Unicode 7.0.0
ajs at anvilwalrusden.com
Wed Jan 28 15:52:33 CET 2015
On Tue, Jan 27, 2015 at 09:51:34AM -0800, Roozbeh Pournader wrote:
> It's very unfortunate, and I'd say ill-informed. The conclusions are not
> supported by the claims
It would be very helpful to me if you could point out which statements
are ill-informed or which conclusions are not supported by the
premises in that statement.
> , and some of the characters recommended to be
> excluded (such as Yeh With Hamza above) are so common in every orthography
> in the Arabic script (Yeh Hamza is perhaps in the top ten in frequency of
> letters) that it's basically impossible to avoid using them as identifiers.
At least some of us who worked on that statement (and please note, I'm
speaking for myself in this message and _not_ others on the IAB or in
the program) knew very well that the suggestion we were making was
pretty devastating. I believe the statement acknowledges this
explicitly, and notes that it is making this recommendation only
because there is a great deal of concern for future compatibility. We
know that something must happen, but we know not what.
Note also that the IAB can produce advice to others, but it doesn't
have a police force. I would not be the least surprised if, on
reflection, someone creating an identifier using Arabic script decided
to ignore the general advice in one particular case. Your network,
your rules, after all.
> Then, other characters in the Arabic script that have identical
> confusability issues (I'll leave finding them as an exercise to the reader,
> to minimize the damage to the script) are not listed.
I think this is extremely unfortunate, and I urge you to reconsider.
The issue for the protocols is not going to go away on its own. What
you appear to be saying is that you know a whole bunch more
Arabic-character cases like this, but you're not going to tell us what
they are. That sounds like a reason to avoid Arabic-script
identifiers at all until a fuller evaluation is done, and I doubt very
much that either of us wants that sort of suggestion floating around.
And given that some people will be creating identifiers no matter
what, isn't it better that they be doing so with a full appreciation
of what risks might be involved?
> On top of that, only Arabic script is excluded while such characters exist
> in several other scripts, especially Latin.
Late in the process of drafting that statement, the
Internationalization program became aware of some cases in Latin that,
we _think_, may expose the same issues. We have not done complete
analysis, and in the interests of providing timely advice we decided
to start with the cases we know. But the statement does note that
there are probably other cases. If the various internationalization
communities working in the IETF (on this mailing list, and in the
nearly-complete PRECIS WG) turn to this issue urgently, then I imagine
the IAB will think the IETF has it under control and will remain
silent. On the other hand, if the IAB (and the Internationalization
program) completely grasps this issue for other characters and thinks
there is significant risk that the issue can't or won't be addressed
by the IETF community, then I'd hope the IAB would issue another
statement listing those other characters also.
I should note that when we first started exploring the issue, the
informal response we got was, "_Hamza_ is special, read the Unicode
Standard more closely." And indeed, the Standard does seem initially
to suggest that _Hamza_ is quite unlike other combining characters.
That's the reason for the focus on _Hamza_. It's quite an accident
that only Arabic script is affected; the issue is that Arabic is
where _Hamza_ is, and it has nothing to do with Arabic as such.
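To make the _Hamza_ point concrete, here is a minimal sketch, assuming Python 3 whose `unicodedata` tables are Unicode 7.0.0 or later. It contrasts U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE, which has a canonical decomposition (U+064A plus combining U+0654), with U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE, added in Unicode 7.0.0, which has none. The helper name `nfc_equal` and the variable names are hypothetical, chosen only for illustration.

```python
import unicodedata

# Precomposed letters and the combining mark under discussion.
YEH_HAMZA = "\u0626"    # ARABIC LETTER YEH WITH HAMZA ABOVE
BEH_HAMZA = "\u08A1"    # ARABIC LETTER BEH WITH HAMZA ABOVE (new in 7.0.0)
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining mark)

# The same letters spelled as base letter + combining Hamza.
yeh_seq = "\u064A" + HAMZA_ABOVE  # ARABIC LETTER YEH + HAMZA ABOVE
beh_seq = "\u0628" + HAMZA_ABOVE  # ARABIC LETTER BEH + HAMZA ABOVE


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are identical after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, precomposed, sequence in [
        ("YEH", YEH_HAMZA, yeh_seq),
        ("BEH", BEH_HAMZA, beh_seq),
    ]:
        # decomposition() returns "" when a code point has no decomposition.
        print(label,
              repr(unicodedata.decomposition(precomposed)),
              nfc_equal(precomposed, sequence))
```

With such tables this should print `YEH '064A 0654' True` and `BEH '' False`: the two spellings of Yeh With Hamza collapse under NFC, while the two spellings of Beh With Hamza, which can render identically, remain distinct strings, so normalization alone does not make them compare equal.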
> I'll refrain from commenting further on the threads:
I urge you to reconsider. We need greater participation and
understanding in this area, not less. It is precisely because of the
low participation rates we've had in these i18n issues that we keep
discovering problems late.
ajs at anvilwalrusden.com