Objection to draft-klensin-idna-5892upd-unicode70

Mon Aug 11 21:18:21 CEST 2014

To help show the similarity of the cases, please see the note under
U+0654 ARABIC HAMZA ABOVE in the Unicode 7.0 charts:

• restricted to hamza and ezafe semantics
• is not used as a diacritic to form new letters

This is similar to U+0615 ARABIC SMALL HIGH TAH:

• should not be confused with the small TAH sign used as a diacritic for some letters such as 0679 ٹ

... and various other characters, such as U+065A and U+065B.
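The point about decomposition can be checked directly against the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module and assumes a Python build whose bundled Unicode data is version 7.0 or later (needed for U+08A1 to be known at all). The helper name `compare` and the choice of pairs are illustrative only; U+0623 is included as a contrasting case that does have a canonical decomposition.

```python
import unicodedata

# (precomposed character, visually similar base letter + combining mark)
# Illustrative pairs only; U+0623 is a contrasting case, not from the thread.
PAIRS = [
    ("\u08A1", "\u0628\u0654"),  # BEH WITH HAMZA ABOVE vs BEH + HAMZA ABOVE
    ("\u0692", "\u0631\u065A"),  # REH WITH SMALL V vs REH + SMALL V ABOVE
    ("\u0623", "\u0627\u0654"),  # ALEF WITH HAMZA ABOVE vs ALEF + HAMZA ABOVE
]


def compare(precomposed, sequence):
    """Report the UCD decomposition of `precomposed` and whether NFC/NFKC
    normalization of `sequence` yields exactly that character."""
    return {
        "name": unicodedata.name(precomposed),
        "decomposition": unicodedata.decomposition(precomposed),
        "nfc_equal": unicodedata.normalize("NFC", sequence) == precomposed,
        "nfkc_equal": unicodedata.normalize("NFKC", sequence) == precomposed,
    }


if __name__ == "__main__":
    for pre, seq in PAIRS:
        print(compare(pre, seq))
```

With Unicode 7.0 or later data, only the ALEF pair should come out as equivalent: U+0623 decomposes canonically to U+0627 U+0654, while U+08A1 and U+0692 have no decomposition, so normalization does not equate either of them with its look-alike sequence.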

On Mon, Aug 11, 2014 at 12:09 PM, Roozbeh Pournader <roozbeh at google.com>
wrote:

> On Mon, Aug 11, 2014 at 11:11 AM, Paul Hoffman <paul.hoffman at vpnc.org>
> wrote:
>
>> Until two months ago, they couldn't use the character in question in any
>> written text.
>>
>
> That's not true.
>
> The character was approved at the UTC meeting of February 2011. The SIL
> representatives who needed the character urgently were promised at the
> meeting that Unicode would not change the codepoint, so they could start
> using the character immediately. So in reality, they had three and a half
> years.
>
> Also, the character has been part of ISO/IEC 10646 since Amendment 1 to
> the 2012 edition, which was published in April 2013. It was frozen in that
> standard well before that, too.
>
>> Unicode is full of confusable characters and character sequences (with
>> no canonical or compatibility decomposition pointing to them). Using a
>> canonical or compatibility decomposition mechanism only for finding such
>> cases doesn't make sense, nor does singling out some more obvious cases of
>> such confusables.
>>
>> That seems like a non sequitur because the draft never talks about
>> "confusable". It talks about issues of composition. Given that difference,
>> do you still strongly object to the draft based on its contents?
>>
>
> Definitely. There is no difference. The issue that the draft calls
> "composition" existed with Arabic characters (and characters in several
> other scripts) long before this character was encoded, and several other
> new Unicode 7.0 characters exhibit the same behavior.
>
> In short, U+08A1 does not expose a composition issue, but a confusability
> issue. The draft does not reflect a good understanding of the Unicode model
> (for Arabic and other scripts). The problem with U+08A1 is no different
> from that of, say, U+0692, used in Kurdish, which could be perceived as
> composed of U+0631 and U+065A.
>
> Please explain how U+08A1 is different from U+0692 with regard to
> "composition".
>