consensus Call: TATWEEL
saleh at nic.ir
Thu Mar 26 11:47:19 CET 2009
Thanks for the description, but I still don't understand why this
character has been coded as an Arabic letter. As it stands now, it
should be in the protocol. If IDNA2008 wants to be independent of any
specific version of Unicode, it shouldn't make decisions on a
character-by-character basis. Maybe Unicode 5.2 will contain a character
that has the same characteristics as Tatweel; then we would have to
update the protocol documents again.
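
To make the point concrete, here is a minimal sketch in Python. The category set and the exception list are assumptions made up for illustration, not the working group's actual rules. It only shows that a rule keyed on Unicode properties admits Tatweel, so excluding it takes a per-character exception that would have to be revisited with each new Unicode version.

```python
import unicodedata

# Hypothetical, simplified rule: admit a code point purely on its
# General_Category. This set is an assumption for illustration only.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Hypothetical per-character exception list (the mechanism questioned above).
EXCEPTIONS_DISALLOWED = {0x0640}  # ARABIC TATWEEL


def allowed_by_property(ch: str) -> bool:
    """Decide from the character's General_Category alone."""
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


def allowed_with_exceptions(ch: str) -> bool:
    """Apply the hand-maintained exception list before the property rule."""
    return ord(ch) not in EXCEPTIONS_DISALLOWED and allowed_by_property(ch)


tatweel = "\u0640"
print(allowed_by_property(tatweel))      # True: gc=Lm passes the property rule
print(allowed_with_exceptions(tatweel))  # False: only the exception excludes it
```
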
Besides, I still haven't heard any argument other than visual confusion.
As you know, many such visual confusions still remain within the Arabic
script and are expected to be handled at the registry. I'm not arguing
for permitting Tatweel as such; what I'm arguing is that the way of
making decisions should be changed. If the IDNAbis working group thinks
that all these problems should be handled at the protocol level, then
please go ahead and resolve them ALL; a long list of confusions more
dangerous than Tatweel can be sent to the group. If not, better to leave
them all to be resolved at the registry. The registry rules may be
effective only for the labels that are registered within the registry,
but please note that the domain owner can create a confusing sub-label
as well as a confusing URI, with which IDNAbis has nothing to do if it comes
Kenneth Whistler wrote:
> Alireza asked:
>> Why do you think they are very unlike, other than that it has been
>> used for many years in DNS?
>> Kenneth Whistler wrote:
>>>> Hyphen and tatweel are very unlike.
>>> I agree. Which is why hyphen (U+002D) does (and must
>>> continue to) occur in domain names, and why U+0640
>>> ARABIC TATWEEL shouldn't.
> Well, let me count the ways. ;-)
> U+002D HYPHEN-MINUS
> 1. Is in the ASCII subset, which has all kinds of implications
> for grandfathered usage in protocols, syntax, etc.
> 2. Is ambiguous between usage as a punctuation mark (hyphen)
> and a mathematical unary operator (minus sign).
> 3. May make content distinctions in some orthographies, both
> lexically and/or syntactically.
> 4. Has the Line_Break property lb=HY, with implications for
> hyphenation and line breaking behavior.
> 5. Has the Word_Break property wb=Other, so by default will
> mark a word break boundary.
> 6. Is Common script, used with many scripts besides Latin.
> 7. Is General_Category, gc=Pd, i.e. a punctuation dash.
> 8. Has the Bidi_Class, bc=ES, with implications for numeric layout.
> U+0640 ARABIC TATWEEL
> 1. Is not in the ASCII subset.
> 2. Is neither a punctuation mark nor a mathematical operator.
> 3. Makes no content distinctions in text, but is used only
> to justify text for display.
> 4. Has the Line_Break property lb=AL, i.e. is treated like
> any letter for the purposes of line breaking, and does
> not mark special opportunities for line breaking.
> 5. Has the Word_Break property wb=ALetter, so by default will
> never mark a word break boundary.
> 6. Is Arabic and Syriac script only, and requires specific font design
> to harmonize with an Arabic (or Syriac) font baseline.
> 7. Is General_Category, gc=Lm, i.e. a modifier letter.
> 8. Has the Bidi_Class, bc=AL, i.e. behaves for bidi like true
> Arabic letters.
> There are other distinctions, but I think continuing in this
> vein would probably be more than is required.
> What do the two characters share?: vaguely similar appearances
> (in some fonts only, when the glyphs are viewed in isolation
> and not in context).
> Now what we might be missing is that some Arabic system users
> *may* have repurposed U+0640 (or more likely its analogue
> in 8-bit systems: Windows 1256 0xDC, and ISO 8859-6 0xE0) as
> another kind of dash character, using it as an Arabic equivalent
> of HYPHEN-MINUS, even though because of its semantics it
> wouldn't work well that way on either Windows or other
> Unicode-based systems now.
> Is that what you are talking about, Alireza?
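
For anyone who wants to check the quoted property values, the following is a small sketch using Python's standard `unicodedata` module. It covers only General_Category and Bidi_Class plus the two legacy byte mappings mentioned above; Line_Break, Word_Break, and Script are not exposed by that module, so they are left out. The helper name `describe` is made up for this example, and the values returned depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

HYPHEN_MINUS = "\u002D"
TATWEEL = "\u0640"


def describe(ch: str) -> dict:
    """Collect the properties that the standard library exposes for ch."""
    return {
        "name": unicodedata.name(ch),
        "gc": unicodedata.category(ch),       # General_Category
        "bc": unicodedata.bidirectional(ch),  # Bidi_Class
        "ascii": ord(ch) < 0x80,              # inside the ASCII subset?
    }


hyphen_props = describe(HYPHEN_MINUS)
tatweel_props = describe(TATWEEL)

# Decode the legacy 8-bit bytes cited above as analogues of U+0640.
cp1256_char = bytes([0xDC]).decode("cp1256")
iso8859_6_char = bytes([0xE0]).decode("iso8859_6")

for props in (hyphen_props, tatweel_props):
    print(props)
print(cp1256_char == TATWEEL, iso8859_6_char == TATWEEL)
```
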