FW: Your statement on Identifiers and Unicode 7.0.0

Thu Feb 5 07:48:38 CET 2015

Dear John,

Sorry! ... I can see that you might be irritated by the concept of excluding the "non-combining characters" when you linked it to Latin script. BTW, the context of our request was the Arabic script. We tend not to generalize our findings to other scripts.

Again, our view is that the current IAB statement recommends excluding many code points from the Arabic script; at least three of those code points are essential for a number of widely used languages written in the Arabic script (Arabic, Farsi, Urdu, Jawi, Pashto, etc.). Almost all Arabic script IDN TLDs use them in domain name registrations! So the IAB's recommendations affect a great many users and domains without any logic, since some of the excluded code points already have suitable normalization rules in place!

Personally, I am against the concept of excluding any code points without a full study of the root cause of the problem and without consulting experts in that script. However, if the IAB's intention was just to raise a warning to the community, then I believe that they should exclude the "non-combining characters" in the Arabic script (BTW, no Arabic IDN TLD registry has used non-combining characters so far) rather than excluding some essential letters (which do not have any problems and are used by many Arabic IDN TLD registries).

With best regards,

Raed I. Al-Fayez
------------------------------------------
Saudi Network Information Center (SaudiNIC) 
Communication and Information Technology Commission (CITC)
Tel: + 966-11-4618216   - Fax: + 966-11-2639393
http://www.nic.net.sa

-----Original Message-----
From: Idna-update [mailto:idna-update-bounces at alvestrand.no] On Behalf Of John C Klensin
Sent: Wednesday, February 04, 2015 11:33 PM
To: Jefsey; Abdulrahman I. ALGhadir; IDNA update work
Subject: Re: FW: Your statement on Identifiers and Unicode 7.0.0

--On Wednesday, February 04, 2015 17:44 +0100 Jefsey <jefsey at jefsey.com> wrote:

> At 07:31 03/02/2015, Abdulrahman I. ALGhadir wrote:
>> "A general rule may be extracted that combining marks should not be  
>> allowed for TLDs."
> 
> We are in agreement. This is the problem of having chosen Unicode 
> instead of having deployed a non-confusagle Unigraph compatible table.
> The "consensus" knew better. A lot of wasted time and money, and of 
> unnecessary irritation.

I suggest that both of you read the subthread that contains three very long notes between Asmus and myself.  Among the things that you will learn there is that a "no combining mark" system will not work for many uses with Latin script in Unicode or would require a _huge_ code set for many other scripts.
Similarly, while "no combining characters" will work well for writing the Arabic language in Arabic script, it will work much less well for several other languages written in that script unless a lot of other precomposed characters are added.  If one considers what Unicode and IDNA call "joiners" to be combining characters -- they certainly are in the sense that they modify the effects and sometimes the shape of the characters associated with the code points that precede or follow them -- then even a wider selection of precomposed characters is insufficient.
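
To make the precomposed-versus-combining distinction concrete, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration, not part of any IDNA rule set: the helper names (composes_to_single_code_point, has_combining_mark, has_joiner) and the sample strings are assumptions chosen for this example.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def composes_to_single_code_point(sequence: str) -> bool:
    """Return True if NFC turns the sequence into exactly one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


def has_combining_mark(label: str) -> bool:
    """Return True if any code point has a mark category (Mn, Mc or Me)."""
    return any(unicodedata.category(ch).startswith("M") for ch in label)


def has_joiner(label: str) -> bool:
    """Return True if the label contains ZWNJ or ZWJ."""
    return any(ch in (ZWNJ, ZWJ) for ch in label)


# ARABIC LETTER ALEF + ARABIC HAMZA ABOVE: a precomposed form exists (U+0623),
# so a "no combining marks" rule is harmless once the label is in NFC.
alef_hamza = "\u0627\u0654"

# LATIN SMALL LETTER X + COMBINING ACUTE ACCENT: no precomposed form exists,
# so the combining mark survives NFC and the same rule would reject the label.
x_acute = "x\u0301"

# A label containing ZWNJ: the joiner has category Cf, not a mark category,
# so a check for combining marks alone does not notice it.
with_zwnj = "\u0645" + ZWNJ + "\u0647"
```

Under these assumptions, composes_to_single_code_point(alef_hamza) is true while composes_to_single_code_point(x_acute) is false, and has_combining_mark(with_zwnj) is false even though has_joiner(with_zwnj) is true, which is why joiners need separate treatment from combining marks.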

_Please_ do not assume that you can generalize from the characters, languages, and scripts with which you are most familiar to everything else and to extremely broad rules.  It just doesn't work unless you are willing to give up on very different writing systems.

As to "a non-confusagle Unigraph compatible table", I look forward to seeing a serious and detailed proposal.  Many of us believe the notion is impossible for reasons that have at least as much to do with human perception as with writing systems.

    john

_______________________________________________
Idna-update mailing list
Idna-update at alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update