NSM flaw?

Thu Sep 17 13:20:44 CEST 2009

Abdulrahman I. ALGhadir wrote:
> Thank you for the reply.
> But as I see it, the protocol now fixes some problems that are contextual in nature rather than treating labels as plain Unicode (e.g. allowing ZWJ/ZWNJ, disallowing numbers at the start of U-labels, not mixing scripts, etc.). All of these issues are contextual, and based on what you said they should be handled at the browser level (or some other level) and not in the protocol itself.
> As it stands, the protocol fixes some of these problems and declines to fix others. I know it is hard to cover every language in the world and to fix every contextual problem that could lead to spoofing attempts, but the protocol should follow a clear path: either support them (that is, fix them all) or treat these labels as plain sequences of Unicode and leave it to other levels to handle these kinds of problems.
>
> I know I am a bit late in raising this, but given the importance of the problem I had to do it. Sorry.
>   
In the case of NSM, I believe that some scripts (Vietnamese?) use 
multiple NSMs to indicate that multiple accents should be placed on a 
character. So we can't make a general rule saying that sequences of NSMs 
are forbidden.

I checked the Unicode book, but I can't find a statement either that two 
occurrences of the same combining mark are forbidden or that they are 
explicitly permitted (and expected to have some reasonable effect). So 
Unicode doesn't give us guidance (or I missed it. Unicoders?)

The rules in the current set of drafts balance three concerns:

- What is needed for using "words in a language" as labels should be allowed
- What presents clear and present danger should be disallowed
- What presents clear and present danger, but is still necessary (not 
just "nice to have") in some cases, should have its usage circumscribed

I think the last discussion we had on that basis was for TATWEEL.

Multiple occurrences of identical NSMs may be dangerous enough, and 
forbidding them may conflict with nothing else in our design criteria, 
so we could do something about them. But I'm hesitant to accept this 
without a careful study of how those marks are used across *all* 
scripts, and I am not happy about taking the time for that this long 
after Last Call.
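
To make the distinction concrete, here is a minimal sketch in Python of 
what a check for repeated identical NSMs could look like. It is an 
illustration only, not a rule from any of the drafts: the function name 
repeated_nsms is hypothetical, and the choice to decompose the label 
with NFD first is an assumption, made so that a mark folded into a 
precomposed character is still seen.

```python
import unicodedata


def repeated_nsms(label):
    """List (index, code point) for each non-spacing mark that repeats
    the code point right before it. Indexes refer to the NFD form of
    the label. Hypothetical helper, not part of any IDNA draft."""
    # Assumption: decompose first, so U+00E1 followed by U+0301 is
    # seen as a + U+0301 + U+0301.
    decomposed = unicodedata.normalize("NFD", label)
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(decomposed)
        if i > 0
        and ch == decomposed[i - 1]
        and unicodedata.category(ch) == "Mn"  # Mn = non-spacing mark
    ]


# Vietnamese U+1EC7 decomposes to e + U+0323 + U+0302: two *different*
# marks in sequence, which a blanket "no NSM sequences" rule would block.
print(repeated_nsms("vi\u1ec7t"))

# The same acute accent applied twice to one base letter.
print(repeated_nsms("a\u0301\u0301"))
```

The first call reports nothing and the second reports the doubled 
accent. This only shows that the two cases can be told apart 
mechanically; it says nothing about whether a repeated mark is 
legitimate in some script, which is the part that needs the study.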

Given that we're working in "tradeoff space", not in "black and white 
space" here, I'm sure we will find issues the day after the RFCs are 
published, too. At some point we have to move on.

                     Harald