Mark Davis ☕
mark at macchiato.com
Thu Sep 17 16:57:37 CEST 2009
As you say, multiple occurrences of NSMs are permitted and necessary. They
are described on p51, Section 2.11 "Combining Characters", subsection
"Multiple Combining Characters" (although people may want to start reading at
p48). Briefly, in the absence of language/script-specific layout, the NSMs
should be stacked outwards from the base.
I don't know of a language that needs two of the same NSM in a row; but that
doesn't mean that one doesn't exist. And it is an issue if the rendering
engine incorrectly places the NSMs so as to coincide. But forbidding two of
the same in a row doesn't solve the problem: if a rendering engine overlaps
NSMs, then it is easy to "hide" an NSM by just using it with different NSMs
or even just base characters that turn on the same pixels. Should one
disallow Arabic FATHA because it can be hidden if placed on an A-acute in a
particular font on a particular system? Probably not.
These are issues that client software on a particular system can warn about,
but IDNA cannot reliably detect or address in the protocol.
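To make the client-side warning concrete, here is a minimal sketch of how software might flag two identical NSMs in a row. It is only an illustration, not anything from the IDNA drafts: the function name repeated_nsm_positions is hypothetical, and it assumes that "NSM" can be approximated by the Unicode general category Mn. The check runs on the NFD form, because in NFC the first mark may compose with its base and hide the repetition.

```python
import unicodedata


def repeated_nsm_positions(label):
    """Return indexes, in the NFD form of `label`, where a nonspacing
    mark (general category Mn) immediately repeats the previous code point.

    Hypothetical helper for a client-side warning; not part of IDNA.
    """
    decomposed = unicodedata.normalize("NFD", label)
    hits = []
    for i in range(1, len(decomposed)):
        ch = decomposed[i]
        if ch == decomposed[i - 1] and unicodedata.category(ch) == "Mn":
            hits.append(i)
    return hits


# Vietnamese U+1EC7 decomposes to e + U+0323 + U+0302: two different
# NSMs on one base, which is legitimate and is not flagged.
vietnamese = "\u1ec7"

# Arabic BEH followed by FATHA (U+064E) twice: the second FATHA is flagged.
doubled_fatha = "\u0628\u064e\u064e"

# A-acute (precomposed) followed by another combining acute: only visible
# as a repetition after decomposition.
doubled_acute = "\u00e1\u0301"
```

Note that this only catches exact repetition. As argued above, it does nothing about different marks, or base characters, that happen to overlap in a particular font.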
On Thu, Sep 17, 2009 at 04:20, Harald Alvestrand <harald at alvestrand.no> wrote:
> Abdulrahman I. ALGhadir wrote:
> > Thank you for the reply.
> > But as I see it, the protocol now does fix some problems that are
> > contextual rather than treating labels as plain Unicode (e.g. allowing
> > ZWJ/ZWNJ, disallowing numbers at the start of U-labels, not mixing
> > scripts, etc.). All of these issues are contextual, and based on what you
> > said they should be handled at the browser level (or some other level) and
> > not in the protocol itself.
> > As I see it, the protocol at its current stage is a mix of fixing some
> > problems and rejecting others. I know it is hard to govern all the languages
> > in the world and to fix every contextual problem that may lead to spoofing
> > attempts, but the protocol should follow a clear path: either support them
> > (by fixing them all, that is) or consider these labels as plain sequences
> > of Unicode and leave other levels to handle the fixing of these kind of
> > I know I am a bit late in raising this, but given the importance
> > of the problem I had to do it. Sorry.
> In the case of NSM, I believe that some scripts (Vietnamese?) use
> multiple NSMs to indicate that multiple accents should be placed on a
> character. So we can't make a general rule saying that sequences of NSMs
> are forbidden.
> I checked the Unicode book, but I can't find either a statement that two
> occurrences of the same combining mark are forbidden or that they are
> explicitly permitted (and expected to have some reasonable effect). So
> Unicode doesn't give us guidance (or I missed it - Unicoders?)
> The rules in the current set of drafts balance two concerns:
> - What is needed for using "words in a language" as labels should be
>   allowed
> - What presents clear and present danger should be disallowed
> - What presents clear and present danger, but is still necessary (not
> just "nice to have") in some cases, should have its usage circumscribed
> I think the last discussion we had on that basis was for TATWEEL.
> Multiple occurrences of identical NSMs may be dangerous enough, and not
> blocking anything else in our design criteria, that we could do
> something about them - but I'm hesitant to accept this without careful
> study of the use of those signs across *all* scripts. And that is
> something that I am not happy with taking the time for this long after
> Last Call.
> Given that we're working in "tradeoff space", not in "black and white
> space" here, I'm sure we will find issues the day after the RFCs are
> published, too. At some point we have to move on.