Combining mark vs combining character?

Simon Josefsson simon at josefsson.org
Wed Jan 5 11:06:40 CET 2011


Hi,

I need a clarification regarding this paragraph in section 4.2.3.2 of
RFC 5891:

   The Unicode string MUST NOT begin with a combining mark or combining
   character (see The Unicode Standard, Section 2.11 [Unicode] for an
   exact definition).

And this in section 5.4:

   Putative U-labels with any of the following characteristics MUST be
   rejected prior to DNS lookup:
...
   o  Labels whose first character is a combining mark (see The Unicode
      Standard, Section 2.11 [Unicode]).

The reference to [Unicode] is not normative, which would be a problem
for any implementer.

Section 2.11 of Unicode 5.0 discusses "combining character" but not
"combining mark".

There is a section 7.9 in Unicode 5.0 called "Combining Marks".

One section that discusses both Combining Marks and Combining Characters
is section 3.11 on "Canonical Ordering Behaviour".

Section 3.6, on "Combination", gives the precise definition of a
"Combining character":

   Combining character: A character with the General Category of
   Combining Mark (M).

Is this the intended definition of Combining character by RFC 5891?
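
To make the question concrete, here is a minimal sketch of how an
implementer might apply the section 3.6 definition, assuming that
"combining mark" and "combining character" both mean "General Category
M" (Mn, Mc or Me).  That reading is my assumption, not something RFC
5891 states.  The function name is hypothetical, and Python's
unicodedata module uses whatever Unicode version ships with the
interpreter, which is not necessarily Unicode 5.0.

```python
import unicodedata


def begins_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of label has General Category M.

    Assumption: "combining mark" is read as General Category M
    (Mn, Mc or Me), following the Unicode section 3.6 definition
    of "combining character".
    """
    if not label:
        # An empty string has no first character to test.
        return False
    # unicodedata.category() returns a two-letter code such as "Mn" or "Ll".
    return unicodedata.category(label[0]).startswith("M")


if __name__ == "__main__":
    # U+0301 COMBINING ACUTE ACCENT is category Mn.
    print(begins_with_combining_mark("\u0301a"))
    # The same mark after a base letter is not a leading combining mark.
    print(begins_with_combining_mark("a\u0301"))
```

Under this reading, the checks in 4.2.3.2 and 5.4 would reduce to the
same test on the first code point of the label.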

Questions:

1) Does RFC 5891 refer to "combining mark" and "combining character" as
the same thing?

2) Is there a significant difference between the requirement in 4.2.3.2
and 5.4?  The latter section only mentions "combining mark" and not
"combining character".

3) What is the precise definition of a "combining mark"?

/Simon


More information about the Idna-update mailing list