Combining mark vs combining character?
Simon Josefsson
simon at josefsson.org
Wed Jan 5 11:06:40 CET 2011
Hi,
I need a clarification regarding this paragraph in section 4.2.3.2 of
RFC 5891:
The Unicode string MUST NOT begin with a combining mark or combining
character (see The Unicode Standard, Section 2.11 [Unicode] for an
exact definition).
And this in section 5.4:
Putative U-labels with any of the following characteristics MUST be
rejected prior to DNS lookup:
...
o Labels whose first character is a combining mark (see The Unicode
Standard, Section 2.11 [Unicode]).
The reference to [Unicode] is not normative, which would be a problem
for any implementer.
Section 2.11 of Unicode 5.0 discusses "combining character" but
not "combining mark".
There is a section 7.9 in Unicode 5.0 called "Combining Marks".
One section that discusses both Combining Marks and Combining Characters
is section 3.11 on "Canonical Ordering Behaviour".
There is one section 3.6 on "Combination" that gives the precise
definition of a "Combining character":
Combining character: A character with the General Category of
Combining Mark (M).
Is this the intended definition of Combining character by RFC 5891?
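To make the question concrete, here is a minimal sketch of the check an
implementer might write if the section 3.6 definition is the intended one.
It assumes that "combining mark" and "combining character" both mean
General Category M (Mn, Mc or Me), which is exactly the point that needs
confirming. The function name is hypothetical, and Python's unicodedata
module uses whatever Unicode version ships with the interpreter, not
necessarily 5.0.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first code point of label has General Category M.

    Assumption: "combining mark" means General Category M (Mn, Mc, Me),
    following the section 3.6 definition of "combining character".
    """
    if not label:
        # An empty label has no first character to test.
        return False
    # unicodedata.category() returns a two-letter code such as "Mn" or "Ll".
    return unicodedata.category(label[0]).startswith("M")


# U+0301 COMBINING ACUTE ACCENT is Mn, so a label starting with it
# would be rejected under this reading.
print(starts_with_combining_mark("\u0301a"))
# The same mark after a base letter is not a leading combining mark.
print(starts_with_combining_mark("a\u0301"))
```

Under a different definition of "combining mark", only the category test
in the return statement would change.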
Questions:
1) Does RFC 5891 refer to "combining mark" and "combining character" as
the same thing?
2) Is there a significant difference between the requirement in 4.2.3.2
and 5.4? The latter section only mentions "combining mark" and not
"combining character".
3) What is the precise definition of a "combining mark"?
/Simon