Combining mark vs combining character?

Simon Josefsson simon at josefsson.org
Wed Jan 5 15:20:27 CET 2011


Thank you for the clear answer!

In a revision of the documents, it would help to say this explicitly, so
there is a normative description.  Right now there is an informative
reference to a section in Unicode that doesn't give enough detail.
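
A minimal sketch of what an explicit check could look like in Python. It assumes the reading agreed in this thread: a "combining mark" (or "combining character") is any character whose General Category is M, meaning Mn, Mc or Me. The helper names `is_combining_mark` and `check_leading_combining_mark` are hypothetical and do not come from RFC 5891 or any IDNA library. The last lines show why this test differs from testing for a non-zero Canonical Combining Class.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """True if ch has General Category Mark (Mn, Mc or Me).

    Assumption: "combining mark" and "combining character" both mean
    General Category M, as discussed in this thread.
    """
    return unicodedata.category(ch).startswith("M")

def check_leading_combining_mark(label: str) -> None:
    """Reject a putative U-label whose first character is a combining mark.

    Hypothetical helper for illustration only.
    """
    if label and is_combining_mark(label[0]):
        raise ValueError(
            f"label starts with combining mark U+{ord(label[0]):04X}")

# General Category M is not the same as a non-zero Canonical Combining
# Class: U+0903 (Mc) and U+20DD (Me) are marks with class 0.
for ch in ("\u0301", "\u0903", "\u20dd"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))
# U+0301 Mn 230
# U+0903 Mc 0
# U+20DD Me 0
```

With this definition, `check_leading_combining_mark("\u0903abc")` is rejected even though a check based only on a non-zero combining class would accept it.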

/Simon

Vint Cerf <vint at google.com> writes:

> yes, having general category M seems to encompass both "mark" and
> "character" - at least for IDNA2008 purposes.
>
> v
>
>
> On Wed, Jan 5, 2011 at 9:07 AM, Simon Josefsson <simon at josefsson.org> wrote:
>
>> Vint Cerf <vint at google.com> writes:
>>
>> > Simon,
>> >
>> > I am pretty sure that the terms "combining mark" and "combining character"
>> > as used in IDNA2008 mean the same thing.
>> >
>> > neither is permitted as the initial character of a Unicode domain label.
>>
>> Thanks.  And is the practical definition of a combining mark/character
>> that it has a General Category of M, as explained in section 3.6 of
>> Unicode 5.0 quoted below?
>>
>> Note that this is different from having a non-zero Combining Class value.
>>
>> /Simon
>>
>> > vint
>> >
>> > On Wed, Jan 5, 2011 at 5:06 AM, Simon Josefsson <simon at josefsson.org> wrote:
>> >
>> >> Hi,
>> >>
>> >> I need a clarification regarding this paragraph in section 4.2.3.2 of
>> >> RFC 5891:
>> >>
>> >>   The Unicode string MUST NOT begin with a combining mark or combining
>> >>   character (see The Unicode Standard, Section 2.11 [Unicode] for an
>> >>   exact definition).
>> >>
>> >> And this in section 5.4:
>> >>
>> >>   Putative U-labels with any of the following characteristics MUST be
>> >>   rejected prior to DNS lookup:
>> >> ...
>> >>   o  Labels whose first character is a combining mark (see The Unicode
>> >>      Standard, Section 2.11 [Unicode]).
>> >>
>> >> The reference to [Unicode] is not normative, which would be a problem
>> >> for any implementer.
>> >>
>> >> Section 2.11 of Unicode 5.0 discusses "combining character" but
>> >> not "combining mark".
>> >>
>> >> There is a section 7.9 in Unicode 5.0 called "Combining Marks".
>> >>
>> >> Section 3.11, "Canonical Ordering Behaviour", discusses both Combining
>> >> Marks and Combining Characters.
>> >>
>> >> Section 3.6, "Combination", gives the precise definition of a
>> >> "Combining character":
>> >>
>> >>   Combining character: A character with the General Category of
>> >>   Combining Mark (M).
>> >>
>> >> Is this the definition of "combining character" that RFC 5891 intends?
>> >>
>> >> Questions:
>> >>
>> >> 1) Does RFC 5891 refer to "combining mark" and "combining character" as
>> >> the same thing?
>> >>
>> >> 2) Is there a significant difference between the requirement in 4.2.3.2
>> >> and 5.4?  The latter section only mentions "combining mark" and not
>> >> "combining character".
>> >>
>> >> 3) What is the precise definition of a "combining mark"?
>> >>
>> >> /Simon
>> >> _______________________________________________
>> >> Idna-update mailing list
>> >> Idna-update at alvestrand.no
>> >> http://www.alvestrand.no/mailman/listinfo/idna-update
>> >>
>>
