bidi spec

Erik van der Poel erikv at google.com
Fri Feb 1 23:59:18 CET 2008


Harald, Cary, Ken and John,

I have some questions and comments on the bidi spec.

In section 3, what is the intention of the "Delimiterchars"? The
bullet on ET seems to indicate the intention, but it would be nice to
see it stated earlier. Currently, it says 'Let "Delimiterchars" be
a set of characters with the Unicode BIDI properties CS, WS, ON'. How
about changing that to 'Let "Delimiterchars" be a set of characters
commonly used to delimit domain names, with the Unicode BIDI
properties...'?

Then the ET bullet might say "ET, though it commonly occurs..."
instead of using "which".
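
To make the intent of the set concrete, here is a minimal sketch (not
taken from the draft) that looks up the Unicode bidi class of a few
characters often seen around domain names. The sample characters and
the helper names bidi_class and is_delimiter_candidate are my own
assumptions; only the class list CS, WS, ON comes from the quoted
text.

```python
import unicodedata

# Bidi classes the draft lists for "Delimiterchars" (from the quoted text).
DELIMITER_CLASSES = {"CS", "WS", "ON"}

def bidi_class(ch):
    """Return the Unicode bidi class of a single character, e.g. 'CS'."""
    return unicodedata.bidirectional(ch)

def is_delimiter_candidate(ch):
    """True if the character's bidi class is one of CS, WS, ON."""
    return bidi_class(ch) in DELIMITER_CLASSES

# Hypothetical sample: characters that often sit next to domain names.
for ch in [".", "/", ":", "@", " ", "%"]:
    print(repr(ch), bidi_class(ch), is_delimiter_candidate(ch))
```

In this sample, '%' comes out as ET rather than CS, WS or ON, which is
presumably why the draft needs a separate bullet for ET.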

I found this wording confusing: "In the paragraph containing a string
formed from the substrings A B L C D", especially "the paragraph". How
about "In a paragraph with an embedded string formed from the
substrings..."

Or did you mean "If the paragraph is a string formed from the substrings..."?

It's especially confusing because of the "where A and D are (possibly
zero-length) legal labels", when other parts of the document talk
about embedding strings inside paragraphs. Note also that if A is
zero-length and B is a dot, the resulting name is illegal in DNS,
and that the last label of an FQDN is always the zero-length root
label, even if most people write domain names without a dot at the
end.
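
As an illustration of the zero-length label point, here is a rough
sketch of the check I have in mind. The function name
has_illegal_empty_label is hypothetical, and the code only looks at
empty labels; it does not validate label length or characters.

```python
def has_illegal_empty_label(name):
    """Return True if a dotted name has an empty label other than the root.

    A single trailing dot is allowed: it denotes the zero-length root
    label that ends every fully qualified domain name.
    """
    if name == ".":
        # The root on its own is a legal name.
        return False
    labels = name.split(".")
    if labels and labels[-1] == "":
        # Drop the root label produced by a trailing dot.
        labels = labels[:-1]
    return any(label == "" for label in labels)

print(has_illegal_empty_label(".com"))          # empty A followed by a dot
print(has_illegal_empty_label("example.com."))  # explicit root label
print(has_illegal_empty_label("example.com"))   # root label left implicit
```

The first call reports an illegal name and the other two do not,
which matches the distinction above between a leading empty label and
the trailing root label.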

As far as I can tell, the spec does not explicitly say that uppercase
is used to denote RTL characters, as is often done in other bidi
specs.

Section 2.1: "a conformant implementation of the IDNA algorithm will say"

Please change "IDNA" to "IDNA2003".

Section 2.3: "Considering the strings ALEF 5 (HEBREW LETTER ALEF +
DIGIT FIVE and 5 ALEF."

The ')' is missing; it should come after the "FIVE". Also please
change "Considering" to "Consider".

Section 3: "In a display of a string of labels, the characters of each
label should remain grouped between the characters delimiting the
label components."

I suggest changing "delimiting the label components" to "delimiting
the labels", since "label components" might be misinterpreted as
"components of labels".

Also in section 3: "((EXAMPLE NEEDED HERE)" is still a placeholder,
and its opening parenthesis is doubled.

Section 3: "in a RTL context". Please change "a" to "an".

"NSM - Nonspacking Mark": much as I like the sound of that, you
probably don't want to leave it that way. :-)

Section 8: "it is possible that the possible problems noted under".
Perhaps remove the second "possible"?

Erik

