bidi spec

Harald Alvestrand harald at alvestrand.no
Sat Feb 2 22:49:48 CET 2008



--On Friday, February 01, 2008 14:59:18 -0800 Erik van der Poel 
<erikv at google.com> wrote:

> Harald, Cary, Ken and John,
>
> I have some questions and comments on the bidi spec.
>
> In section 3, what is the intention of the "delimiterchars"? The
> bullet on ET seems to indicate the intention, but it would be nice to
> see the intention earlier. Currently, it says 'Let "Delimiterchars" be
> a set of characters with the Unicode BIDI properties CS, WS, ON'. How
> about changing that to 'Let "Delimiterchars" be a set of characters
> commonly used to delimit domain names, with the Unicode BIDI
> properties...'

Good point. The reason for the convoluted way of stating things is that the 
only delimiter that MUST work is dot (.), while many other characters are 
used as delimiters, some of which don't work. It's a back-pointer to the 
"characters delimiting the label" statement in the informal form of the 
criterion.
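
For illustration only (not text from the draft): a minimal Python sketch of 
checking whether a character has one of the bidi classes named in the 
"Delimiterchars" definition. The names DELIMITER_BIDI_CLASSES and 
has_delimiter_bidi_class are made up for this example, and the result depends 
on the Unicode version bundled with the Python runtime. Having one of these 
classes only makes a character a candidate; the draft talks about "a set of 
characters" with these properties, not necessarily every such character.

```python
import unicodedata

# Bidi classes named in the quoted "Delimiterchars" definition.
# The constant name is hypothetical, chosen for this sketch.
DELIMITER_BIDI_CLASSES = {"CS", "WS", "ON"}


def has_delimiter_bidi_class(ch: str) -> bool:
    """Return True if the single character ch has bidi class CS, WS or ON."""
    return unicodedata.bidirectional(ch) in DELIMITER_BIDI_CLASSES


# Print the bidi class of a few characters seen around domain names.
for ch in (".", " ", "@", "-", "a"):
    print(repr(ch), unicodedata.bidirectional(ch), has_delimiter_bidi_class(ch))
```

Dot and space come out as candidates, while a letter or the hyphen (which is 
legal inside a label) does not.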

>
> Then the ET bullet might say "ET, though it commonly occurs..."
> instead of using "which"
>
> I found this wording confusing: "In the paragraph containing a string
> formed from the substrings A B L C D", especially "the paragraph". How
> about "In a paragraph with an embedded string formed from the
> substrings..."

That's better - thanks!
>
> Or did you mean "If the paragraph is a string formed from the
> substrings..."?

No - it's important for the exposition to emphasize that the domain name is 
not the only thing in the paragraph.

> It's especially confusing because of the "where A and D are (possibly
> zero-length) legal labels", when other parts of the document talk
> about embedding strings inside paragraphs. Note also that if A is
> zero-length and B is a dot, it is illegal (in DNS), and the last label
> of an FQDN is always the zero-length root label, even if most people
> write domain names without a dot at the end.

It's not a legal DNS name where hostnames are used, but such strings are not 
uncommon elsewhere. "The domain name fragment .foobar. has dots around it" 
is a typical example.

> As far as I can tell, the spec does not explicitly say that uppercase
> is used to denote RTL characters, as is often done in other bidi
> specs.

Good point. I'll insert that. Most of the time, I use the bidi classes 
explicitly (deliberate choice, since I believe the focus on just the R and 
L classes is a large part of the reason why we broke this so badly the 
first time).
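
To make the point about classes concrete, here is a small Python sketch (an 
illustration, not part of the draft) that lists the bidi class of each 
character in a string. The helper name bidi_classes is invented for this 
example, and the output reflects the Unicode data shipped with the Python 
runtime. It shows that a label can mix classes such as R, AL, EN, AN and NSM, 
which an uppercase-means-RTL shorthand cannot distinguish.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


samples = {
    "HEBREW LETTER ALEF + DIGIT FIVE": "\u05d05",
    "DIGIT FIVE + HEBREW LETTER ALEF": "5\u05d0",
    "ARABIC LETTER ALEF + ARABIC-INDIC DIGIT FIVE": "\u0627\u0665",
    "LATIN SMALL LETTER E + COMBINING ACUTE ACCENT": "e\u0301",
}

for name, text in samples.items():
    print(name, bidi_classes(text))
```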

> Section 2.1: "a conformant implementation of the IDNA algorithm will say"
>
> Please change "IDNA" to "IDNA2003".
>
> Section 2.3: "Considering the strings ALEF 5 (HEBREW LETTER ALEF +
> DIGIT FIVE and 5 ALEF."
>
> The ')' is missing. Should be after the "FIVE". Also please change
> "Considering" to "Consider".
>
> Section 3: "In a display of a string of labels, the characters of each
> label should remain grouped between the characters delimiting the
> label components."
>
> I suggest changing "delimiting the label components" to "delimiting
> the labels", since "label components" might be misinterpreted as
> "components of labels".
>
> Also in section 3: "((EXAMPLE NEEDED HERE)"
>
> Section 3: "in a RTL context", please change "a" to "an".
>
> "NSM - Nonspacking Mark", much as I like the sound of that, you
> probably don't want to leave it that way. :-)
>
> Section 8: "it is possible that the possible problems noted under",
> perhaps remove the 2nd "possible"?


These seem editorial, and I'll fix them.
Thanks for the careful read!

                Harald


More information about the Idna-update mailing list