bidi spec

Harald Alvestrand harald at alvestrand.no
Thu Feb 7 09:09:11 CET 2008


Mark Davis wrote:
> I don't think these are the minimal rules yet, since some *parts* of
> these rules can be removed. Here are the ones you say to keep:
>
> Proposed rules (numbering for reference):
>
>    1. Only characters with the BIDI properties L, R, AL, EN, ES, BN,
>       ON and NSM are allowed.
>    2. ES and ON are not allowed in the first position
>    3. ES and ON, followed by zero or more NSM, is not allowed in the
>       last position
>    4. If an L is present, no R, AL or AN may be present.
>    5. If an EN is present, no AN may be present
>    6. The first character may not be an NSM
>    7. The first character may not be an EN (European Number) or an AN
>       (Arabic Number).
>
> Comments
>
>    1. All of this applies only to Bidi labels: that is, those with
>       BIDI properties R, AL, or AN. Because of that, #4 can be changed
>       to be simply: No L.
>
Sorry, you're wrong. It applies to all labels that are used in a BIDI
context. In particular, rule 7 has to apply to ASCII-only labels.
>
>    2. According to #1, AN is not allowed at all. That would remove #5,
>       and remove AN from #4 and #7. However, I think that's a mistake
>       -- as discussed before -- and that we need to include AN in #1.
>
Leaving AN out of #1 is a mistake, and I've said so before. #5 and #7
would be meaningless if AN weren't allowed.
>
>    3. Because the protocol limits to only [[:L:][:Mn:][:Mc:][:Nd:]]
>       plus a handful of exceptions, #1 is redundant because of the
>       other restrictions in the protocol document. If it is retained,
>       we should at least have a note about that.
>
Because I disagree with your first point, I disagree with this one too.
>
>    4. The only characters allowed in NSM are [:Mn:] and [:Me:]. The
>       protocol forbids [:Me:] entirely, and forbids [:Mc:] in first
>       position. So #6 is redundant. If it is retained, we should at
>       least have a note about that.
>

I'm not aware of a Unicode stability guarantee that guarantees the
property stated above, and it is in fact false: you have 10 characters
in class "Me" that are NSM too. (These aren't allowed in labels either,
according to tables-03, btw - but I have no idea what an ARABIC START OF
RUB EL HIZB is, or whether someone will step up tomorrow and demand an
exception for it.)
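
One way to check which general categories actually carry the NSM
property is to tally them from a Unicode database. The following is a
minimal sketch using Python's unicodedata module; the function name is
made up for illustration, and the counts depend on the Unicode version
bundled with the interpreter, so they will not necessarily match the
figures quoted in this thread.

```python
import sys
import unicodedata
from collections import Counter


def nsm_general_categories():
    """Count code points with BIDI class NSM, grouped by general category.

    Hypothetical helper: the result reflects whatever Unicode version
    this Python build ships with, not the version discussed in 2008.
    """
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.bidirectional(ch) == "NSM":
            counts[unicodedata.category(ch)] += 1
    return counts


if __name__ == "__main__":
    for category, n in sorted(nsm_general_categories().items()):
        print(category, n)
```

Because the mapping from general category to BIDI class can differ
between Unicode versions, a tally like this is only a snapshot, which
is part of the argument for stating the rules in terms of BIDI
properties directly.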

Since the rest of the document is stated only in terms of BIDI
properties, I'd like to keep it stated in terms of BIDI properties. The
current formulation has all requirements that derive from the BIDI
properties in this document, and makes no assumptions on what the other
documents say; that's a Good Thing in my opinion.
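
To illustrate what "stated only in terms of BIDI properties" could look
like in practice, here is a rough sketch of the seven proposed rules as
a check over a label's BIDI classes. It is not a normative
implementation: the function name is hypothetical, AN is included in
the allowed set for rule 1 (an assumption, following the discussion
above), rule 3 is read as "the last character that is not an NSM may
not be ES or ON", and the rules are applied to every label, not only
to labels containing R, AL or AN.

```python
import unicodedata

# Rule 1 as proposed, plus AN (assumption: AN is meant to be allowed).
ALLOWED = {"L", "R", "AL", "AN", "EN", "ES", "BN", "ON", "NSM"}


def violated_rules(label):
    """Return the sorted rule numbers (1-7) that the label violates."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return []
    present = set(classes)
    violated = set()

    # Rule 1: only the listed BIDI classes are allowed.
    if not present <= ALLOWED:
        violated.add(1)
    # Rule 2: ES and ON are not allowed in the first position.
    if classes[0] in ("ES", "ON"):
        violated.add(2)
    # Rule 3: ES or ON, followed by zero or more NSM, may not end the label.
    i = len(classes) - 1
    while i >= 0 and classes[i] == "NSM":
        i -= 1
    if i >= 0 and classes[i] in ("ES", "ON"):
        violated.add(3)
    # Rule 4: if an L is present, no R, AL or AN may be present.
    if "L" in present and present & {"R", "AL", "AN"}:
        violated.add(4)
    # Rule 5: if an EN is present, no AN may be present.
    if "EN" in present and "AN" in present:
        violated.add(5)
    # Rule 6: the first character may not be an NSM.
    if classes[0] == "NSM":
        violated.add(6)
    # Rule 7: the first character may not be an EN or an AN.
    if classes[0] in ("EN", "AN"):
        violated.add(7)
    return sorted(violated)
```

With this sketch, an ASCII-only label such as "1abc" is flagged under
rule 7 even though it contains no right-to-left characters, which is
the case argued for earlier in this message.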

>    5. #7 can be combined with #2; all about the first character.
>
I prefer it the way it is (keeping the numbers stuff separate from the
non-numbers stuff), but that's a stylistic opinion; the two are equivalent.

                 Harald


