To be published: draft-alvestrand-idna-bidi-00.txt

Paul Hoffman phoffman at imc.org
Mon Oct 16 00:34:49 CEST 2006


At 11:35 PM +0200 10/15/06, Harald Alvestrand wrote:
>3.  Modification to RFC 3454
>
>    If the following modification is made to RFC 3454, we believe that
>    the usefulness of the specification for languages written with right-
>    to-left scripts will be significantly improved:
>
>    Old text:
>
>       [Unicode3.2] defines several bidirectional categories; each
>       character has one bidirectional category assigned to it.  For the
>       purposes of the requirements below, an "RandALCat character" is a
>       character that has Unicode bidirectional categories "R" or "AL";
>       an "LCat character" is a character that has Unicode bidirectional
>       category "L".
>
>    New text:
>
>       [Unicode3.2] defines several bidirectional categories; each
>       character has one bidirectional category assigned to it.
>
>       For characters that have category "R", "AL" or "L", the category
>       is fixed (UAX#9 defines them as having "strong" category); for
>       characters in category EN, ES, ET, AN, CS, NSM, BN, B, S, WS and
>       ON, the category is determined by applying the algorithm described
>       in UAX#9 section 3.3 to the string.
>
>       For the purposes of the requirements below, an "RandALCat
>       character" is a character that, after this determination, has
>       Unicode bidirectional categories "R" or "AL"; an "LCat character"
>       is a character that has Unicode bidirectional category "L".
>

Making every receiver correctly implement all of the steps of section 3.3 
of UAX 9 is onerous and error-prone. A much simpler change would be 
to say that a character of type NSM is considered to have the 
directionality of the base character that it follows.

This would fix both of the problems listed in this draft, as well as any 
related problem where a combining character follows a RandALCat 
character.
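To illustrate the proposed rule, here is a minimal Python sketch. It does not come from the draft or from RFC 3454. It assumes that Python's unicodedata.ucd_3_2_0 tables stand in for the Unicode 3.2 data. The function names resolved_categories and passes_rfc3454_bidi are hypothetical. The sketch checks only requirements 2 and 3 of RFC 3454 section 6 and skips the prohibited-character list. Each NSM takes the already-resolved category of the character before it, so a run of several combining marks inherits from the same base character.

```python
import unicodedata

# Assumption: RFC 3454 is tied to Unicode 3.2, and Python's stdlib
# exposes that version of the database as unicodedata.ucd_3_2_0.
UCD = unicodedata.ucd_3_2_0

RANDAL = ("R", "AL")


def resolved_categories(label):
    """Hypothetical helper: bidi categories under the proposed rule.

    Each NSM takes the category of the character it follows. Because the
    preceding entry is already resolved, a run of NSMs inherits from the
    base character. An NSM at the very start has no base and stays "NSM".
    """
    cats = []
    for ch in label:
        cat = UCD.bidirectional(ch)
        if cat == "NSM" and cats:
            cat = cats[-1]
        cats.append(cat)
    return cats


def passes_rfc3454_bidi(label, nsm_inherits=True):
    """Hypothetical check of RFC 3454 section 6, requirements 2 and 3 only."""
    if nsm_inherits:
        cats = resolved_categories(label)
    else:
        cats = [UCD.bidirectional(ch) for ch in label]  # unmodified rule
    if not any(c in RANDAL for c in cats):
        return True  # no RandALCat characters: requirements 2 and 3 do not apply
    if "L" in cats:
        return False  # requirement 2: no LCat alongside RandALCat
    # Requirement 3: first and last characters must be RandALCat.
    return cats[0] in RANDAL and cats[-1] in RANDAL


# ARABIC LETTER BEH (AL) followed by ARABIC SHADDA (NSM).
label = "\u0628\u0651"
print(passes_rfc3454_bidi(label, nsm_inherits=False))  # unmodified rule
print(passes_rfc3454_bidi(label))                      # proposed rule
```

Under the unmodified rule the label fails because its last character is an NSM. Under the proposed rule the shadda inherits "AL" from the beh, and the label passes. The rule is a single backward look per character, which is the point of the simpler change compared with the full UAX 9 resolution.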

