U+200C rule

Simon Josefsson simon at josefsson.org
Mon Mar 21 14:58:11 CET 2011


Patrik Fältström <patrik at frobbit.se> writes:

> On 20 mar 2011, at 18.47, Simon Josefsson wrote:
>
>> Patrik Fältström <patrik at frobbit.se> writes:
>> 
>>>> If not, what is the intended way to implement RegExpMatch?
>>> 
>>> The expression tries to say that around _each_ \u200C you need the following:
>>> 
>>>> One codepoint with either Joining_Type L or D
>>>> 
>>>> Zero or more codepoints with Joining_Type T
>>>> 
>>>> The \u200C
>>>> 
>>>> Zero or more codepoints with Joining_Type T
>>>> 
>>>> One codepoint with either Joining_Type R or D
>>> 
>>> The regexp does not take into account more than one \u200c in each string.
>> 
>> Thanks for the clarification. I'll implement it this way and add a
>> couple of test vectors for it.
>
> If you have suggestions on how to fix the rule, let me know.

I believe the easiest and most understandable fix would be to explain
the rule in text.  The current pseudo code is too informal to be
converted into machine code without a human parsing it first anyway.
Thus, text should be preferred wherever it is easier for the reader to
understand.

/Simon
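
The per-\u200C reading quoted above can be sketched in Python as follows.
This is an illustration only, not code from this thread.  It covers just
the joining-type condition discussed here.  The names JOINING_TYPE,
joining_type, zwnj_context_ok and label_zwnj_ok are hypothetical, and the
three-entry table is a tiny stand-in for the full Joining_Type data from
the Unicode Character Database, which a real implementation would need.

```python
ZWNJ = "\u200c"

# Hypothetical excerpt of Joining_Type data, for illustration only.
# A real implementation would load the complete table from the UCD.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH (dual-joining)
    "\u0627": "R",  # ARABIC LETTER ALEF (right-joining)
    "\u064e": "T",  # ARABIC FATHA (transparent)
}


def joining_type(ch):
    """Return the Joining_Type of ch; anything not in the table counts as 'U'."""
    return JOINING_TYPE.get(ch, "U")


def zwnj_context_ok(label, pos):
    """Check the joining-type context around the U+200C at label[pos]."""
    if label[pos] != ZWNJ:
        raise ValueError("label[pos] is not U+200C")

    # Walk left over zero or more T code points, then require L or D.
    i = pos - 1
    while i >= 0 and joining_type(label[i]) == "T":
        i -= 1
    if i < 0 or joining_type(label[i]) not in ("L", "D"):
        return False

    # Walk right over zero or more T code points, then require R or D.
    j = pos + 1
    while j < len(label) and joining_type(label[j]) == "T":
        j += 1
    return j < len(label) and joining_type(label[j]) in ("R", "D")


def label_zwnj_ok(label):
    """Apply the check to every U+200C in the label, not only the first."""
    return all(
        zwnj_context_ok(label, i)
        for i, ch in enumerate(label)
        if ch == ZWNJ
    )
```

Scanning outward from each \u200C, instead of matching one regular
expression against the whole string, is what lets the check handle labels
with more than one \u200C.  For example, label_zwnj_ok("\u0628\u200c\u0628")
is True with the table above, while label_zwnj_ok("\u0627\u200c\u0628") is
False because the code point before the \u200C has Joining_Type R.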

