U+200C rule
Simon Josefsson
simon at josefsson.org
Mon Mar 21 14:58:11 CET 2011
Patrik Fältström <patrik at frobbit.se> writes:
> On 20 mar 2011, at 18.47, Simon Josefsson wrote:
>
>> Patrik Fältström <patrik at frobbit.se> writes:
>>
>>>> If not, what is the intended way to implement RegExpMatch?
>>>
>>> The expression tries to say that you need the following around _each_ \u200C:
>>>
>>>> One codepoint with either Joining_Type L or D
>>>>
>>>> Zero or more codepoints with Joining_Type T
>>>>
>>>> The \u200C
>>>>
>>>> Zero or more codepoints with Joining_Type T
>>>>
>>>> One codepoint with either Joining_Type R or D
>>>
>>> The regexp does not take into account more than one \u200c in each string.
>>
>> Thanks for the clarification. I'll implement it this way and add a
>> couple of test vectors for it.
>
> If you have suggestions on how to fix the rule, let me know.
I believe the easiest and most understandable fix would be to explain
the rule in text. The current pseudo code is too informal to be
converted into code without a human parsing it first anyway. Thus,
text should be preferred wherever it is easier for the reader to
understand.
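
As a rough illustration of what such a textual rule amounts to, here is
a minimal Python sketch of the check as described above: every U+200C
must be preceded by a codepoint with Joining_Type L or D and followed by
one with Joining_Type R or D, skipping any Joining_Type T codepoints in
between. The names zwnj_context_ok, joining_type and
SAMPLE_JOINING_TYPES are hypothetical, and the three-entry table is only
a stand-in for the real Joining_Type data from the Unicode Character
Database (Python's unicodedata module does not expose that property).
It covers only the regexp discussed in this thread.

```python
ZWNJ = "\u200c"

# Illustrative subset only (an assumption for this sketch); a real
# implementation would load Joining_Type for all codepoints from the UCD.
SAMPLE_JOINING_TYPES = {
    "\u0628": "D",  # ARABIC LETTER BEH, dual-joining
    "\u0627": "R",  # ARABIC LETTER ALEF, right-joining
    "\u064e": "T",  # ARABIC FATHA, transparent
}


def joining_type(ch):
    """Return the Joining_Type of ch, defaulting to 'U' (non-joining)."""
    return SAMPLE_JOINING_TYPES.get(ch, "U")


def zwnj_context_ok(label, jt=joining_type):
    """Check the L/D T* U+200C T* R/D context around every U+200C."""
    for i, ch in enumerate(label):
        if ch != ZWNJ:
            continue
        # Walk left over transparent codepoints, then require L or D.
        j = i - 1
        while j >= 0 and jt(label[j]) == "T":
            j -= 1
        if j < 0 or jt(label[j]) not in ("L", "D"):
            return False
        # Walk right over transparent codepoints, then require R or D.
        k = i + 1
        while k < len(label) and jt(label[k]) == "T":
            k += 1
        if k >= len(label) or jt(label[k]) not in ("R", "D"):
            return False
    return True
```

With the sample table, zwnj_context_ok("\u0628\u200c\u0627") is accepted,
while a label that starts with U+200C is rejected. Because the loop
visits each U+200C separately, strings with more than one U+200C are
handled as well.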
/Simon