Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Patrik Fältström patrik at frobbit.se
Sat Jul 25 16:00:44 CEST 2009


On 25 jul 2009, at 15.43, Wil Tan wrote:

>>  True;
>>  if .not. Script(BeforeChar(cp)) .in. (Han|Hiragana|Katakana) then False;
>
> Not just the character before, but there must be at least one
> Han|Hiragana|Katakana character in one of the preceding characters
> before the katakana middle dot. We might need additional constructs in
> the pseudocode grammar for this. In pseudo-functional-style-python:
>
> # PosOfChar() returns the index of the candidate character within the label
> # CPat() returns the code point at the given index
> if not any([Script(CPat(pos)) in (Han, Hiragana, Katakana) for pos in
> range(0, PosOfChar())]) then False;

Are you saying that, for example, the following would be OK:

ABCB

where Script(A) is Katakana, Script(B) is Latin, and C is the middle dot?
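
To make the ABCB case concrete, here is a minimal Python sketch comparing the two readings of the context rule: "the character immediately before the middle dot" versus "any preceding character". The `script()` helper is a hypothetical stand-in that classifies by a few coarse code point ranges; it is not the real Unicode Script property, and the function names are invented for illustration only.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT
CJK_SCRIPTS = {"Han", "Hiragana", "Katakana"}


def script(ch):
    """Coarse stand-in for Script(cp); an assumption, not the real property."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x3096:
        return "Hiragana"
    if 0x30A1 <= cp <= 0x30FA:
        return "Katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    return "Other"


def immediately_before_rule(label, pos):
    """True if the character right before position pos is Han/Hiragana/Katakana."""
    return pos > 0 and script(label[pos - 1]) in CJK_SCRIPTS


def any_preceding_rule(label, pos):
    """True if at least one character before position pos is Han/Hiragana/Katakana."""
    return any(script(ch) in CJK_SCRIPTS for ch in label[:pos])


# ABCB with A = KATAKANA LETTER A, B = Latin "b", C = the middle dot.
label = "\u30a2b\u30fbb"
pos = label.index(MIDDLE_DOT)

print(immediately_before_rule(label, pos))  # False: "b" precedes the dot
print(any_preceding_rule(label, pos))       # True: the first character is Katakana
```

Under these assumptions, the "any preceding character" reading accepts ABCB while the "immediately before" reading rejects it.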

>>  For each cp:
>>    if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
>>        cp in {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007})
>>    then False;
>>
>
> We'll need to include the candidate character itself, yeah?
>
> For each cp:
> if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
>        cp in {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007,U+30FB})
> then False;

No, because we already know that the code point is in the label, and if
you include it, it will always evaluate to True.
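
As a rough illustration of the per-code-point check, the Python sketch below compares two formulations: looping over every code point except the candidate middle dot, and looping over all code points with U+30FB added to the allowed set. Reading the loop as skipping the candidate is an assumption on my part, not something the pseudocode spells out. The `script()` helper is again a hypothetical range-based stand-in for the Script property, and the function names are invented.

```python
MIDDLE_DOT = 0x30FB  # KATAKANA MIDDLE DOT
CJK_SCRIPTS = {"Han", "Hiragana", "Katakana"}

# {U+002D, U+0030..U+0039, U+0061..U+007A, U+3005..U+3007} from the rule
ALLOWED_EXTRA = (
    {0x002D}
    | set(range(0x0030, 0x003A))
    | set(range(0x0061, 0x007B))
    | set(range(0x3005, 0x3008))
)


def script(cp):
    """Coarse stand-in for Script(cp); an assumption, not the real property."""
    if 0x3041 <= cp <= 0x3096:
        return "Hiragana"
    if 0x30A1 <= cp <= 0x30FA:
        return "Katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    return "Other"


def cp_ok(cp, extra):
    """One iteration of the 'For each cp' test."""
    return script(cp) in CJK_SCRIPTS or cp in extra


def check_skipping_candidate(label):
    """Test every code point except the middle dot itself."""
    return all(cp_ok(ord(ch), ALLOWED_EXTRA) for ch in label if ord(ch) != MIDDLE_DOT)


def check_with_dot_in_set(label):
    """Test every code point, with U+30FB added to the allowed set."""
    return all(cp_ok(ord(ch), ALLOWED_EXTRA | {MIDDLE_DOT}) for ch in label)


for label in ("\u30a2b\u30fbb", "\u30a2\u30fbB"):
    print(check_skipping_candidate(label), check_with_dot_in_set(label))
# prints "True True" for the lowercase label, "False False" for the uppercase one
```

With this stand-in, the two formulations agree on every label, so the difference is only in how the rule is written down.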

    paf
