Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)
Patrik Fältström
patrik at frobbit.se
Sat Jul 25 16:00:44 CEST 2009
On 25 jul 2009, at 15.43, Wil Tan wrote:
>> True;
>> if .not. Script(BeforeChar(cp)) .in. (Han|Hiragana|Katakana) then
>> False;
>
> Not just the character before: there must be at least one
> Han|Hiragana|Katakana character among the characters preceding
> the katakana middle dot. We might need additional constructs in
> the pseudocode grammar for this. In pseudo-functional-style Python:
>
> # PosOfChar() returns the index of the candidate character within the label
> # CPat() returns the code point at the given index
> if not any([Script(CPat(pos)) in (Han, Hiragana, Katakana) for pos in range(0, PosOfChar())]) then False;
So what you are saying is that, for example, the following is OK:
ABCB
where Script(A) is Katakana, Script(B) is Latin, and C is the middle dot?
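To make the ABCB question concrete, here is a minimal Python sketch that compares the two readings of the rule. It is only an illustration: script() is a hypothetical stand-in for the Unicode Script property that covers a few main ranges, and the function names are not part of the tables draft.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumption: a coarse approximation of the Unicode Script property,
# covering only the main Hiragana, Katakana and Han ranges.
_RANGES = {
    "Hiragana": [(0x3041, 0x3096)],
    "Katakana": [(0x30A1, 0x30FA)],
    "Han": [(0x3400, 0x4DBF), (0x4E00, 0x9FFF)],
}


def script(ch):
    """Return an approximate script name for a single character."""
    cp = ord(ch)
    for name, ranges in _RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return "Other"


def is_han_hiragana_katakana(ch):
    return script(ch) in ("Han", "Hiragana", "Katakana")


def ok_immediately_before(label, pos):
    """Original rule: the character right before the candidate must match."""
    return pos > 0 and is_han_hiragana_katakana(label[pos - 1])


def ok_any_preceding(label, pos):
    """Proposed rule: any character before the candidate may match."""
    return any(is_han_hiragana_katakana(ch) for ch in label[:pos])


# ABCB: A = Katakana letter A, B = Latin "b", C = the middle dot.
label = "\u30a2" + "b" + MIDDLE_DOT + "b"
pos = label.index(MIDDLE_DOT)

print(ok_immediately_before(label, pos))
print(ok_any_preceding(label, pos))
```

Under these assumptions the "any preceding character" reading accepts ABCB, while the "character immediately before" reading rejects it.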
>> For each cp:
>> if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
>> cp in {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007})
>> then
>> False;
>>
>
> We'll need to include the candidate character itself, yeah?
>
> For each cp:
> if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
> cp in {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007,U+30FB})
> then False;
No, because we already know that the code point is in the label, and if
you include it, it will always evaluate to True.
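As a rough illustration of the whole-label test, the sketch below checks every code point except the candidate itself, on the reading that its presence in the label is already known. The range tests are an assumed approximation of Script(), and the names label_ok and in_extra_set are hypothetical.

```python
MIDDLE_DOT = 0x30FB  # KATAKANA MIDDLE DOT, the candidate code point


def is_han_hiragana_katakana(cp):
    # Assumption: coarse approximation of Script(cp) in (Han|Hiragana|Katakana).
    return (
        0x3041 <= cp <= 0x3096      # Hiragana letters
        or 0x30A1 <= cp <= 0x30FA   # Katakana letters
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
    )


def in_extra_set(cp):
    # {U+002D, U+0030..U+0039, U+0061..U+007A, U+3005..U+3007} from the rule.
    return (
        cp == 0x002D
        or 0x0030 <= cp <= 0x0039
        or 0x0061 <= cp <= 0x007A
        or 0x3005 <= cp <= 0x3007
    )


def label_ok(label):
    """Return False if any code point other than the candidate is disallowed."""
    for ch in label:
        cp = ord(ch)
        if cp == MIDDLE_DOT:
            continue  # the candidate is known to be present; it is not tested
        if not (is_han_hiragana_katakana(cp) or in_extra_set(cp)):
            return False
    return True


print(label_ok("\u30a2\u30fb\u30a2"))  # Katakana, middle dot, Katakana
print(label_ok("\u30a2\u30fbB"))       # uppercase Latin is outside the set
```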
paf