Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Sat Jul 25 15:43:22 CEST 2009

On Sat, Jul 25, 2009 at 10:54 PM, Patrik Fältström<patrik at frobbit.se> wrote:
> On 25 jul 2009, at 14.34, Wil Tan wrote:
>
>> I accidentally left out the U+3005..U+3007 that Yoneya-san proposed.
>> Therefore, #3 should be:
>>
>>  3. That the label contains only
>> (Han|Hiragana|Katakana|LDH|U+3005..U+3007) + katakana middle dot.
>>
>> It's important to note that having these constraints would rule out:
>
> What you say is that you want the following rules:
>

With the caveat that this will invalidate existing registrations and
prohibit some classes of Japanese labels that people may expect to be
able to use.

>  True;
>  if .not. Script(BeforeChar(cp)) .in.  (Han|Hiragana|Katakana) then False;

Not just the character immediately before: at least one
Han|Hiragana|Katakana character must appear somewhere before the
katakana middle dot. We might need additional constructs in the
pseudocode grammar for this. In pseudo-functional-style Python:

# PosOfChar() returns the index of the candidate character within the label
# CPat(pos) returns the code point at the given index
if not any(Script(CPat(pos)) in (Han, Hiragana, Katakana)
           for pos in range(PosOfChar())):
    return False
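
To make the "any preceding character" check concrete, here is a minimal
runnable sketch. The script() helper and its code point ranges are my own
rough stand-in for the real Unicode Script property, and the function
names are hypothetical, not part of the proposed rule grammar.

```python
# Rough stand-in for the Unicode Script property (an assumption for
# illustration only; the real values come from the Unicode Character
# Database, and these ranges cover just the main letter blocks).
SCRIPT_RANGES = {
    "Han": [(0x3400, 0x4DBF), (0x4E00, 0x9FFF)],
    "Hiragana": [(0x3041, 0x3096)],
    "Katakana": [(0x30A1, 0x30FA)],
}


def script(cp):
    """Return the approximate script name for code point cp, or "Other"."""
    for name, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return "Other"


def middle_dot_context_ok(label, pos):
    """True if any code point before index pos is Han, Hiragana or Katakana.

    pos plays the role of PosOfChar(): the index of the katakana middle
    dot (U+30FB) being checked within the label.
    """
    return any(script(ord(ch)) != "Other" for ch in label[:pos])
```

With this sketch, a label that starts with the middle dot fails, a label
with only an ASCII letter before the dot fails, and a label with a Han
character anywhere before the dot passes even if an ASCII letter sits
in between.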

>  For each cp:
>    if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
>        cp in {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007}) then
> False;
>

We'll need to include the candidate character itself, yeah?

 For each cp:
   if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
       cp in {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007,U+30FB})
   then False;
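
For anyone who wants to try the whole-label rule on real strings, a
minimal Python sketch follows. The Han/Hiragana/Katakana ranges are a
rough approximation of the Script property that I am assuming for
illustration, and cp_allowed() and label_repertoire_ok() are
hypothetical names.

```python
def cp_allowed(cp):
    """True if cp is in the repertoire listed in the rule above."""
    # Approximate Script(cp) .in. (Han|Hiragana|Katakana); these ranges
    # are an assumption and cover only the main letter blocks.
    in_script = (
        0x3400 <= cp <= 0x4DBF or 0x4E00 <= cp <= 0x9FFF  # Han
        or 0x3041 <= cp <= 0x3096                         # Hiragana
        or 0x30A1 <= cp <= 0x30FA                         # Katakana
    )
    # The explicit set: hyphen, digits, a-z, U+3005..U+3007, and the
    # katakana middle dot itself.
    in_extra = (
        cp == 0x002D
        or 0x0030 <= cp <= 0x0039
        or 0x0061 <= cp <= 0x007A
        or 0x3005 <= cp <= 0x3007
        or cp == 0x30FB
    )
    return in_script or in_extra


def label_repertoire_ok(label):
    """True if every code point in label passes cp_allowed()."""
    return all(cp_allowed(ord(ch)) for ch in label)
```

For example, label_repertoire_ok("abc-123") and
label_repertoire_ok("\u30a2\u30fb\u30a4") both hold, while a label
containing an uppercase "A" or a Greek letter is rejected. This only
covers the repertoire part; the context requirement for the middle dot
is a separate check.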

=wil