Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Wil Tan wil at cloudregistry.net
Sat Jul 25 16:30:59 CEST 2009


On Sun, Jul 26, 2009 at 12:00 AM, Patrik Fältström<patrik at frobbit.se> wrote:
> On 25 jul 2009, at 15.43, Wil Tan wrote:
>
>>>  True;
>>>  if .not. Script(BeforeChar(cp)) .in.  (Han|Hiragana|Katakana) then
>>> False;
>>
>> Not just the character before, but there must be at least one
>> Han|Hiragana|Katakana character in one of the preceding characters
>> before the katakana middle dot. We might need additional constructs in
>> the pseudocode grammar for this. In pseudo-functional-style-python:
>>
>> # PosOfChar() returns the index of the candidate character within
>> # the label
>> # CPat() returns the code point at the given index
>> if not any([Script(CPat(pos)) in (Han, Hiragana, Katakana)
>>             for pos in range(0, PosOfChar())]) then False;
>
> What you say is that it is for example ok to have the following:
>
> ABCB
>
> Where Script(A) is Katakana, Script(B) is Latin and C is the middle dot?
>

Yes.

e.g. カラOK・スター.com is totally valid.

so is カラオケ・スター.com

OTOH, the goal of this rule (not saying whether we should do this) is
to prevent things like

www・ibm.com
www・ヤフー.com
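
To make the rule concrete, here is a minimal sketch in real Python of
"at least one Han|Hiragana|Katakana character somewhere before the
middle dot". It is only an illustration, not proposed spec text. The
names is_han_or_kana() and middle_dot_allowed() are made up for this
example, and the code point ranges are a rough stand-in for the Unicode
Script property (which the Python standard library does not expose), so
they miss things like the Han extension blocks. The check is applied to
a single label, i.e. without the ".com".

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumption: approximate Script(cp) in (Han, Hiragana, Katakana) with a
# few code point ranges. This is NOT the full Script property.
_HAN_KANA_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters (ends just before U+30FB)
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def is_han_or_kana(ch):
    """Hypothetical helper: rough test for a Han/Hiragana/Katakana character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _HAN_KANA_RANGES)


def middle_dot_allowed(label):
    """Return False if any middle dot has no Han/Hiragana/Katakana
    character anywhere before it in the label, True otherwise."""
    for pos, ch in enumerate(label):
        if ch == MIDDLE_DOT and not any(is_han_or_kana(c) for c in label[:pos]):
            return False
    return True


for label in ("カラOK\u30fbスター", "カラオケ\u30fbスター", "www\u30fbibm", "www\u30fbヤフー"):
    print(label, middle_dot_allowed(label))
```

With this sketch the first two labels pass and the last two are
rejected, because nothing before the middle dot in "www" is Han,
Hiragana or Katakana.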

But in my other message, I'm really leaning towards having simpler
pseudocode, i.e. test for (Han|Hiragana|Katakana|LDH|U+3005..U+3007),
or even no pseudocode, i.e. just "True", and leaving the descriptive
text as a reminder to registries and applications to be careful. The
latter would still be better than making it PVALID as an exception,
because an exception would leave no room for explanatory text.

Hope I'm not causing more confusion and wasting time (sorry if I am!)

=wil


More information about the Idna-update mailing list