Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Patrik Fältström patrik at frobbit.se
Sat Jul 25 14:54:21 CEST 2009


On 25 jul 2009, at 14.34, Wil Tan wrote:

> I accidentally left out the U+3005..U+3007 that Yoneya-san proposed.
> Therefore, #3 should be:
>
>  3. That the label contains only
> (Han|Hiragana|Katakana|LDH|U+3005..U+3007) + katakana middle dot.
>
> It's important to note that having these constraints would rule out:

So what you are saying is that you want the following rule:

   if .not. (Script(BeforeChar(cp)) .in. (Han|Hiragana|Katakana)) then False;
   For each cp in label:
     if .not. (Script(cp) .in. (Han|Hiragana|Katakana) .or.
         cp .in. {U+002D,U+0030..U+0039,U+0061..U+007A,U+3005..U+3007,U+30FB})
       then False;
   True;

    Patrik

> a. planting the katakana middle dot in an all-ASCII label (a
> legitimate use-case, but one that Yoneya-san was willing to live
> without)
>
> b. katakana middle dot used to concatenate two strings, the first of
> which is all-ASCII (arguably more common than case a, so disallowing
> this may well be unnecessarily restrictive, but the restriction is
> meant to mitigate phishing concerns)
>
> c. katakana middle dot at the beginning of a label (I don't know how
> common this is.)
>
> I don't pretend to know this well, so it'd be great if Yoneya-san and
> others who are familiar with this could weigh in.
>
>
>>> However, it makes the rule considerably more complex, so I was
>>> leaning toward leaving this to the application, which may have
>>> more contextual information (such as the user's locale, the TLD,
>>> etc.) to take appropriate steps to protect the user.
>>
>> It is a question, again, of how to draw the line.  In some
>> sense, the way that IDNA works makes "in the IDNA protocol" and
>> "in the application" different versions of the same thing -- all
>> of IDNA occurs "in the application" although API design, etc.,
>> may change perceptions of that.  The argument about how
>> normative mapping should be has its mirror image here.
>> Personally, I can live with "general prohibition on
>> registration; recommend that lookup-side applications be
>> extra-careful with this stuff".  That is more or less what
>> CONTEXTO is about and would be consistent with what I think you
>> are suggesting above (the text in Protocol may need tuning;
>> suggested text welcome).  Somewhat more would also work for
>> me if we can fairly clearly justify it.
>>
>> If applications start drawing lines differently on what should
>> have been registered, we get the most inconsistent behavior
>> possible.  That is why Protocol contains language requiring that
>> anything that is Lookup-Valid must be looked up, even if one
>> decides to warn the user first.
>>
>
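The lookup-side requirement John describes can be illustrated with a small Python sketch. A Lookup-Valid label is always looked up, and the application may only attach a warning. All names here (`LookupDecision`, `decide_lookup`, the `lookup_valid` flag) are hypothetical. The sketch assumes that Lookup-Valid status has already been determined elsewhere by the IDNA lookup checks.

```python
# Hypothetical lookup-side handling: warn the user if desired, but never
# refuse to look up a label that is Lookup-Valid.
from dataclasses import dataclass, field
from typing import List

MIDDLE_DOT = "\u30fb"


@dataclass
class LookupDecision:
    perform_lookup: bool
    warnings: List[str] = field(default_factory=list)


def decide_lookup(label: str, lookup_valid: bool) -> LookupDecision:
    """Decide whether to look up `label`; lookup_valid is computed elsewhere."""
    if not lookup_valid:
        return LookupDecision(perform_lookup=False)
    warnings = []
    if MIDDLE_DOT in label:
        # Extra care: flag the character, e.g. for display to the user.
        warnings.append("label contains U+30FB KATAKANA MIDDLE DOT")
    # The warning never blocks the lookup of a Lookup-Valid label.
    return LookupDecision(perform_lookup=True, warnings=warnings)
```

The design point is that the warning is advisory data returned alongside the decision, so applications cannot silently drift into refusing lookups for labels they merely dislike.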
> Thanks, this is a useful answer for me to the meta question of "where
> to draw the line". Still, it is a difficult balancing act between
> rules that are simple enough to implement and rules that are tight
> enough to prevent confusion while still allowing legitimate usage.
>
> To help me make sense of the recent discussions of contextual rules,
> I'm trying to picture the "guidelines" for deciding how to craft the
> rules. Is the following reasonable?
>
> 1. Overview: describe the expected contexts in which the subject
> character is allowed to appear.
> 2. Rule set: if the contexts are simple and narrow enough to capture,
> they can be expressed in the pseudocode. For more complex ones, a
> simple check may suffice, leaving the registry to place additional
> constraints on the subject character; the rule set need not capture
> everything in the overview.
>
> This seems to be consistent with recommendations given elsewhere by
> Mark and others.
>
> Applying those "guidelines" to the katakana middle dot, the overview  
> would be:
>
>  This character is used in Japanese orthography to concatenate strings
>  containing characters in the Hiragana, Katakana and Han scripts,
>  or in any of the following sets: [a-z0-9\-], U+3005..U+3007.
>
> and the rule set would probably just be "True": the rule won't be
> simple, and certainly won't be narrow, so trying to capture it in
> pseudocode would be pointless. However, having the rule is better
> than having none at all, because it flags to the registry that the
> character needs a policy around it, and tells applications to be
> extra careful with it.
>
>> Be careful, however, about basing any actions on assumptions about
>> TLDs.  The presence of DNAME and the lack of any "give me back
>> the canonical/primary tree" function in the DNS make that one
>> very fragile even if there were no other issues.
>>
>
> Point taken.
>
> Thanks.
>
> =wil
>



More information about the Idna-update mailing list