Mixing scripts (Re: Unicode versions (Re: Criteria
for exceptional characters))
Soobok Lee
lsb at lsb.org
Sun Dec 24 16:43:02 CET 2006
On Sun, Dec 24, 2006 at 10:29:36PM +0900, Martin Duerst wrote:
> At 21:39 06/12/21, Soobok Lee wrote:
>
> >Local character sets might provide some clues.
>
> Clues, yes, but not much more.
Yes.
>
> >They often contain
> >multiple scripts in single local charset in order to serve the need
> >of everyday language life of local language communities.
> >
> >I think this statement is very reasonable:
> > "If it is possible to "localize" an IDN label in any single local charset,
> > the label should be allowed, however many scripts it spans across.
> > Even for some of those, UA can display in punycode form."
I was speaking in the context of "local users' demand", in reply to
Mr Markham, who wrote "If we are certain we can foresee all possible needs
for script mixing now". My point is that there may be legitimate
demand for mixed-script labels from end users if their local charset
allows such a mixture.
"Should be allowed" does not mean "is safe". Such a label may not be safe,
which is why I added the last condition above: "can display in punycode form".
>
> See more below for why this is a bad idea.
>
> >Labels of Simplified Han Ideo + Hangul Syllables cannot be
> >typed in or displayed in either of KSC5601(Korea) and GB2312(China).
> >So they should be disallowed somewhere between IDNA,UA and registries.
>
> The quoted statement above only says what should be allowed,
> so I don't see how it follows that combinations of simplified
> Han and Hangul should be disallowed. And there are quite a few
> simplified Han that are indistinguishable from traditional Han
> (and use just one codepoint),
You need not go so far.
Even the Hangul Jamo vowel EU looks like CJK TC One.
Both look like a long hyphen, and both are in KSC5601.
Even in this case, they should be allowed in IDNA in principle, but
had better be displayed in punycode form, since the native form is not safe.
But most CJK SC characters cannot be represented in KSC5601, and Hangul
cannot be represented in GB2312. So there may be NO demand for
such a mixture currently.
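
A minimal sketch of the "can this label be localized in a single local
charset" test, for illustration only. It assumes Python's built-in codecs
are acceptable stand-ins for the charsets named in this thread (euc_kr for
KSC5601, gb2312 for GB2312); the table and the function name are hypothetical.

```python
# Assumption: Python's built-in codecs stand in for the local charsets
# discussed here. The mapping and function name are illustrative only.
LOCAL_CHARSETS = {
    "KSC5601": "euc_kr",
    "GB2312": "gb2312",
    "ISO-8859-7": "iso8859_7",
    "ISO-8859-5": "iso8859_5",
}


def charsets_covering(label):
    """Return the local charsets that can encode every character of label."""
    covering = []
    for name, codec in LOCAL_CHARSETS.items():
        try:
            label.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        covering.append(name)
    return covering


# Hangul syllable U+D55C followed by the simplified-only ideograph U+8BED
mixed = "\ud55c\u8bed"
print(charsets_covering(mixed))
```

An empty result means that no single charset in the table can carry the
label, which is the case I describe above for Simplified Han + Hangul.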
>
>
> >Greek local charset(iso-8859-7) does not contain any cyrillic char,
> >Cyrillic local charset(iso-8859-5) does not contain any greek char.
>
> Please do your homework and have another close look at your local
> charset, KSC 5601.
>
> Similar to JIS X 0208 and GB 2312, it contains not only (full-width
> copies of ASCII) Latin, but also Greek and Cyrillic. Greek is
> handy in math and physics, and Russia is close to all three
> countries, and a few small alphabets didn't really take up
> too much space besides the large number of Hanzi/Kanji/Hanja/Hangul.
As you pointed out, KSC5601 includes Greek and Cyrillic characters,
but as ***special character*** sections (not as main scripts).
So it is true that, under my previous suggestion, a Greek+Cyrillic
mixture would count as having user demand because of KSC5601.
I think special characters in a local charset should be excluded in
this context.
So I have modified my suggestion:
"If it is possible to localize an IDN label in any
single local charset that contains the label's scripts as main scripts, not
as special characters,
the label should be regarded as having potential user demand and
should be allowed,
however many scripts it spans.
Even for some of those, the UA can display the punycode form if the
native form of display is not safe."
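
The modified rule could be sketched roughly as follows. Everything here is an
assumption made for illustration: the MAIN_SCRIPTS table is my own guess at
what "main scripts" would mean for each charset, rough_script() is a crude
stand-in that takes the first word of the Unicode character name (Python's
standard library has no script property), and the codec names are Python's.

```python
import unicodedata

# Assumed for illustration only: which scripts count as "main" per charset.
MAIN_SCRIPTS = {
    "euc_kr": {"HANGUL", "CJK", "LATIN"},   # stand-in for KSC5601
    "gb2312": {"CJK", "LATIN"},
    "iso8859_7": {"GREEK", "LATIN"},
    "iso8859_5": {"CYRILLIC", "LATIN"},
}


def rough_script(ch):
    """Crude script guess: the first word of the Unicode character name."""
    if ch.isdigit() or ch == "-":
        return None  # digits and hyphen are treated as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def has_potential_demand(label):
    """True if one charset encodes the label using only its main scripts."""
    scripts = {rough_script(ch) for ch in label} - {None}
    for codec, main in MAIN_SCRIPTS.items():
        if not scripts <= main:
            continue  # some script is absent or only a special-character section
        try:
            label.encode(codec)
        except UnicodeEncodeError:
            continue  # right scripts, but a character is missing from the charset
        return True
    return False


print(has_potential_demand("\u03b1\u0431"))  # Greek alpha + Cyrillic be
print(has_potential_demand("\ud55c\u570b"))  # Hangul + Hanja
```

With this table, a Greek+Cyrillic label no longer qualifies merely because
KSC5601 happens to contain both alphabets as special characters.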
Verisign adopted similar local-charset-based registration filtering
for IDN.com around 1999/2000/2001 (confirmed), but I don't know when
Verisign lifted that restriction. Now they accept UTF-8 strings directly
as input for IDN.com.
>
> So the idea of saying "if these appear in the same local charset,
> they must be safe" is a very dangerous one. None of these charsets
> have been tested with something as exposed to serious criminals
> as domain names, and none of these charsets has been designed
> with spoofing issues anywhere in mind.
You may have read my statement in a security context. It is clear that
local-charset-based filtering is NOT sufficient for anti-spoofing
purposes, so I agree with you on this issue.
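
To illustrate why charset-based filtering alone does not stop spoofing, here
is a small sketch. It assumes Python's iso8859_5 codec as the local charset
and its built-in "idna" codec (IDNA 2003) for the punycode form. The helper
names are hypothetical, and "show punycode whenever more than one script is
present" is a deliberately crude display policy, not a proposal.

```python
import unicodedata


def fits_charset(label, codec):
    """True if every character of label can be encoded in the given codec."""
    try:
        label.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def scripts_in(label):
    """Crude script guess per letter: first word of the Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}


def display_form(label):
    """Native form for single-script labels, punycode (ACE) form otherwise."""
    if len(scripts_in(label)) <= 1:
        return label
    return label.encode("idna").decode("ascii")


spoof = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A
print(fits_charset(spoof, "iso8859_5"))  # the charset filter accepts it
print(sorted(scripts_in(spoof)))
print(display_form(spoof))
```

The label passes the single-charset test because ISO-8859-5 contains both
ASCII Latin and Cyrillic, so only the display step reveals the mixture.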
You may remember that in the old IDN WG, around 2001/2002, I repeatedly
requested that labels like "p(cyrillic a)ypal" be prohibited.
Best regards,
Soobok
>
> Regards, Martin.
>
>
>
> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp