Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Soobok Lee lsb at lsb.org
Tue Dec 26 11:30:51 CET 2006


On Tue, Dec 26, 2006 at 12:29:24PM +0900, Martin Duerst wrote:
> At 00:43 06/12/25, Soobok Lee wrote:
> 
> >You need not go so far.
> >Even Hangul Jamo vowel EU  look alike  CJK TC One.
> >Both look like long hyphen, and both are in KSC5601.
> >Even in this case, they should be allowed  in IDNA in principle, but 
> >had better be displayed in punycode form, since it is not safe.
> 
> My personal preference would be to disallow single jamo
> (or maybe even all jamo; they are not needed unless you
> want domain names for single jamos, or if you want domain
> names with historic hangul, both of which doesn't make
> much sense for me).

Some labels made of jamo consonant sequences may be in demand among
users. For example, (KI-EOK)(NI-EUN)(DI-GEUD).com may have the same
appeal as (abc).com in the ASCII world.

But as for jamo vowels, I agree with you: they won't be very useful.
BTW, ML.com has already accepted some vowel-jamo-only IDN.com names.
(XN--QSD.COM == (vowel A).com is already registered.
 http://xn--qsd.com/ seems to be used only for forwarding.)
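
For reference, the label above can be decoded with Python's standard
punycode codec. This is only a minimal sketch: decode_label is a
hypothetical helper name, and it skips the nameprep step that a full
IDNA ToUnicode operation would perform.

```python
import unicodedata

ACE_PREFIX = "xn--"  # IDNA prefix marking a punycode-encoded label


def decode_label(label: str) -> str:
    """Decode one ACE label to Unicode; other labels are returned unchanged.

    Hypothetical helper for illustration; no nameprep is applied.
    """
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return label
    return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# Print the code point and Unicode name of each decoded character.
for ch in decode_label("XN--QSD"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run on XN--QSD, this should report the single code point U+1161
(HANGUL JUNGSEONG A), i.e. the vowel A mentioned above.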


> 
> 
> >As you pointed out,  KSC5601 includes Greek/Cyrillic characters 
> >but, as ***special character*** sections (not as main scripts)  and
> >so, it is true that greek+cyrillic mixture have users demand 
> >due to KSC5601 under my previous suggestion.
> >I think special characters in local charset should be excluded in
> >this context.
> 
> How is anybody going to decide which characters in which local
> charsets are 'special'? I haven't read KSC5601, but I'm assuming
> it doesn't mark characters as 'special' or 'normal'.

http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/OLD5601.TXT

(ROMAN NUMERAL TEN, 0x2539)
(upper ALPHA, 0x2541)
....
(lower OMEGA, 0x2578)
(BOX DRAWINGS LIGHT HORIZONTAL, 0x2621)

Greek characters are buried in the special-character sections, as shown above.

(SUBSCRIPT FOUR, 0x297E)
(HIRAGANA LETTER SMALL A, 0x2a21)
(KATAKANA LETTER SMALL A, 0x2b21)
(CYRILLIC UPPER  A, 0x2c21)
(CYRILLIC LOWER YA, 0x2c71)
(HANGUL SYLLABLE KA, 0x3021)

Hiragana, Katakana, and Cyrillic come after "subscript 4".
We can see a big gap between Cyrillic ya and hangul ka.
I will inspect the government-issued KSC5601 document later (I need
to go to the government library to review the original standard).
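
The layout described above can be made concrete by splitting each
two-byte KS C 5601 code into its row (high byte) and cell (low byte).
The sketch below uses only the code values quoted in this message. The
function names, and the choice to treat every row below 0x30 as
"before hangul", are illustrative assumptions, not something taken
from the standard itself.

```python
# Code values quoted in the message above: (name, KS C 5601 code).
SAMPLES = [
    ("ROMAN NUMERAL TEN", 0x2539),
    ("upper ALPHA", 0x2541),
    ("lower OMEGA", 0x2578),
    ("BOX DRAWINGS LIGHT HORIZONTAL", 0x2621),
    ("SUBSCRIPT FOUR", 0x297E),
    ("HIRAGANA LETTER SMALL A", 0x2A21),
    ("KATAKANA LETTER SMALL A", 0x2B21),
    ("CYRILLIC UPPER A", 0x2C21),
    ("CYRILLIC LOWER YA", 0x2C71),
    ("HANGUL SYLLABLE KA", 0x3021),
]

# Assumption: the row of HANGUL SYLLABLE KA marks the start of hangul.
HANGUL_FIRST_ROW = 0x30


def row_and_cell(code):
    """Split a two-byte code into (row, cell) = (high byte, low byte)."""
    return code >> 8, code & 0xFF


def rows_before_hangul(samples):
    """Map each row below the first hangul row to the names found in it."""
    rows = {}
    for name, code in samples:
        row, _cell = row_and_cell(code)
        if row < HANGUL_FIRST_ROW:
            rows.setdefault(row, []).append(name)
    return rows


for row, names in sorted(rows_before_hangul(SAMPLES).items()):
    print(f"row 0x{row:02X}: {', '.join(names)}")
```

With these samples, the Greek letters land in the same row (0x25) as
the Roman numeral, and the last pre-hangul row seen is 0x2C (Cyrillic).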

But I admit that the distinction between special and normal (main)
may be _floating_.

If Russia and Korea become closer in the future (????) and we often
see Cyrillic letters on shop signboards, then demand may arise for
hangul+Cyrillic mixtures and for a hangul+Cyrillic IME that makes it
easy to input Cyrillic characters alongside hangul.

Currently, hangul and CJK have their own input methods, but the other
scripts above share the special-character input method that is also
used for circled a, etc. (in Windows).

As my own proposal, the criterion for classifying characters as special
or normal may be "the ease of input in commonly available IME for the
local charset".

So my comments on "demands" for script mixture apply *only* to the
present and cannot be extended into the future.
We can't rule out that Korea will have a new local charset in the
future that includes other scripts, along with easy IMEs for those
new scripts.
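
To make "script mixture" concrete, here is a rough sketch of how a
label could be checked for mixed scripts. Python's standard library
does not expose the Unicode Script property, so the first word of each
character's Unicode name is used as a stand-in. That is an assumption
and only an approximation, and label_scripts and is_mixed are
hypothetical names.

```python
import unicodedata


def label_scripts(label):
    """Return rough script tags for the letters in a label.

    Assumption: the first word of the Unicode character name (HANGUL,
    CYRILLIC, GREEK, CJK, LATIN, ...) approximates the script.
    """
    scripts = set()
    for ch in label:
        # Only letters are considered; digits and hyphens are skipped.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed(label):
    """True if the label's letters carry more than one script tag."""
    return len(label_scripts(label)) > 1


print(label_scripts("\uAC00\u0430"))  # hangul syllable + Cyrillic small a
print(is_mixed("\u3161\u4E00"))       # jamo EU next to CJK "one"
```

A check like this says nothing about which mixtures are in demand. It
only flags labels that a policy, such as displaying them in punycode
form, might want to look at.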

Best Regards,

Soobok


More information about the Idna-update mailing list