Proposal to add 'Kore' as Suppress-Script for 'ko'

Addison Phillips addison at yahoo-inc.com
Wed Jul 11 23:34:09 CEST 2007


CE Whitehead wrote:
>  > Add to this the fact that there is already lots of stuff tagged with just "ko",
>  > this seems to be precisely the kind of case "Suppress-Script" was intended for.
> 
> 
> That is a case for suppress-script for ko
> although in fact it is not our job to redo subtags simply because people do not 
> quite understand how to use these.

It is helpful to read and understand RFC 4646, which allows this sort of 
registration. It should not be viewed as "redoing" a subtag. It is 
adding informative fields to the registry. In this case, the change is 
prompted by the addition of a script code.

> 
> Do persons using the hanja characters or the Latin script forms almost always tag 
> their content properly??  If so then I support the suppress-script of kore for 
> ko, but otherwise I do not.

No, of course they don't. Mistakes in tagging will be made.
> 
> 
> Also, how do people tag the language when kore encompasses a smattering of 
> hanja  (as ko-kore)? 
> Or are these characters tagged separately?  That is, what is the proper way to 
> tag the kore variant with a smattering of hanja characters??

"ko"

The real question you should be asking is: when does a script subtag add 
distinguishing information? The 'kore' and 'hang' subtags add no 
distinguishing information to a typical Korean document. The subtags 
'cyrl', 'brai', or 'latn' (when properly applied) *certainly* do. The 
measure of whether to use a subtag (script or otherwise) is *always* 
whether it adds distinguishing information. Suppress-Script is merely an 
informative field to assist people unfamiliar with a language or 
language tagging to ascertain this fact.

One might very well need to distinguish within a document (perhaps in 
parallel corpora demonstrating the phenomenon) between "ko-Kore" and 
"ko-Hang" (for example).
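
To make the rule concrete, here is a minimal sketch of how a tool might apply 
Suppress-Script when simplifying a tag. The function name simplify_tag and the 
small SUPPRESS_SCRIPT table are assumptions made for illustration; the 'ko' 
entry reflects the proposal under discussion, and a real tool would read the 
values from the registry rather than hard-code them. It only handles the simple 
case where the script subtag directly follows the language subtag.

```python
# Illustrative excerpt only; a real tool would load these from the registry.
SUPPRESS_SCRIPT = {
    "ko": "Kore",  # assumption: the value proposed in this thread
    "en": "Latn",
    "ru": "Cyrl",
}


def simplify_tag(tag: str) -> str:
    """Drop the script subtag when it matches the language's Suppress-Script.

    Hypothetical helper: assumes the script subtag, if present, is the
    second subtag (no extended language subtags are handled).
    """
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag  # nothing to suppress in a bare language tag

    language, candidate = subtags[0].lower(), subtags[1]
    # Script subtags are exactly four letters.
    is_script = len(candidate) == 4 and candidate.isalpha()
    suppressed = SUPPRESS_SCRIPT.get(language)

    if is_script and suppressed and candidate.lower() == suppressed.lower():
        # The script adds no distinguishing information: remove it.
        return "-".join([subtags[0]] + subtags[2:])
    return tag  # the script (if any) is distinguishing, so keep it
```

Under these assumptions "ko-Kore" reduces to "ko", while "ko-Hang" and 
"ru-Latn" are left alone because the script there does distinguish.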

> 
> And what is the way kore script with a smattering of hanja characters is 
> commonly tagged (by most content authors)?

"ko"

That's why it's (proposed to be) suppressed!

The problem here is that 'Kore' is a synonym for a couple of scripts 
mixed together (mostly Hangul with a smattering of ideographs), whereas 
'Hang' (Hangul) is just the "pure" Hangul script. But as we've already 
noted, it isn't necessary for the text to be restricted to a certain 
character range in order for the script subtag to be redundant. Neither 
of these script subtags should be used for most Korean documents.

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.

