Proposal to add 'Kore' as Suppress-Script for 'ko'
addison at yahoo-inc.com
Wed Jul 11 23:34:09 CEST 2007
CE Whitehead wrote:
> > Add to this the fact that there is already lots of stuff tagged with just "ko",
> > this seems to be precisely the kind of case "Suppress-Script" was intended for.
> That is a case for suppress-script for ko
> although in fact it is not our job to redo subtags simply because people do not
> quite understand how to use these.
It is helpful to read and understand RFC 4646, which allows this sort of
registration. It should not be viewed as "redoing" a subtag. It is
adding informative fields to the registry. In this case, the change is
prompted by the addition of a script code.
> Do persons using the hanja characters or the Latin script forms almost always tag
> their content properly?? If so then I support the suppress-script of kore for
> ko, but otherwise I do not.
No, of course they won't. Mistakes in tagging will be made.
> Also, how do people tag the language when kore encompasses a smattering of
> hanja (as ko-kore)?
> Or are these characters tagged separately? That is, what is the proper way to
> tag the kore variant with a smattering of hanja characters??
The real question you should be asking is: when does a script subtag add
distinguishing information? The 'kore' and 'hang' subtags add no
distinguishing information to a typical Korean document. The subtags
'cyrl', 'brai', or 'latn' (when properly applied) *certainly* do. The
measure of whether to use a subtag (script or otherwise) is *always*
whether it adds distinguishing information. Suppress-Script is merely an
informative field to assist people unfamiliar with a language or
language tagging to ascertain this fact.
One might very well need to distinguish within a document (perhaps in
parallel corpora demonstrating the phenomenon) between "ko-Kore" and
"ko-Hang" (for example).
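
To make the "distinguishing information" test concrete, here is a minimal sketch in Python. It is illustration only, not an implementation of RFC 4646 matching or canonicalization. The function name simplify_tag and the two-entry table are hypothetical; the 'ko' entry stands for the registration proposed in this thread, and the sketch assumes the script subtag, if present, directly follows the primary language subtag.

```python
# Hypothetical excerpt of Suppress-Script values. The "ko" entry is the
# registration being proposed in this thread; it is an assumption here.
SUPPRESS_SCRIPT = {"en": "Latn", "ko": "Kore"}


def simplify_tag(tag: str) -> str:
    """Drop the script subtag when it equals the language's Suppress-Script.

    Simplifying assumption: the script subtag, if any, is the second subtag.
    """
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag  # bare language subtag, nothing to suppress

    language, candidate = subtags[0].lower(), subtags[1]
    # Script subtags are four letters; subtags compare case-insensitively.
    is_script = len(candidate) == 4 and candidate.isalpha()
    suppressed = SUPPRESS_SCRIPT.get(language, "")

    if is_script and candidate.lower() == suppressed.lower():
        # The script adds no distinguishing information: remove it.
        return "-".join([subtags[0]] + subtags[2:])
    # Any other script (e.g. Hang or Latn for Korean) is kept.
    return tag


for example in ("ko-Kore", "ko-Kore-KR", "ko-Hang", "ko-Latn", "ko"):
    print(example, "->", simplify_tag(example))
```

Under these assumptions "ko-Kore" collapses to "ko", while "ko-Hang" and "ko-Latn" are left alone, since there the script subtag does carry information.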
> And what is the way kore script with a smattering of hanja characters is
> commonly tagged (by most content authors)?
That's why it's (proposed to be) suppressed!
The problem here is that 'Kore' is a synonym for a couple of scripts
mixed together (mostly Hangul with a smattering of ideographs), whereas
'Hang' (Hangul) is just the "pure" Hangul script. But as we've already
noted, it isn't necessary for the text to be restricted to a certain
character range in order for the script subtag to be redundant. Neither
of these script subtags should be used for most Korean documents.
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG
Internationalization is an architecture.
It is not a feature.