Suppress-Script for Korean?
dewell at roadrunner.com
Wed Jul 25 08:36:57 CEST 2007
Randy Presuhn <randy underscore presuhn at mindspring dot com> wrote:
>> Are you suggesting that if a document is entirely in (say) hiragana I
>> shouldn't tag it ja-Hira because Hira is considered a subset of Japn
>> and Japn is to be suppressed?
> For a document longer than a few words to be purely "Hira" would be
> *very* artificial, and consequently I'd expect it to be marked as such,
> just as an extended document solely in Kanji or Katakana would also be
> quite artificial. On the other hand, let's say it's just a quote of a
> word or two which would normally be written in pure "Hira". In that
> case, it's not a "marked form", so I'd say that simply tagging it "ja"
> would normally be the appropriate thing to do.
Randy and I are in vigorous agreement on this issue. My position, as
stated earlier, is that "Kore" represents "the Korean writing system"
which typically contains a very large proportion of Hangul and a very
small amount of Hanja. According to this model, a text that happens to
be 100% Hangul could still be considered "Kore" if the essence of the
writing system is that Hanja are not explicitly avoided. I consider
this analogous to Addison's (and others') principle that a text can be
"Latn" even if it contains a few Greek or Cyrillic letters, as long as the
essence of the writing system is Latin.
My counterexample: a children's book that deliberately avoids all Hanja
would be a suitable example of 'Hang'.
> I think the real consideration is this: does adding the script subtag
> provide information that could not be reasonably inferred from "ja".
> "-Japn" would not provide useful information. "-Hira" marks the text
> as being quite out of the ordinary.
(Side note: many people have written 'Japn', but the actual ISO 15924
code element and script subtag is 'Jpan'.)
I agree again. Script subtags, like all subtags, should be used if they
help identify the linguistic usage, and should be avoided if they don't.
Suppress-Script exists to help identify the cases where a script subtag
would be superfluous.
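To make the Suppress-Script idea concrete, here is a minimal Python sketch of how a tag-generating tool might apply it: the script subtag is dropped exactly when it matches the language's Suppress-Script value, because in that case it adds nothing that "ja" or "ko" alone does not already imply. The table SUPPRESS_SCRIPT and the function build_tag are hypothetical illustrations, not part of any real registry tool. The 'ja' -> 'Jpan' entry follows the premise of the quoted question, and 'ko' is deliberately left out of the default table because its value ('Kore' or 'Hang') is exactly what is being debated here.

```python
from typing import Mapping, Optional

# Hypothetical excerpt of Suppress-Script values, for illustration only.
# 'ko' is intentionally absent: whether it should be 'Kore' or 'Hang' is the open question.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "ja": "Jpan",
}


def build_tag(language: str,
              script: Optional[str] = None,
              suppress: Optional[Mapping[str, str]] = None) -> str:
    """Return 'language' or 'language-Script', omitting the script subtag
    when it equals the language's Suppress-Script value (i.e. when it is
    superfluous)."""
    table = SUPPRESS_SCRIPT if suppress is None else suppress
    lang = language.lower()
    if script is None:
        return lang
    # Script subtags are conventionally written in title case, e.g. 'Hira'.
    script = script.title()
    if table.get(lang) == script:
        return lang  # script is implied by the language subtag alone
    return f"{lang}-{script}"


if __name__ == "__main__":
    print(build_tag("ja", "Jpan"))  # ja
    print(build_tag("ja", "Hira"))  # ja-Hira (a marked, out-of-the-ordinary form)
    # Under the "Kore" model argued for above, ordinary Korean text is just "ko",
    # while a children's book that deliberately avoids Hanja is "ko-Hang".
    kore_model = {**SUPPRESS_SCRIPT, "ko": "Kore"}
    print(build_tag("ko", "Kore", kore_model))  # ko
    print(build_tag("ko", "Hang", kore_model))  # ko-Hang
```

Passing a different table (for example one mapping 'ko' to 'Hang') shows how the choice of Suppress-Script value changes which form counts as the unmarked default; the function itself stays the same.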
The question is really:
(a) whether most people who write "ko" alone generally mean "Hangul plus
Hanja," whether they generally mean "Hangul only," or whether the two
cases are sufficiently equal that neither can be presumed, and
(b) whether "Hangul plus Hanja" as a concept is applicable even for
examples that contain no Hanja.
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14