Proposed records and registration forms for Japanese variants
doug at ewellic.org
Fri Sep 18 15:35:05 CEST 2009
Han Steenwijk <han dot steenwijk at unipd dot it> wrote:
> As the variant subtag "hepburn" is all about romanization, could its
> Prefix field look like this:
> Prefix: ja, ja-Latn
No, but syntactically at least, it would be possible to have two Prefix
fields.
> The tag "ja-hepburn" would be a shorthand equivalent to
> "ja-Latn-hepburn". The shorthand equivalent only makes sense in
> environments that are informed on the meaning of the subtag "hepburn".
> In other environments the explicit tag "ja-Latn-hepburn" is to be
> The same would hold for other subtags that specify the type of
The problem is that this would create "recommended" forms that are
intentional exact duplicates, something which is generally not
encouraged in language tags.
Any usage of 'hepburn', or any other subtag for that matter, is expected
to be "informed" as to the meaning of that subtag. I guess you are
saying that people who understand that the Hepburn variant is a
romanization do not need the Latin script specified for them also. But
it was decided back when we registered a subtag for Pinyin that it was
better to include the script subtag in the Prefix.
(The fact that we have already agreed to take the rule we set for
Chinese romanizations and apply it to Japanese is why I felt it was
appropriate to apply those principles to Korean as well, and not treat
every case as a tabula rasa.)
There are already many examples of effectively duplicate tags that we
cannot do anything about. Icelandic is, for all practical purposes,
spoken only in Iceland, so it makes little sense to use "is-IS" instead
of just "is", but both forms are allowed and are effectively duplicates.
There's not much we can do there. But if "ja-hepburn" and
"ja-Latn-hepburn" are known at the time of registration to have the same
meaning, as they are, then we have the ability to encourage (by means of
the Prefix field) only one of the combinations.
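To make the distinction concrete, here is a minimal Python sketch of how a tool might tell a recommended combination from one that is merely well-formed. The table RECOMMENDED_PREFIXES and the function follows_recommended_prefix are hypothetical names invented for this illustration, the single Prefix value "ja-Latn" for 'hepburn' is assumed from the approach described above rather than copied from a registry record, and the matching is a simple leading-subtag comparison, not a full validator.

```python
# Hypothetical, simplified excerpt of registry data: variant -> Prefix values.
# The "ja-Latn" entry is an assumption based on the discussion in this thread.
RECOMMENDED_PREFIXES = {
    "hepburn": ["ja-Latn"],
}


def follows_recommended_prefix(tag, variant):
    """Return True if `tag` contains `variant` and starts with one of its prefixes.

    Comparison is case-insensitive and done subtag by subtag, so "ja-Latn"
    matches "ja-Latn-hepburn" but not "ja-hepburn".
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    for prefix in RECOMMENDED_PREFIXES.get(variant, []):
        prefix_subtags = prefix.lower().split("-")
        if subtags[: len(prefix_subtags)] == prefix_subtags:
            return True
    return False


if __name__ == "__main__":
    for tag in ("ja-Latn-hepburn", "ja-hepburn", "fr-hepburn"):
        print(tag, follows_recommended_prefix(tag, "hepburn"))
```

A False result here only means the tag falls outside the recommended combinations; it says nothing about whether the tag is syntactically valid.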
This is different from, say, the '1994' subtag registered for Resian and
its variants. In that case, "sl-rozaj-1994" is not an exact semantic
duplicate for any finer-detailed tag such as "sl-rozaj-biske-1994"; the
former means "any Resian, sub-dialect unspecified, written in
Steenwijk's 1994 orthography."
Prefix fields are just suggestions and you can always write "ja-hepburn"
and hope that matching engines get it right. Of course, you can also
write something inappropriate like "fr-hepburn", and then there should
be no expectation that matching engines will know what you mean. The
point is that both of these tags are syntactically valid, even if one is
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s