Proposed records and registration forms for Japanese variants

Fri Sep 18 15:35:05 CEST 2009

Han Steenwijk <han dot steenwijk at unipd dot it> wrote:

> As the variant subtag "hepburn" is all about romanization, could its 
> Prefix field look like this:
>
> Prefix: ja, ja-Latn

No; each Prefix field holds a single prefix.  Syntactically at least, 
though, it would be possible to have two Prefix fields:

Prefix: ja
Prefix: ja-Latn
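
To illustrate why the two forms differ, here is a minimal Python sketch 
of a reader for "Field: value" records.  The function name parse_record 
and the record text are my own assumptions for illustration; the record 
is a hypothetical fragment, not a registered one, and the sketch ignores 
folded (continuation) lines.  Repeated fields come out as separate 
values, while the comma form stays a single value:

```python
def parse_record(text):
    """Parse one 'Field: value' record, collecting repeated fields into lists."""
    record = {}
    for line in text.strip().splitlines():
        # Split on the first colon only; the value is everything after it.
        name, _, value = line.partition(":")
        record.setdefault(name.strip(), []).append(value.strip())
    return record


# Hypothetical record fragment for illustration, not the registered record.
RECORD = """\
Type: variant
Subtag: hepburn
Description: Hepburn romanization
Prefix: ja
Prefix: ja-Latn
"""

record = parse_record(RECORD)
print(record["Prefix"])  # two separate prefixes

# The comma form is read as one value, not as two prefixes.
print(parse_record("Prefix: ja, ja-Latn")["Prefix"])
```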

> The tag "ja-hepburn" would be a shorthand equivalent to 
> "ja-Latn-hepburn". The shorthand equivalent only makes sense in 
> environments that are informed on the meaning of the subtag "hepburn". 
> In other environments the explicit tag "ja-Latn-hepburn" is to be 
> preferred.
>
> The same would hold for other subtags that specify the type of 
> romanization.

The problem is that this would create "recommended" forms that are 
intentional exact duplicates, something which is generally not 
encouraged in language tags.

Any usage of 'hepburn', or any other subtag for that matter, is expected 
to be "informed" as to the meaning of that subtag.  I guess you are 
saying that people who understand that the Hepburn variant is a 
romanization do not need the Latin script specified for them also.  But 
it was decided back when we registered a subtag for Pinyin that it was 
better to include the script subtag in the Prefix.

(The fact that we have already agreed to take the rule we set for 
Chinese romanizations and apply it to Japanese is why I felt it was 
appropriate to apply those principles to Korean as well, and not treat 
every case as a tabula rasa.)

There are already many examples of effectively duplicate tags that we 
cannot do anything about.  Icelandic is, for all practical purposes, 
spoken only in Iceland, so it makes little sense to use "is-IS" instead 
of just "is", but both forms are allowed and are effectively duplicates. 
But if "ja-hepburn" and 
"ja-Latn-hepburn" are known at the time of registration to have the same 
meaning, as they are, then we have the ability to encourage (by means of 
the Prefix field) only one of the combinations.

This is different from, say, the '1994' subtag registered for Resian and 
its variants.  In that case, "sl-rozaj-1994" is not an exact semantic 
duplicate for any finer-detailed tag such as "sl-rozaj-biske-1994"; the 
former means "any Resian, sub-dialect unspecified, written in 
Steenwijk's 1994 orthography."

Prefix fields are just suggestions and you can always write "ja-hepburn" 
and hope that matching engines get it right.  Of course, you can also 
write something inappropriate like "fr-hepburn", and then there should 
be no expectation that matching engines will know what you mean.  The 
point is that both of these tags are syntactically valid, even if one is 
logically inappropriate.
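
As a rough sketch of the distinction, a tool could check a tag against a 
variant's Prefix fields along the following lines.  The PREFIXES table 
and the function prefix_ok are hypothetical names of mine, the table 
assumes a single Prefix of "ja-Latn" for 'hepburn', and the comparison is 
a simplified "tag starts with the prefix's subtags" test rather than a 
full matching scheme.  All three tags below are syntactically valid; 
only the first agrees with the assumed Prefix.

```python
# Assumed, simplified Prefix data for illustration only.
PREFIXES = {"hepburn": ["ja-Latn"]}


def prefix_ok(tag, variant, prefixes=PREFIXES):
    """Return True if the subtags before `variant` begin with one of its prefixes."""
    subtags = tag.lower().split("-")
    if variant not in subtags:
        return False
    # Everything that precedes the variant subtag in the tag.
    head = subtags[:subtags.index(variant)]
    for prefix in prefixes.get(variant, []):
        wanted = prefix.lower().split("-")
        if head[:len(wanted)] == wanted:
            return True
    return False


for tag in ("ja-Latn-hepburn", "ja-hepburn", "fr-hepburn"):
    print(tag, prefix_ok(tag, "hepburn"))
```

A False result here means only "not the recommended combination"; it 
says nothing about whether the tag is well-formed.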

--
Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s