LANGUAGE SUBTAG REGISTRATION FORM (R4): pinyin
mark at macchiato.com
Thu Sep 11 13:09:50 CEST 2008
I'm back online, and have read through a number of the conversations. There
are several factors in play here, and I'll make a stab at disentangling them.
We have two main types of subtags:
Some variant subtags have
a specific meaning, and only really make sense with a small number of
prefixes.
en-US-valygirl : valley-girl US English
hy-arevela : Eastern Armenian
el-polyton : Polytonic Greek
valygirl has a specific meaning, and is closely associated with
certain prefixes. A tag like fr-valygirl would have no meaning. Note
that we could have as easily written en-valygirl, because the variant
is sufficiently narrow that the "US" is implicit. As a matter of fact,
we could even have written "und-valygirl", because even the "en" is
implicit in the code. However, practically, en-US-valygirl provides
better behavior in most implementations, and would be the recommended form.
The same considerations apply to hy-Armn-AM-arevela, el-Grek-GR-polyton,
and so on.
With the other type of variant subtag, the subtag has a reasonably
independent meaning and is not closely tied to a particular prefix.
hy-eastern : my original request for Eastern Armenian (turned down)
fr-fonipa : French in IPA
fonipa is not tied to fr or en or de, but has independent meaning. Year tags
would be another case of this; although there is a prefix of de, the meaning
of fr-1901 would be clear. If we had defined "eastern" and "western", as I
proposed some time ago, they would be other cases. One could more
generatively interpret the subtags, because "de-eastern" would mean an
Eastern form of German.
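
A minimal sketch of the specific/generic distinction, assuming a hypothetical
table of recommended prefixes built only from the examples in this message
(it is not the registry, and the function names are made up for illustration):

```python
# Hypothetical table: variant subtag -> recommended prefixes.
# Entries are taken from the examples in this message, not from the registry.
RECOMMENDED_PREFIXES = {
    "valygirl": ["en-US"],
    "arevela": ["hy"],
    "polyton": ["el"],
    "fonipa": [],  # generic: no particular prefix is required
}


def classify(variant):
    """Return 'specific', 'generic', or 'unknown' for a variant subtag."""
    prefixes = RECOMMENDED_PREFIXES.get(variant.lower())
    if prefixes is None:
        return "unknown"
    return "specific" if prefixes else "generic"


def matches_recommended_prefix(tag):
    """True if the tag's last subtag is a known variant and the rest of the
    tag begins with one of that variant's recommended prefixes."""
    subtags = tag.lower().split("-")
    variant, rest = subtags[-1], subtags[:-1]
    prefixes = RECOMMENDED_PREFIXES.get(variant)
    if prefixes is None:
        return False  # variant not in the table
    if not prefixes:
        return True  # generic variant: any prefix is acceptable
    for prefix in prefixes:
        parts = prefix.lower().split("-")
        if rest[:len(parts)] == parts:
            return True
    return False


print(matches_recommended_prefix("en-US-valygirl"))  # specific, prefix matches
print(matches_recommended_prefix("fr-valygirl"))     # specific, prefix does not match
print(matches_recommended_prefix("fr-fonipa"))       # generic
```

Under these assumptions, fr-valygirl is rejected while fr-fonipa is accepted,
which is the difference between the two types described above.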
We also have an established practice of having a progression from gross
granularity to fine granularity, such as with
sl : Slovenian
sl-rozaj : Resian variety of Slovenian
sl-rozaj-biske : San Giorgio dialect of the Resian variety of Slovenian
That is, we recognize that there are both broader and finer categories of
language that people legitimately need to distinguish. Note how this
interacts with the general/specific axis. In the above examples for
Slovenian, all are specific tags, and so even if we could actually
write sl-biske or even und-biske without ambiguity, the recommended prefix
is more explicit.
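
The coarse-to-fine progression is also what lets a consumer fall back from a
fine-grained tag to a broader one. A minimal sketch, assuming a simplified
truncation in the spirit of RFC 4647 lookup (the function name is
hypothetical, and this is not a full implementation of that algorithm):

```python
def fallback_chain(tag):
    """List a tag followed by its successively broader forms, obtained by
    dropping the last subtag each time."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Do not leave a single-character subtag (an extension singleton
        # or "x") dangling at the end of a truncated tag.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


print(fallback_chain("sl-rozaj-biske"))
# ['sl-rozaj-biske', 'sl-rozaj', 'sl']
```

With the explicit prefix, each step of the chain is itself a meaningful tag.
A tag such as und-biske would fall straight back to und and lose the
intermediate levels.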
Note also that we have not had the practice of incorporating the year into a
variant except for very specific cases. We do not have bisk1593, nor do we
have el-mono1982 ("The *monotonic orthography* (μονός = single + τόνος =
accent) being the simplified spelling introduced in 1982 for modern Greek" -
Wikipedia), nor el-pol200bc (with a putative date of 200BC for the
development of the diacritics). Nor did we have a year in "tarask", which is
more specific (in the Description: "The subtag represents Branislau
orthography as published in "Bielaruski klasycny pravapis" by
Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
(Vilnia-Miensk 2005).")
Note that including the year is contraindicated where the intention is to
specify the broader variant, since it would indicate to users that only a
very narrow form (defined by some work in that year) is being referenced. We
do have cases where there is a very specific form that is being specified,
like 1694acad. This is when there is clear adherence to a particular work.
So, how does this apply to my recent (attempted) registrations?
zh-Latn-pinyin. This is to represent Mandarin Chinese written in the Hanyu
Pinyin form, as opposed to Wade-Giles. While "Latn" is implied by the variant
tag, for better implementation behavior it is included, so the prefix should
be "zh-Latn". This is just like how biske has "rozaj" in its prefix.
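
One way an implementation might benefit from the explicit "Latn" is that a
consumer which does not recognize the variant can still pick out the script
from the shape of the subtags. A minimal sketch, assuming only the positional
and length rules for subtags (four letters for a script, three letters for an
extended language subtag). The function name is hypothetical, and the sketch
does not validate the tag:

```python
def script_subtag(tag):
    """Return the script subtag of a tag, judged by shape alone, or None."""
    subtags = tag.split("-")

    def is_alpha(s, length):
        return len(s) == length and s.isascii() and s.isalpha()

    i = 1  # position 0 is the primary language subtag
    # Skip up to three extended language subtags (three letters each).
    while i < len(subtags) and i <= 3 and is_alpha(subtags[i], 3):
        i += 1
    # A four-letter subtag in this position is a script subtag.
    if i < len(subtags) and is_alpha(subtags[i], 4):
        return subtags[i].title()
    return None


print(script_subtag("zh-Latn-pinyin"))  # Latn
print(script_subtag("zh-pinyin"))       # None
```

With zh-pinyin, a consumer that has never heard of "pinyin" has nothing in the
tag to tell it that the content is in Latin script.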
This does not follow the pattern of fonipa, since it is a specific
romanization of a specific language. If we are to be consistent with the
pattern of denying "eastern", we should not broaden this tag to be more than
what is intended. The name "pinyin" would be the best name, since that is
the most recognizable term. I do not want the year 1954 in the subtag, nor
the year 1979, nor any other year, since those would be too restrictive.
However, if it makes the difference between being accepted or not, I could go
with "hpinyin" or a similar variant.
be-acade. This should not have the year in the variant subtag either, since
that would be too restrictive, and not represent the form that is intended
for registration. After all, as in the registration form (text originally
from Yuri), "The "academic" (normative, literary) form, existing in a
relatively unchanged form for 75 years". This tag is for the more general
category; those specific forms could be added later, using the "sl-rozaj-biske"
model if someone wants forms that are specific to a form defined by a work
in a given year: 1959, 1985, 2008, or whatever. But that is not the
intention for this subtag -- it is not year-specific.
I would have no objection to having the tag be "be-beakadem" or something
like that if people found that preferable to also incorporate "be" into the
variant subtag. After all, like "rozaj" or "biske", it is intended to be a
On Thu, Sep 11, 2008 at 3:46 AM, Doug Ewell <doug at ewellic.org> wrote:
> CE Whitehead <cewcathar at hotmail dot com> wrote:
> > Hi, John: I think the argument that I was responding to is that the
> > script subtag helps to identify the kind of orthography when the
> > variant subtag is not known. So I continue to consider the
> > suppress-script field is the best solution!
> Will someone please explain to me what good it would do to associate a
> Suppress-Script with a variant subtag if the tag consumer doesn't
> recognize the variant?
> Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14