Generic variants and Armenian dialects (long)
Doug Ewell
dewell at adelphia.net
Sun Sep 3 09:54:53 CEST 2006
This has become quite an interesting and complex thread. I'm going to
try to break it down into sub-topics and see if that makes anything
clearer, or makes any decisions easier.
1. Armenian dialects
Mark has identified two main dialects of Armenian, Western and Eastern,
which need to be tagged differently for localization purposes. The
existence of these two dialects appears to be well attested; the
Ethnologue entry for Armenian lists more than 30 dialects but ultimately
narrows its discussion to Western and Eastern. The two are also
discussed in Wikipedia, which says:
"The two modern literary dialects, Western (originally associated with
writers in the Ottoman Empire) and Eastern (originally associated with
writers in the Russian Empire), removed almost all of their Turkish
lexical influences in the 20th century, primarily following the genocide
of the Armenians in Anatolia by the Turks in 1915–1920."
and later:
"Armenian can be subdivided in two major dialectal blocks and those
blocks into individual dialects, though many of the Western Armenian
dialects have died due to the effects of the Armenian Genocide. In
addition, neither dialect is completely homogeneous: any dialect can be
subdivided into several subdialects. While Western and Eastern Armenian
are often described as different dialects of the same language, some
subdialects are not readily mutually intelligible. It is true, however,
that a fluent speaker of two greatly varying subdialects who are exposed
to the other dialect over even a short period of time will be able to
understand the other with relative ease."
This thread from a group called the Armenian Club Forum shows Armenian
speakers discussing the dialectal differences in terms of
Arevelahyeren (Eastern) and Arevmtahyeren (Western):
http://forum.armenianclub.com/showthread.php?t=1632&page=2
It seems clear that the distinction is real, and even though it is
possible to break down the dialects further, that does not prevent us
from creating variant subtags to identify these two major dialects while
reserving the right to provide finer distinctions in the future if
necessary.
It was stated that the Eastern/Western distinction is really a matter of
usage in Armenia proper vs. the diaspora. Since the latter group is not
tied to one particular country or region, use of ordinary region subtags
("hy-AM" vs. "hy-somewhere else") doesn't seem sufficient. It certainly
appears to be more complex than "Armenians in France and California."
In any case, even if this is just an “Armenia vs. diaspora” distinction,
there should still be some way to tag it if it is linguistically justified,
which would appear to be true if a basic word like “please” is different
between the two (Wikipedia says Eastern uses խնդրեմ (khntrem) while
Western uses յաճիս (hadjis), and other sites list other differences).
2. Generic variants
The proposals for variants "Western" and "Eastern" came with comments
strongly implying that they could be used with other languages besides
Armenian, if those languages have a "Western" and "Eastern" dialect.
One possible danger is that someone will decide to start using them to
mark ordinary regional distinctions, instead of true dialects or other
linguistic differences. For example, it would not make sense to create
the tags "de-DE-eastern" and "de-DE-western" to distinguish German used
in the former DDR from that used in pre-unification West Germany, unless
commonly accepted linguistic varieties called "Western German" and
"Eastern German" had evolved as a result of the division (which AFAIK
they have not).
I get the feeling that at least a small part of the motivation for
proposing these is as a test case to see how variants with multiple
prefixes will fly. I understand this curiosity -- there's a part of me
that wants to see at least one extension RFC, to see what form they
would take and how the extension registry would be constructed -- but it
shouldn't really figure into the present proposals. So far, there is
only justification to use such subtags for Armenian.
In fact, Section 3.5 of RFC 3066bis implies rather strongly that a
single variant should NOT have two or more overloaded meanings,
rendering much of this "multiple prefixes" discussion moot:
“Requests to add a prefix to a variant subtag that imply a different
semantic meaning will probably be rejected. For example, a request to
add the prefix "de" to the subtag 'nedis' so that the tag "de-nedis"
represented some German dialect would be rejected. The 'nedis' subtag
represents a particular Slovenian dialect and the additional
registration would change the semantic meaning assigned to the subtag.
A separate subtag SHOULD be proposed instead.”
IIRC, the motivation to allow multiple prefixes was to establish the
rules for using two or more variants together. For example, it would be
senseless to write "sl-nedis-rozaj", because those two variants imply
different, mutually exclusive dialects. But if there were a variant
"splat" whose meaning was orthogonal to "nedis" and "rozaj", then it
would be appropriate to allow, say, "sl-nedis-splat". This would be
achieved by listing "sl-nedis" and "sl-rozaj", as well as "sl", as valid
prefixes for variant "splat". This mechanism was NOT intended to
encourage using the same variant for two or more different languages.
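To make the prefix mechanism concrete, here is a rough Python sketch of how such a check might work. It is only an illustration under my own assumptions, not text from the RFC and not anyone's actual implementation: the PREFIXES table is a hand-written stand-in for the registry, "splat" is the made-up variant from the paragraph above, and the matching rule (same language, same preceding variants, any script or region subtags named in the prefix must be present) is one possible reading of the prefix rules.

```python
# Hypothetical mini-registry: variant subtag -> list of allowed prefixes.
# "splat" is the invented variant from the text, not a registered subtag.
PREFIXES = {
    "nedis": ["sl"],
    "rozaj": ["sl"],
    "splat": ["sl", "sl-nedis", "sl-rozaj"],
}


def is_variant(subtag):
    """Variant shape: 5-8 alphanumerics, or 4 characters starting with a digit."""
    if not subtag.isalnum():
        return False
    return 5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit())


def split_tag(tag):
    """Split a simple tag into (language, non-variant subtags, variants)."""
    parts = tag.lower().split("-")
    language, rest = parts[0], parts[1:]
    extras = [p for p in rest if not is_variant(p)]
    variants = [p for p in rest if is_variant(p)]
    return language, extras, variants


def prefix_matches(prefix, language, extras, earlier_variants):
    """Assumed rule: language and preceding variants must agree with the prefix."""
    p_lang, p_extras, p_variants = split_tag(prefix)
    return (
        p_lang == language
        and all(e in extras for e in p_extras)
        and p_variants == earlier_variants
    )


def variants_with_bad_prefix(tag):
    """Return the variants in `tag` that match none of their listed prefixes.

    Variants missing from PREFIXES are also reported, since they have
    no prefix to match.
    """
    language, extras, variants = split_tag(tag)
    bad = []
    for i, variant in enumerate(variants):
        allowed = PREFIXES.get(variant, [])
        if not any(
            prefix_matches(p, language, extras, variants[:i]) for p in allowed
        ):
            bad.append(variant)
    return bad


print(variants_with_bad_prefix("sl-nedis-splat"))  # []
print(variants_with_bad_prefix("sl-nedis-rozaj"))  # ['rozaj']
print(variants_with_bad_prefix("de-nedis"))        # ['nedis']
```

Under these assumptions, "splat" can follow "nedis" only because "sl-nedis" is listed among its prefixes, while "rozaj" cannot, and "nedis" is not picked up for German at all.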
3. Names for the two proposed subtags
Mark has proposed that the Description fields for these two subtags be
"western" and "eastern" respectively. Ignoring for the moment the
question of overloading these variants for other languages, the premise
is that this is how the dialects are best known.
Michael made a counter-suggestion of "arevemda" for Western Armenian and
"arevela" for Eastern Armenian. (Frank may have a point that "arevmda"
or "arevmta", without the second 'e', is better Armenian.) It seems to
me that a major reason for suggesting these alternative names is to
prevent the subtags from being reused with other languages. These
appear to be simply derived from the Armenian words for "west" and
"east".
If the restriction against overloading a single variant for different
languages (Section 3.5, above) is honored, these two variants should
only be used for Armenian regardless of whether they are called
"western" or "arevemda" or "poiuytre". Obviously the last is
undesirable, and intended for effect; the question should be whether the
first (English) is more clear for potential users of the subtag than the
second (Armenian). I suggest asking actual speakers of Armenian. I
concede that the words for "west" and "east" are quite similar in
Armenian, but then that is true for a great many languages.
My personal preference is for "arevmda" and "arevela" (assuming that
Frank is right about the superfluous 'e'), on the basis that it will
discourage inappropriate usage of the subtags while still providing
meaningful strings. I greatly dislike "hywest" and "hyeast" (or
"hyewest" and "hyeeast") since they attempt to discourage inappropriate
usage by introducing significant visual clutter and redundancy to the
tag. We do not require users to write "sl-slnedis" or "de-de1996", and
we will not require "en-enboont" in the future.
4. Variants with no prefix, or used with the wrong prefix
Mark brought up what he called the "prefix bug": a variant created
without a prefix could never have a prefix added, because doing so would
restrict (not "broaden") the set of allowed prefixes. Whether this is
strictly true or not (John claimed it is not), I agree with Addison that
no variant should, in fact, ever be registered without a prefix; doing so
strongly encourages inappropriate usage. It's hard to envision a
variant that would be suitable for all languages (concepts like
"casual," "business," "legal," "sardonic," and "paternalistic" add
little or no value to language tags as people tend to use them, and
don't even exist in all languages). This should be much more clear in
RFC 3066ter, and yes, I know the LTRU list is the right place to fight
that battle.
John mentioned that "en-1901" would be an invalid tag. This was of
special interest to me since I have written a validating parser (part of
my tag-generating program which will be freely available as soon as the
RFCs are published). The way I read RFC 3066bis, the answer is
inconclusive. Section 2.2.9 says a validating parser must “check that
the [variant sub]tag must match at least one prefix,” which implies that
the tag is not valid if it does not. But Sections 2.2.5 and 3.1 speak
only in terms of variants being “not suitable” or “inappropriate” with
certain prefixes. So to me, the normative aspects of this are not
clear. My parser, which identifies tags as valid ("green light") or
invalid ("red light"), also has a third, “yellow light” status for tags
that are technically valid but ill-advised, such as "en-1901". This
also covers cases like using a deprecated subtag ("iw") or explicitly
specifying a Suppress-Script ("fr-Latn").
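For illustration only, the three-way classification could be sketched in Python roughly as follows. This is not the code of my parser; the tables are a tiny hand-written excerpt standing in for the real registry, and names such as Light and classify are invented for the example. Anything the excerpt does not know about is simply treated as invalid.

```python
from enum import Enum


class Light(Enum):
    GREEN = "valid"
    YELLOW = "valid but ill-advised"
    RED = "invalid"


# Hand-written excerpt standing in for the registry (assumption).
LANGUAGES = {"en", "de", "fr", "he", "iw"}
DEPRECATED = {"iw"}                    # "iw" is deprecated in favor of "he"
SCRIPTS = {"latn", "cyrl"}
REGIONS = {"de", "fr", "us"}
SUPPRESS_SCRIPT = {"fr": "latn"}       # script normally left implicit
VARIANT_PREFIXES = {"1901": ["de"], "1996": ["de"]}


def classify(tag):
    """Classify a simple language[-script][-region][-variant...] tag."""
    parts = tag.lower().split("-")
    language, rest = parts[0], parts[1:]
    if language not in LANGUAGES:
        return Light.RED

    light = Light.GREEN
    if language in DEPRECATED:
        light = Light.YELLOW           # deprecated subtag

    for sub in rest:
        if sub in SCRIPTS:
            if SUPPRESS_SCRIPT.get(language) == sub:
                light = Light.YELLOW   # explicit Suppress-Script
        elif sub in REGIONS:
            pass                       # regions carry no warning here
        elif sub in VARIANT_PREFIXES:
            if language not in VARIANT_PREFIXES[sub]:
                light = Light.YELLOW   # variant used with an unlisted prefix
        else:
            return Light.RED           # not in this excerpt at all
    return light


for tag in ("de-1901", "en-1901", "iw", "fr-Latn", "xx-1901"):
    print(tag, classify(tag).name)
```

The design choice worth noting is that a prefix mismatch downgrades the tag to yellow rather than red, which sidesteps the question of whether the RFC makes such tags formally invalid.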
Addison asked:
“Also: what happens if we have "tlh-western" and a new subdialect
"fooish" is registered. Do we do "tlh-western-fooish" or "tlh-fooish"?”
That depends on how the prefixes are defined. As I wrote above under
"Generic variants," with Slovenian it was clear that “nedis” and “rozaj”
were inappropriate together, so each was defined with only “sl” as its
prefix. This is specified in Section 2.2.5. (My parser flags
“sl-nedis-rozaj” with a yellow light, on the basis of “rozaj” having an
inappropriate prefix “sl-nedis”, although combinations like
“sl-Tibt-rozaj” or “sl-JM-rozaj” are fine.)
5. Comments
At the risk of beating this to death: I believe the Comments field
should contain enough information to allow users to select the
appropriate subtag(s) for their tagging needs, and understand why they
are appropriate, IN CONJUNCTION WITH THE RFC. I don't believe it's
necessary to add a tutorial on how variants are to be used -- that
information is available in the RFC -- and especially not one that
contradicts the RFC.
I also would prefer not to see Registry entries burdened with a Comments
field that explains the obvious:
Type: variant
Subtag: western
Description: Western
Prefix: hy
Comments: Prefix ‘hy’, Western Armenian
In the above case, the Comments field adds no information that was not
evident from the rest of the entry. If a variant is given two or more
prefixes that represent different languages (which should never happen;
see my comments about Section 3.5 above), making the usage potentially
confusing, the comments can be added for all languages at that time.
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
Editor, draft-ietf-ltru-initial