Language Variant subtags for Sanskrit

Thu Jul 15 03:50:52 CEST 2010

Caoimhín Ó Donnaíle <caoimhin at smo dot uhi dot ac dot uk> wrote:

> The difficulty I could see with that is that there might be quite a 
> lot of discussion and clarification needed as to what "classical" 
> meant in the case of certain languages.  Which could make for a very 
> long Comments field attached to the subtag.  Or alternatively, lots 
> and lots of Comments fields, since I see that there can be more than 
> one of these,

Lots and lots of Comments fields would be a really clear indication that 
the language varieties in question are too diverse to be covered by a 
single subtag.

> although the comments seem to attach to the subtag itself rather than 
> to particular prefix-subtag combinations.

Seems so because it is so:
"Comments's field-body contains additional information about the subtag, 
as deemed appropriate for understanding the registry and implementing 
language tags using the subtag or tag."

("tag" in this case refers to whole tags in the "redundant" category, 
not tags constructed from subtags in the modern way.)

> It looks to me (not sure whether I am right) as if the variant subtags 
> are in a bit of a limbo between being "atomic" and not being "atomic". 
> The fact that the Description and Comments and Added fields attach to 
> the subtag rather than to a particular prefix-subtag combination kind 
> of implies that they generally ought to be "atomic" (distinct for each 
> prefix).

The "semantic meaning" (wording taken from RFC 5646, Section 3.5) needs 
to be the same for all usage contexts.  'baku1926' works with multiple 
prefixes because it means the same thing for all of those prefixes.
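To make the point concrete: a registry record keeps one set of Description and Comments fields for the whole subtag, however many Prefix fields it lists. Below is a minimal Python sketch of that structure, assuming a simplified record-jar layout ("Name: value" lines, whitespace-led continuation lines, "%%" separators). The record is a trimmed, illustrative stand-in modeled on the 'baku1926' entry. Its prefixes are a subset and its Comments text is invented, so check the live Registry for the real record. The function name parse_record is hypothetical.

```python
from collections import defaultdict


def parse_record(text: str) -> dict[str, list[str]]:
    """Parse one simplified record-jar record into field name -> values.

    Repeated fields (Prefix, Comments, ...) accumulate in order.
    Lines starting with whitespace continue the previous field's value.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    last = None
    for line in text.splitlines():
        if not line.strip() or line.strip() == "%%":
            continue  # skip blank lines and record separators
        if line[0].isspace() and last is not None:
            # Continuation line: fold into the most recent value.
            fields[last][-1] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip()
        fields[last].append(value.strip())
    return dict(fields)


# Illustrative record only: prefixes trimmed, Comments text invented.
RECORD = """\
Type: variant
Subtag: baku1926
Description: Unified Turkic Latin Alphabet (Historical)
Prefix: az
Prefix: tt
Comments: Illustrative comment text, not the
  Registry's actual wording
"""

rec = parse_record(RECORD)
for prefix in rec["Prefix"]:
    # Every prefix gets the same Description: the record has no slot
    # for text that applies to only one prefix-subtag combination.
    print(f"{prefix}-{rec['Subtag'][0]}: {rec['Description'][0]}")
```

A record in which each Prefix needed its own Description would not fit this shape. That is the structural reason a variant's meaning has to be the same for every prefix it lists.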

Mark Davis 🍪 <mark at macchiato dot com> wrote:

> For example, fr-*CH* means a variant of French that uses, say, 
> huitante. On the other hand, de-*CH* means a variant of German that 
> uses "Schloss" instead of "Schloß". The situation we have is that CH 
> has a constant meaning (Switzerland), but in combination with other 
> tags, has different implications.

Just yesterday I wrote, admittedly not in very large letters:

<<
Region subtags don't follow this principle perfectly: the relationship
between "en" and "en-CU" probably isn't the same as that between "es"
and "es-CU".  But region subtags were established long before the BCP 47
project (as such) got underway, and are already known to paint with too
wide a brush at some times and too narrow a brush at other times.
Variants are our invention, and we ought to follow our own principles
and intentions with regard to them.
>>

Back to Mark:

> There is a second facet to this, which is that "de-CH" can be confused 
> with a completely different language, "gsw".

This is an attribute of the inherent fuzziness of using country codes to 
identify languages and varieties.  (It's still an indispensable system, 
but it does have its flaws.)

> Let's look at "classical". Like "CH", when used with different base 
> languages it has different implications. Like X and Y above, if 
> "classical Sanskrit" is really a different language, then it should 
> get a different code.

I agree completely, and Peter is raising an important point here.

> Prefix: sa
> Prefix: fr
>
> Description: when used with "sa", refers to the version codified by
> Panini, also called Italian Sanskrit.
>
> Description: when used with "fr", refers to ...
>
> Comment: This denotes an intelligible variant of a language, not a 
> different language. For example, what is called "Classical Tibetan" is 
> properly represented by the separate language "xct"; it is not 
> correctly represented by "bo-classical".

I hope it's obvious to everyone that no record in the Registry should 
ever, ever look like this.  This is completely needless complexity, far 
worse than the minor loss of reading ease caused by registering, oh, I 
don't know, 'classa' instead of 'classic'.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s