el-latn, ru-latn, and related possibilities

Mon Oct 3 07:34:53 CEST 2005

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of John.Cowan

> > The question that arises is as follows: just as de-de-1901 and all the
> > similar subtagging exists for German, might there be advantages in registrations
> > (or alternatively a mechanism which avoided the need for registrations)
> > which listed widely used transliterations into Latin?
> 
> We haven't yet settled whether (when we are fully in the RFC 3066bis
> regime) we should handle transliterations as simple variants (like
> historical orthographies, geographical and social dialects, and the
> like) or via the new extension machinery of RFC 3066bis, which is
> more work to set up but is more general-purpose.  In any case,
> someone would have to maintain a registry of transliterations,
> transcriptions, and orthographies: no one has taken on that job yet.

IMO, these should be variants and not extensions. Treating them as extensions would require a separate specification. It would also mean that they would not be supported in protocols and specifications that reference RFC 3066bis unless those were specifically revised to reference the extension RFC as well, which would be an incredible pain. For instance, while it is reasonably likely that XML would be updated to reference RFC 3066bis, so that such tags could be used in xml:lang, it is far less likely that XML would reference an extension RFC. That would mean transliterations could not be distinguished in xml:lang.

I'm completely convinced that transliterated text in some language such as Russian can simply be treated as Russian-language data that is written in an alternate written form. For some purposes, tagging it as "ru" may be sufficient; for others, tagging it as "ru-Latn" may be needed, and for some purposes tagging it as (say) "ru-Latn-iso9r95" (assuming ISO 9:1995) may be appropriate. This nicely allows for useful degrees of specificity, and is completely adequate for indicating the distinguishing characteristics of the language and written form of the data. 
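To make the three levels of specificity concrete, here is a minimal Python sketch. It assumes a simplified reading of the RFC 3066bis subtag shapes (language, script, region and variants only; extlang, extensions and private use are ignored), and the names `parse_tag` and `fallback_chain` are hypothetical. "iso9r95" is the hypothetical ISO 9:1995 variant from the paragraph above, not a registered subtag. The fallback simply drops trailing subtags, so a consumer with nothing specific for "ru-Latn-iso9r95" can still try "ru-Latn" and then "ru".

```python
import re

# Simplified subtag shapes (assumption: extlang, extensions, private use
# and grandfathered tags are not handled in this sketch).
LANGUAGE = re.compile(r"^[A-Za-z]{2,3}$")
SCRIPT = re.compile(r"^[A-Za-z]{4}$")
REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")
VARIANT = re.compile(r"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")


def parse_tag(tag):
    """Split a tag into language, script, region and variant subtags."""
    subtags = tag.split("-")
    result = {"language": None, "script": None, "region": None, "variants": []}
    if not LANGUAGE.match(subtags[0]):
        raise ValueError(f"bad primary language subtag in {tag!r}")
    result["language"] = subtags[0].lower()
    i = 1
    if i < len(subtags) and SCRIPT.match(subtags[i]):
        result["script"] = subtags[i].title()   # e.g. "Latn"
        i += 1
    if i < len(subtags) and REGION.match(subtags[i]):
        result["region"] = subtags[i].upper()   # e.g. "DE"
        i += 1
    while i < len(subtags) and VARIANT.match(subtags[i]):
        result["variants"].append(subtags[i].lower())
        i += 1
    if i != len(subtags):
        raise ValueError(f"unexpected subtag {subtags[i]!r} in {tag!r}")
    return result


def fallback_chain(tag):
    """Return progressively less specific tags by dropping trailing subtags."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


print(parse_tag("ru-Latn-iso9r95"))
# {'language': 'ru', 'script': 'Latn', 'region': None, 'variants': ['iso9r95']}
print(fallback_chain("ru-Latn-iso9r95"))
# ['ru-Latn-iso9r95', 'ru-Latn', 'ru']
```

The same parser handles the German orthography case unchanged: "de-DE-1901" yields region "DE" and variant "1901", which is the point of treating transliterations the same way as historical orthographies.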

There's absolutely no reason I can see for complicating this, either syntactically with an extension and its corresponding singleton, or procedurally by requiring a completely separate registration process to be set up. The only possible benefit of an extension would be that it could provide a generative mechanism for creating tags that identify transliteration standards from some defined group of sources, such as ISO, but I think the potential benefits are limited and the cost high.
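The syntactic difference is easier to see side by side. The following minimal Python sketch splits a tag at its singletons. The function name `split_extensions` is hypothetical, and the singleton "q" is invented purely for illustration; it does not denote any registered extension. With a variant, the transliteration scheme stays in the base tag that any RFC 3066bis consumer already parses. With an extension, it lands in a separate sequence that only software aware of the extension specification knows how to interpret.

```python
def split_extensions(tag):
    """Split a tag into base subtags and singleton-led extension sequences."""
    # Tags are case-insensitive, so compare in lowercase.
    subtags = tag.lower().split("-")
    base, extensions, current = [], {}, None
    for i, sub in enumerate(subtags):
        if i > 0 and len(sub) == 1:
            # A singleton opens a new sequence. For simplicity this sketch
            # also treats the private-use singleton "x" this way.
            current = sub
            extensions[current] = []
        elif current is None:
            base.append(sub)
        else:
            extensions[current].append(sub)
    return base, extensions


# "q" is a hypothetical singleton used only for this comparison.
for tag in ("ru-Latn-iso9r95", "ru-Latn-q-iso9r95"):
    print(tag, "->", split_extensions(tag))
# ru-Latn-iso9r95 -> (['ru', 'latn', 'iso9r95'], {})
# ru-Latn-q-iso9r95 -> (['ru', 'latn'], {'q': ['iso9r95']})
```

A consumer that only understands the base grammar sees "iso9r95" in the first form but not in the second, which is the practical cost of the extension route described above.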

My vote, therefore, would be to simply represent specific transliteration schemes using variant subtags as defined in RFC 3066bis.

Peter Constable