[Ltru] 639 coding wrt historic varieties (was RE: Request for variant subtag fr 16th-c 17th-c Resubmitted!)

David Starner prosfilaes at gmail.com
Thu Jan 11 22:01:43 CET 2007


On 1/11/07, Mark Davis <mark.davis at icu-project.org> wrote:
> I'm inclined to agree with your inclination "to say that all the coded
> entities in 639 should be understood to be the modern variety unless
> explicitly indicated otherwise". (The boundaries for the modern languages
> will come up for the ietf languages group, for example, if someone tries to
> register a variant for 12th century Czech. We will need to know that that
> requires first a language subtag for "Old Czech (ca. 1000-1400)" to be
> encoded as a prefix for that, and route the applicant to the JAC.)

I see several problems. First, the chronological lines between
languages are as arbitrary as any, and there is no way to predict where
the committee would draw the line between modern and pre-modern Czech.
So a user _cannot_ know whether cs applies to their 15th century Czech
books without bringing the question to the committee.

Second, this breaks at least some current usage. For example, I
tagged "Primitive & mediaeval Japanese texts" (10th-12th century
Japanese) and "Ars grammaticae Iaponicae linguae" (17th century
Japanese) as including ja when entering them into the Distributed
Proofreaders database. When they reach Project Gutenberg, they will be
entered as ja. The LoC seems to have entered the first as eng, despite
having one volume of ja-Latn, but I'm sure there's medieval Japanese
in a library database marked as ja somewhere.

Third, right now, the tagging for ancient languages is woefully
insufficient. Unless someone's willing to put a lot of work into
cataloging recorded ancient languages, a library like PG will end up
using the modern language tags for ancient languages instead of
fighting for dozens of new tags. Following this path would require
dozens of new tags, which probably can't be created successfully
piecemeal.

PS. I'll point out this has slightly absurd results when applied to
la. I think we can assume that la was designed to apply to Classical
Latin, and not (not just?) the modern variety.

PS. #2 I can't think of any way this affects tlh. Isn't everyone happy
about that?

