The limit of language codes
Lars Aronsson
lars at aronsson.se
Sun Feb 18 20:56:57 CET 2007
David Starner wrote:
> What we have to do first? This is not a missionary group. My
> primary goal is to create a set of language tags usable for
> Project Gutenberg, for which also having them supported by XML
> and other people is a great help.
Nice to meet you. I'm here for the same reason, only Project
Runeberg instead, http://runeberg.org/
> A large percentage of the books I personally do for Project
> Gutenberg date back before 1700, whether in modern editions or
> original facsimiles. That's what I'm most concerned about
> tagging.
My current opinion, which might change any day, is that simple
time-less language codes are enough for my current needs. Even
though the spelling in 16th century Swedish is quite different
from today's spelling, the authors, ideas and topics are also very
different. In trying to understand an old text, the spelling
isn't necessarily the hardest part. Trying to search or spell
check that language based on "lang=sv" will often fail.
Ultimately one could follow the example of German, where de-1901
and de-1996 now denote the spelling before and after the spelling
reform of 1996. It is quite easy to identify 3 different language
variants of Swedish between 1801 and now (because of two major
spelling reforms in 1889 and 1906), and the same number of Danish.
But as soon as I start looking at Norwegian, there is a personal
orthography per writer and decade. And for Danish and Swedish
this is true of the situation before 1800. Trying to identify and
isolate all such variants doesn't seem to pay off; this would only
lead to the kind of en-GB-1611-Shakespeare-Stratford absurdities
(or nb-NO-1870-Ibsen) described in Harald's posting, and so one
might just as well call the languages by their current names and
give up on variants. We also have books where the dialogues are
written in dialect that defies any standard orthography, and
books on highly specialized topics where the terms and phrases are
not found in standard dictionaries. sv-1883-chemistry, eh?
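To make the date-based scheme concrete, here is a minimal Python sketch
that picks a variant tag from a year. The German subtags 1901 and 1996
are the ones mentioned above; the Swedish entries (sv-1801, sv-1889,
sv-1906) and the name variant_tag are made up for illustration and are
not registered anywhere.

```python
from bisect import bisect_right

# Reform years per language, in ascending order.
# "de" uses the 1901 and 1996 subtags mentioned above.
# The "sv" entries are hypothetical, based on the reform years 1889 and 1906.
REFORMS = {
    "de": [1901, 1996],
    "sv": [1801, 1889, 1906],
}


def variant_tag(lang: str, year: int) -> str:
    """Return lang plus the latest reform year that is not after `year`.

    Falls back to the plain language code when the language has no
    listed reforms or the year predates all of them.
    """
    years = REFORMS.get(lang, [])
    i = bisect_right(years, year)
    if i == 0:
        return lang
    return f"{lang}-{years[i - 1]}"


print(variant_tag("sv", 1883))
print(variant_tag("de", 2007))
print(variant_tag("nb", 1870))
```

The sketch assumes that every text follows the rules in force in its
year. That assumption is exactly what breaks down for Norwegian, and
for Danish and Swedish before 1800, where the lookup can do no better
than return the plain code.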
Before I need standardized subtags for Danish, Swedish
and Norwegian, I first have to find a use for subtags, and then to
find a need to exchange them in a standardized format. Even if I
declare a particular text to be in a certain variant of Swedish,
neither users nor search engines care much about this.
But if anybody else can see a need for standardized subtags for
these languages, I'd be interested in taking part in the
discussion.
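One reason plain codes go a long way: with simple prefix matching, in
the style of the basic filtering described in RFC 4647, a search for
"sv" still finds a text that carries a variant subtag. A minimal Python
sketch follows; the function name matches and the tag sv-1906 are
illustrative assumptions, and the "*" wildcard is left out.

```python
def matches(language_range: str, tag: str) -> bool:
    """True if tag equals the range or extends it with further subtags.

    The comparison is case-insensitive and only matches on whole
    subtags, so "sv" does not match "sve".
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


print(matches("sv", "sv-1906"))
print(matches("sv-1906", "sv"))
```

A consumer that only knows "sv" loses nothing when a text is tagged
more precisely, but it gains nothing either unless it actually does
something with the variant.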
By the way, tomorrow is the 100th anniversary of a major spelling
reform of Norwegian (bokmål, at the time called riksmål). On
February 19, 1907, the Norwegian government resolved to change the
spelling in their official documents. The rules for "nb-1907" are
described in http://runeberg.org/rm1907/ However, very few other
writers used this reform in every detail, and a new major reform
was introduced in 1917.
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
More information about the Ietf-languages mailing list