The limit of language codes

Lars Aronsson lars at aronsson.se
Sun Feb 18 20:56:57 CET 2007


David Starner wrote:

> What we have to do first? This is not a missionary group. My 
> primary goal is to create a set of language tags usable for 
> Project Gutenberg, for which also having them supported by XML 
> and other people is a great help.

Nice to meet you.  I'm here for the same reason, only Project 
Runeberg instead, http://runeberg.org/

> A large percentage of the books I personally do for Project 
> Gutenberg date back before 1700, whether in modern editions or 
> original facsimiles. That's what I'm most concerned about 
> tagging.

My current opinion, which might change any day, is that simple 
time-less language codes are enough for my current needs.  Even 
though the spelling in 16th century Swedish is quite different 
from today's spelling, the authors, ideas and topics are also very 
different.  In trying to understand an old text, the spelling 
isn't necessarily the hardest part.  Trying to search or spell 
check that language based on "lang=sv" will often fail.  
Ultimately one could follow the example of German, where de-1901 
and de-1996 now denote the spelling before and after the spelling 
reform of 1996.  It is quite easy to identify three different 
language variants of Swedish between 1801 and now (because of two 
major spelling reforms in 1889 and 1906), and the same number for 
Danish.  
But as soon as I start looking at Norwegian, there is a personal 
orthography per writer and decade.  And for Danish and Swedish 
this is true of the situation before 1800.  Trying to identify and 
isolate all such variants doesn't seem to pay off; this would only 
lead to the kind of en-GB-1611-Shakespeare-Stratford absurdities 
(or nb-NO-1870-Ibsen) described in Harald's posting, and so one 
might just as well call the languages by their current names and 
give up on variants.  We also have books where the dialogues are 
written in a dialect that defies any standard orthography, and 
books on highly specialized topics where the terms and phrases are 
not found in standard dictionaries.  sv-1883-chemistry, eh?
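
To make the "call the languages by their current names" fallback 
concrete, here is a minimal Python sketch.  It assumes a tool simply 
drops subtags from the right until it finds a tag it knows.  The 
function names fallback_chain and best_match are hypothetical, and 
this is a simplification for illustration, not a complete 
implementation of any standard matching scheme.

```python
def fallback_chain(tag):
    """Return the tag followed by progressively shorter prefixes.

    One subtag is dropped from the right at each step, so a
    variant-tagged text can still be found under its plain code.
    """
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def best_match(tag, supported):
    """Return the longest prefix of `tag` found in `supported`, or None.

    `supported` is a hypothetical list of tags that a search engine
    or spell checker claims to handle; comparison ignores case.
    """
    supported_lower = {s.lower() for s in supported}
    for candidate in fallback_chain(tag):
        if candidate.lower() in supported_lower:
            return candidate
    return None


if __name__ == "__main__":
    # A text tagged de-1996 is still usable by a tool that only knows "de".
    print(fallback_chain("de-1996"))
    print(best_match("de-1996", ["de", "sv"]))
    # The deliberately absurd tag from above degrades the same way.
    print(fallback_chain("nb-NO-1870-Ibsen"))
```

With such a fallback, a detailed tag costs nothing to consumers that 
only understand "sv" or "de", but it also gains nothing unless some 
tool actually distinguishes the variants.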

Before I need standardized subtags for Danish, Swedish and 
Norwegian, I first have to find a use for subtags, and then a need 
to exchange them in a standardized format.  Even if I 
declare a particular text to be in a certain variant of Swedish, 
neither users nor search engines care much about this.

But if anybody else can see a need for standardized subtags for 
these languages, I'd be interested in taking part in the 
discussion.

By the way, tomorrow is the 100th anniversary of a major spelling 
reform of Norwegian (bokmål, at the time called riksmål).  On 
February 19, 1907, the Norwegian government resolved to change the 
spelling in their official documents.  The rules for "nb-1907" are 
described in http://runeberg.org/rm1907/  However, very few other 
writers used this reform in every detail, and a new major reform 
was introduced in 1917.


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/


More information about the Ietf-languages mailing list