Macrolanguages, countries & orthographies

Tue Feb 13 23:30:39 CET 2007

Mark Davis wrote:

> Assume that old Czech is as different from modern as fro is from fr.

But is this a real problem?  How much total literature is written 
and available in different variations of Czech?  My prejudice says 
that as a nation with a language and literature of its own, Czech 
is about as young as Finnish, Norwegian or Serbian, i.e. 19th 
century.  Can you give any concrete examples where not having a 
separate *code* for pre-renaissance Czech is a practical problem?

Linguists of course have *names* for Swedish of all ages, but I 
see no real use for having ISO or the IETF specify language 
*codes*.  I could be wrong, but if so please enlighten and correct 
me.  Nobody is going to translate OpenOffice or Mozilla into the 
language spoken by the Vikings (Old Norse) or the Swedish used during 
the Lutheran Reformation (called New Swedish, ironically).

Yes, there is now a branch of Wikipedia in Old English 
(ang.wikipedia.org), but that is a rare exception.  I don't expect 
this to happen in other languages.  Ang now has 744 articles, 
compared to the 11,000 articles of the Latin Wikipedia.

I'm scanning old books, and I'm starting to see a practical 
problem with different orthographies and spelling reforms, similar 
to those addressed with the IETF-defined codes for German de-1901 
and de-1996.  Analogous to these codes, we could perhaps find use 
for sv-1801, sv-1889, sv-1906, da-1775, da-1892 and da-1948, 
because we now have *significant amounts* of text online in each 
of these language versions.  But before 1775/1801 the orthography 
of Swedish and Danish varies so much from work to work that it 
becomes pretty much useless to try to identify more versions.  
And before that time, there is also so little literature 
available that any automatic processing becomes insignificant.
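
To make the tagging idea concrete, here is a minimal sketch of how a 
scanning project might pick an orthography tag from a publication 
year.  It rests on several assumptions: only de-1901 and de-1996 are 
existing codes, the sv-* and da-* tags are just the ones suggested 
above, the names REFORM_YEARS and orthography_tag are hypothetical, 
and the publication year is only a rough guide to the spelling a 
given book actually uses.

```python
from bisect import bisect_right

# Reform years per language.  The "de" years match the existing codes
# de-1901 and de-1996.  The "sv" and "da" years are only the ones
# suggested in this message; they are NOT registered codes.
REFORM_YEARS = {
    "de": [1901, 1996],
    "sv": [1801, 1889, 1906],
    "da": [1775, 1892, 1948],
}


def orthography_tag(lang, year):
    """Return 'lang-YYYY' for the latest listed reform at or before
    `year`, or the bare language code if the text predates every
    listed reform or the language has no entry."""
    years = REFORM_YEARS.get(lang, [])
    # Number of reform years that are <= year.
    i = bisect_right(years, year)
    return f"{lang}-{years[i - 1]}" if i else lang


if __name__ == "__main__":
    for lang, year in [("sv", 1850), ("da", 1700), ("de", 2000)]:
        print(lang, year, orthography_tag(lang, year))
```

Falling back to the bare language code for anything older than the 
first listed reform matches the point above: before 1775/1801 the 
spelling varies too much for a finer label to be useful.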

-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se