Macrolanguages, countries & orthographies

Mark Davis mark.davis at icu-project.org
Wed Feb 14 01:22:00 CET 2007


Saying that it is not as important is, I agree, your prejudice. Importance
is in the eye of the beholder. ISO 639-3 has 7,500 languages, and it makes
many distinctions that, to people concerned with Czech, will be far less
important than the difference between old Czech and modern Czech.

Moreover, one cannot fixate on the exact example used. There are plenty of
others, because very few languages have "Old" variants in 639-3. The
principle is the same for any other language: do we presume that the code
means only the modern variant, or covers all historical variations? We need
to get an answer for that; without that answer, we can't know whether to
accept or reject historic variant proposals.

Mark

On 2/13/07, Lars Aronsson <lars at aronsson.se> wrote:
>
> Mark Davis wrote:
>
> > Assume that old Czech is as different from modern as fro is from fr.
>
> But is this a real problem?  How much total literature is written
> and available in different variations of Czech?  My prejudice says
> that as a nation with a language and literature of its own, Czech
> is about as young as Finnish, Norwegian or Serbian, i.e. 19th
> century.  Can you give any concrete examples when not having a
> separate *code* for pre-renaissance Czech is a practical problem?
>
> Linguists of course have *names* for Swedish of all ages, but I
> see no real use for having ISO or the IETF specify language
> *codes*.  I could be wrong, but if so please enlighten and correct
> me.  Nobody is going to translate OpenOffice or Mozilla into the
> language spoken by the Vikings (Old Norse) or the Swedish used during
> the Lutheran reformation (called New Swedish, ironically).
>
> Yes, there is now a branch of Wikipedia in Old English
> (ang.wikipedia.org), but that is a rare exception.  I don't expect
> this to happen in other languages.  Ang has now 744 articles,
> compared to the 11,000 articles of the Latin Wikipedia.
>
> I'm scanning old books, and I'm starting to see a practical
> problem with different orthographies and spelling reforms, similar
> to those addressed with the IETF defined codes for German de-1901
> and de-1996.  Analogous to these codes, we could perhaps find use
> for sv-1801, sv-1889, sv-1906, da-1775, da-1892 and da-1948,
> because we now have *significant amounts* of text online in each
> of these language versions. But before 1775/1801 the orthography
> of Swedish and Danish varies so heavily from work to work that it
> becomes pretty much useless to try to identify more versions.
> And before that time, there is also so little literature
> available that any automatic processing becomes insignificant.
>
>
>
> --
>   Lars Aronsson (lars at aronsson.se)
>   Aronsson Datateknik - http://aronsson.se


