[Ltru] Re: Macrolanguages, countries & orthographies
debbie at ictmarketing.co.uk
Wed Feb 14 03:17:32 CET 2007
>> The principle is the same for any other language: do we presume that the
code means only the modern variant, or covers all historical variations? We
need to get an answer for that; without that answer, we can't know whether
to accept or reject historic variant proposals.
My personal opinion is that ISO 639-3 subtags cover the "whole language" as
described; all of the language, every part of the language, written and
spoken and... historical. Even when there is an ISO 639-3 historical subtag
that covers part of it.
My advice: accept urgent proposals for historic variants on the basis that
they will be deprecated when ISO 639-6 comes into being, assuming it is
incorporated within RFC4646bis or ter. Inform proposers of such variants
that ISO 639-6 is currently being designed and, if the need is not urgent,
ask them to delay until ISO 639-6 is published.
I think this group needs to make a decision wrt ISO 639-6.
From: Mark Davis [mailto:mark.davis at icu-project.org]
Sent: 14 February 2007 00:22
To: Lars Aronsson
Cc: ietf-languages at iana.org; LTRU Working Group
Subject: [Ltru] Re: Macrolanguages, countries & orthographies
Saying that it is not as important is, I agree, your prejudice. Importance
is in the eye of the beholder, and ISO 639-3 has 7,500 languages, which make
distinctions that to people concerned with Czech will be far less important
than the difference between old Czech and modern Czech.
Moreover, one cannot fixate on the exact example used. There are plenty of
others, because very few languages have "Old" variants in 639-3. The
principle is the same for any other language: do we presume that the code
means only the modern variant, or covers all historical variations? We need
to get an answer for that; without that answer, we can't know whether to
accept or reject historic variant proposals.
On 2/13/07, Lars Aronsson <lars at aronsson.se> wrote:
Mark Davis wrote:
> Assume that old Czech is as different from modern as fro is from fr.
But is this a real problem? How much total literature is written
and available in different variations of Czech? My prejudice says
that as a nation with a language and literature of its own, Czech
is about as young as Finnish, Norwegian or Serbian, i.e. 19th
century. Can you give any concrete examples when not having a
separate *code* for pre-renaissance Czech is a practical problem?
Linguists of course have *names* for Swedish of all ages, but I
see no real use for having ISO or the IETF specify language
*codes*. I could be wrong, but if so please enlighten and correct
me. Nobody is going to translate OpenOffice or Mozilla to the
language spoken by vikings (Old Norse) or the Swedish used during
the Lutheran reformation (called New Swedish, ironically).
Yes, there is now a branch of Wikipedia in Old English
(http://ang.wikipedia.org), but that is a rare exception. I don't
expect this to happen in other languages. Ang has now 744 articles,
compared to the 11,000 articles of the Latin Wikipedia.
I'm scanning old books, and I'm starting to see a practical
problem with different orthographies and spelling reforms, similar
to those addressed with the IETF defined codes for German de-1901
and de-1996. Analogous to these codes, we could perhaps find use
for sv-1801, sv-1889, sv-1906, da-1775, da-1892 and da-1948,
because we now have *significant amounts* of text online in each
of these language versions. But before 1775/1801 the orthography
of Swedish and Danish varies so heavily with each work, that it
becomes pretty much useless to try to identify more versions.
And before that time, there are also such small amounts of literature
available that any automatic processing becomes insignificant.
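To make the orthography idea concrete, here is a minimal Python sketch of how a scanning project might pick a tag from a publication year. It assumes the reform years listed above, and it assumes that a text follows the most recent reform at or before its publication date, which real books often do not. Only de-1901 and de-1996 are registered variants; the sv-* and da-* tags are hypothetical, and the names REFORM_YEARS and tag_for_year are invented for illustration.

```python
from bisect import bisect_right

# Reform years as listed in the message above. Only de-1901 and de-1996
# are registered variant subtags; the sv-* and da-* entries are
# hypothetical proposals, not registered tags.
REFORM_YEARS = {
    "de": [1901, 1996],
    "sv": [1801, 1889, 1906],
    "da": [1775, 1892, 1948],
}


def tag_for_year(language, year):
    """Return a language tag for a text published in `year`.

    Assumption: the text uses the latest orthography reform dated at or
    before `year`. Texts older than the first listed reform, and
    languages with no listed reforms, get the bare language subtag.
    """
    years = REFORM_YEARS.get(language, [])
    # Count how many reform years are <= year.
    index = bisect_right(years, year)
    if index == 0:
        return language
    return f"{language}-{years[index - 1]}"
```

For example, tag_for_year("sv", 1895) would give "sv-1889", while a Danish text from 1700 would fall back to plain "da", matching the point that finer distinctions before 1775/1801 are not worth making.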
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se
Ietf-languages mailing list
Ietf-languages at alvestrand.no