[Ltru] Re: Ietf-languages Digest, Vol 50, Issue 15

Mark Davis mark.davis at icu-project.org
Thu Feb 15 23:34:19 CET 2007


I'm inclined towards #2 also, for the reasons you cite. My primary concern,
however, is to get a definitive statement from the RA as to which of the
policies is true. That, for me, is far more important than which policy is
actually used.

Mark

On 2/15/07, Peter Constable <petercon at microsoft.com> wrote:
>
> Adopting 1 would mean adopting it generally across all of ISO 639-3: all
> entries of individual-language scope would encompass the corresponding
> historic varieties. But then, note that historic varieties are relevant only
> in cases of languages with a long literary tradition that is preserved. For
> instance, there may have been an old Naskapi that is distinct from the modern
> descendant, but there never have been and never will be any records in this
> putative language, so there is zero need for an identifier that encompasses
> it.
>
>
>
> (Btw, please note: we do **not** code reconstructed protolanguages. ISO
> 639-3 is explicit about that. So please don't anybody suggest we'd be coding
> proto-Naskapi.)
>
>
>
> So really we're just talking about some limited set of cases with a
> literary tradition.
>
>
>
> Note that we're also only talking about cases in which languages were
> well-enough developed to maintain a single identity over several hundred
> years. That's what distinguishes a "historic" language from an "extinct"
> language. For instance, there are historic documents in a pre-Columbian
> Mixtec variety, but that language identity is not preserved by one specific
> modern Mixtec variety. And I reject the notion that pre-Columbian Mixtec
> together with all the modern Mixtec varieties is a macrolanguage unless
> someone makes the case that there's a user scenario in which it is
> appropriate to treat all those varieties as one language.
>
>
>
> So the number of relevant cases is fairly constrained. I don't know just
> how many there would be, but it's going to be a small fraction of all modern
> languages for which this is relevant.
>
>
>
> I have a concern with 1 that it would detract from interoperability, for
> the kinds of reasons Anthony mentions. There is a very large amount of usage
> in which "eng" is intended to mean specifically modern English, and a very
> large amount of usage in which "ces" is intended to mean specifically modern
> Czech. I don't see who it would help to decide that these IDs encompass Old
> English and Old Czech respectively: the average modern-language user isn't
> likely to be cataloguing content in Old English and Old Czech, and
> they certainly aren't going to be helped by having queries return records in
> the historic varieties. As for the specialist, they certainly don't want to
> catalog content as all "eng" and "ces", as Anthony has made clear. The only
> scenario in which maybe someone is helped is when the specialist wants a
> query to return records for all historic varieties. I don't see why they
> can't use a Boolean operator for that, but even if there were enough need for
> a single ID, I wouldn't be inclined to use "eng" and "ces" for that purpose:
> that would be helping the 0.01% scenario to the detriment of 99.99% of
> users.
>
>
>
> Thus, I'm inclined towards 2. There is certainly willingness in general on
> the part of the ISO 639 JAC to code historic languages, so I have no doubt
> that IDs for things like Old Czech etc. would be provided so long as the
> need is clear and there's a sense that the historic boundaries deemed
> appropriate by philologists, research librarians, etc. are appropriate.
>
> Peter
>
>   ------------------------------
>
> From: Mark Davis [mailto:mark.davis at icu-project.org]
> Sent: Thursday, February 15, 2007 8:55 AM
> To: Anthony Aristar
> Cc: LTRU Working Group; ietf-languages at alvestrand.no
> Subject: [Ltru] Re: Ietf-languages Digest, Vol 50, Issue 15
>
>
>
> Your quotation below omits the true author, and may leave the impression
> that I wrote a number of paragraphs that I do not agree with and did not
> write. I only wrote "Assume that old Czech ..." -- someone else wrote the
> "But is this a real problem...."
>
> > Mark Davis wrote:
> >
> > > Assume that old Czech is as different from modern as fro is from fr.
> >
> > But is this a real problem?  How much total literature is written
> ...
>
> That being said, there are two models that ISO could be using.
>
>    1. *Overlapping.* 'eng' means any English, modern or historic. 'ang'
>       means specifically Old English, a subset of 'eng'. 'ces' means any
>       Czech. There is no tag specifically for Old Czech.
>
>       1. I could tag Beowulf with 'ang' or 'eng', but Shakespeare,
>          Austen, and Robin Williams only with 'eng'.
>       2. Smil Flaška z Pardubic and Václav Havel are both tagged
>          with 'ces'.
>       3. Requests for BCP 47 variant tags for Shakespearean English
>          (en-SHAKESPR) or Old Czech (cs-OLDCZECH) would be legitimate.
>       4. A request for a variant tag for only modern English
>          (en-MODENGL), thus excluding Old English, would be legitimate.
>
>    2. *Disjoint.* 'eng' means only modern English, 'ang' means Old
>       English, 'ces' means only modern Czech. There is (currently) no tag
>       at all for Old Czech.
>
>       1. I could tag Beowulf with 'ang' only.
>       2. There is currently no valid code for tagging Smil
>          Flaška z Pardubic.
>       3. A request for a BCP 47 variant tag for Shakespearean English
>          (en-SHAKESPR) would be legitimate.
>       4. A request for a registered Old Czech language tag
>          (oldczech) would be legitimate. (However, "primary languages are
>          strongly RECOMMENDED for registration with ISO 639, and proposals
>          rejected by ISO 639/RA will be closely scrutinized before they are
>          registered with IANA.")
>
> I don't think they are using model number one, but we need to find out.
>
> Mark
>
> On 2/15/07, Anthony Aristar <aristar at linguistlist.org> wrote:
>
> With all due respect, this seems like a very odd discussion from my
> perspective as a linguistics professor.  The discussion seems to
> presuppose that all that matters is whether Microsoft is going to one
> day produce a version of Word in Middle High German or Old English, or
> how many texts exist in a language.
>
> But the ISO 639 codes are used for much more than this.  In particular,
> they are used to ensure interoperability, allowing material of the same
> linguistic nature to be found in searches, and to be compared using the
> linguistic ontologies that are now being developed.  If I am a scholar
> searching for texts in Old English (or Old High German, for that
> matter) and everyone has been cavalier enough to code such material
> with eng and deu, what the search engines return will be utterly
> useless to me.  I am going to be flooded with such a quantity of
> material in Modern English and Modern German that searching through it
> will be essentially impossible.
>
> So if you really believe that it doesn't matter if you code English
> material as eng, whatever its period, what you're really saying is that
> you don't really care about interoperability, and that you don't really
> care about scholarship.
>
>                 **************************************
> Anthony Aristar, Director, Institute for Language Information & Technology
>                    Professor of Linguistics
> Moderator, LINGUIST               Principal Investigator, EMELD Project
> Linguistics Program
> Dept. of English                  aristar at linguistlist.org
> Eastern Michigan University            2000 Huron River Dr, Suite 104
> Ypsilanti, MI 48197
> U.S.A.
>
> URL: http://linguistlist.org/aristar/
>                 **************************************
>
> > Mark Davis wrote:
> >
> > > Assume that old Czech is as different from modern as fro is from fr.
> >
> > But is this a real problem?  How much total literature is written
> > and available in different variations of Czech?  My prejudice says
> > that as a nation with a language and literature of its own, Czech
> > is about as young as Finnish, Norwegian or Serbian, i.e. 19th
> > century.  Can you give any concrete examples when not having a
> > separate *code* for pre-renaissance Czech is a practical problem?
> >
> > Linguists of course have *names* for Swedish of all ages, but I
> > see no real use for having ISO or the IETF specify language
> > *codes*.  I could be wrong, but if so please enlighten and correct
> > me.  Nobody is going to translate OpenOffice or Mozilla to the
> > language spoken by Vikings (Old Norse) or the Swedish used during
> > the Lutheran reformation (called New Swedish, ironically).
> >
> > Yes, there is now a branch of Wikipedia in Old English
> > (ang.wikipedia.org), but that is a rare exception.  I don't expect
> > this to happen in other languages.  Ang now has 744 articles,
> > compared to the 11,000 articles of the Latin Wikipedia.
>
>
>
>
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>
>
>
> --
> Mark
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www1.ietf.org/mailman/listinfo/ltru
>
>


-- 
Mark