What to do with Gaulish ?

CE Whitehead cewcathar at hotmail.com
Fri Nov 17 01:02:35 CET 2006


Hi!
In reply to the following that Doug Ewell wrote (Mr. Ewell's comments that I 
am replying to are excerpted from his message below and are set off from 
my reply by quotation marks):
"It's just as possible that an ordinary user looking for "French"
text, say for business or shopping, may not understand Old French
and Middle French, and will not want scholarly material in those
languages.  It is probably best to require the student to indicate
them explicitly."

Of course, an ordinary user looking for a French text might not want Middle 
or Old French.  Middle French would probably be readable to the ordinary 
user, though it differs from Modern French.  But the documents would be 
literary anyway; they should not interest the ordinary user and probably 
would not come up in a search for business or shopping.
I would think they would come up only in a search for French literature 
(hopefully, if the search engine works and no one has put all kinds of 
fictitious stuff in the meta content information).
In addition, should the pages come up, the second part of the tag indicates 
the date of the language.
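
To make the search question concrete, here is a minimal sketch of how a 
search for "fr" might be matched against page tags.  It assumes the engine 
uses the basic filtering described in RFC 4647 (a range matches a tag that is 
identical to it, or that starts with it followed by a hyphen); I do not know 
what any real search engine does.  The function name and the page list, 
including the "fr-CA" tag, are only illustrations.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Return True if `tag` matches `language_range` by prefix on subtag boundaries."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        # The wildcard range matches every tag.
        return True
    # Either an exact match, or the range followed by "-" and more subtags.
    return t == r or t.startswith(r + "-")


# Hypothetical page tags, for illustration only.
pages = ["fr", "fr-CA", "frm", "fro", "frc"]
print([p for p in pages if basic_filter_match("fr", p)])  # ['fr', 'fr-CA']
```

Under that assumption a request for "fr" would not bring up the frm, fro or 
frc pages, because those are separate primary language subtags and not 
extensions of "fr".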

Also, in reply to his suggestions for declaring multiple languages:
">Language tags are defined as representing a single language (unless
>the subtag "mul" is used, which probably provides less information than any 
>alternative).  The application-specific structure that *uses* language tags 
>-- in this case, the "lang" attribute -- is the way to indicate multiple 
>languages."
If your pages are inserted into the body of a page created by the host's 
application, there is no place to list more than one language at a time.  You 
can of course declare a different language for each of the various 
subsections delineated by html or xml or xhtml elements (such as p for 
paragraph, div for division, span for a run of text within a paragraph), but 
only one at a time.
There is no way to identify a single text as both fr, French, and frm, 
Middle French.

I'd like to say further that the option of having tags for say the European 
languages labelled:
12thc, 13thc, 14thc, 15thc, 16thc, 17thc
would help those who needed to clarify the exact date of a language or 
language variant.

I feel these would be useful tags!
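
As a rough check of the syntax rule Mr. Ewell describes in his message below 
(a variant subtag is 5 to 8 letters and digits, or 4 if it begins with a 
digit), here is a small sketch.  The function name is my own invention, and 
the check covers syntax only; it says nothing about whether a subtag is 
registered or would be approved.

```python
import re

# Syntax only: 5 to 8 letters and digits, or exactly 4 characters
# when the first one is a digit.
_VARIANT = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag: str) -> bool:
    """Return True if `subtag` has the shape of a variant subtag."""
    return _VARIANT.fullmatch(subtag) is not None


# Candidates taken from this thread.
for candidate in ("12c", "17c", "12thc", "17thc", "12cent"):
    print(candidate, is_wellformed_variant(candidate))
```

By this check "12c" and "17c" are too short, while "12thc", "17thc" and 
"12cent" have an acceptable shape.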

Thanks.


Sincerely,
C. E. Whitehead
cewcathar at hotmail.com

>From: "Doug Ewell" <dewell at adelphia.net>
>To: <ietf-languages at iana.org>
>CC: <cewcathar at hotmail.com>
>Subject: Re: What to do with Gaulish ?
>Date: Thu, 16 Nov 2006 07:04:00 -0800
>
>(Replying only to ietf-languages, since I'm not subscribed to the other 
>lists that received this mail.)
>
>CE Whitehead <cewcathar at hotmail dot com> wrote:
>
>>Hi, I am troubled by tags like frc, fro, and frm because I am wondering 
>>what happens when a person using a search engine asks for pages in French? 
>>  Will the frc, fro, frm pages turn up too?  It's quite possible that a 
>>person interested in French will be interested in moyen Francais/Middle 
>>French (frm) and in Old French (fro) if the search is for someone studying 
>>French.
>
>Neither the Language Subtag Registry nor, as far as I can tell, any of the 
>ISO 639 family of standards include this type of time-hierarchy 
>information.
>
>It's just as possible that an ordinary user looking for "French" text, say 
>for business or shopping, may not understand Old French and Middle French, 
>and will not want scholarly material in those languages.  It is probably 
>best to require the student to indicate them explicitly.
>
>>Also, as I noted, some of the 17th Century new world documents were in 
>>Middle French although you all have set the dates as 1400-1600 (those 
>>dates can vary a bit; you'd be surprised also at the amount of variation 
>>you can get in any given language at any given time before literacy was so 
>>widespread).
>
>This comment would be directed to the ISO 639 folks, since RFC 4646 (and 
>predecessors) and thus the W3C take the language descriptions and dates 
>directly from ISO 639.
>
>It's well known that the dates aren't exact, and indeed cannot be, since in 
>almost all cases linguistic change occurs gradually rather than being 
>legislated into existence.
>
>>It's also conceivable that a person might want documents that are written 
>>in either a Creole of French or Standard French.
>>
>>One could of course list all of the languages related to a particular page 
>>using the meta content tags; for example for my "Moyen francais" document 
>>I could list:
>>lang=en, fr, frm
>
>Language tags are defined as representing a single language (unless the 
>subtag "mul" is used, which probably provides less information than any 
>alternative).  The application-specific structure that *uses* language tags 
>-- in this case, the "lang" attribute -- is the way to indicate multiple 
>languages.
>
>>Why not also have optional variant tags indicating the century in which a 
>>dialect/language was used, for example
>>
>>12c (12th century, 1100-1199 A.D.)
>>13c (13th century, 1200-1299 A.D.)
>>14c
>>15c
>>16c
>>17c
>>
>>and so forth.
>
>From a mechanical standpoint, these variants would need to adhere to the 
>RFC 4646 syntax for variant subtags: either 5 to 8 letters and digits, or 4 
>if the subtag begins with a digit.  You could propose "12cent", "13cent", 
>etc. and these would be syntactically acceptable.
>
>From a linguistic standpoint, you would need to convince the Language 
>Subtag Reviewer that such variants are justified, and not an 
>overspecification.  Not all languages changed on a tidy century-by-century 
>basis, and of course the base-10 dates "1100-1199" are just as arbitrary as 
>"1400-1600" mentioned above.
>
>>These become quite relevant for 17th century European languages which are 
>>'modern' sort of but sometimes vary quite a bit from the modern version of 
>>the language (I found this to be the case when dealing with 17th century 
>>French in a report coming from the U.S.; some of the features I noted in 
>>the 1683 report were reminiscent of Old French, many of Middle French, 
>>spellings were sometimes irregular and phonetic; it might be 
>>understandable to a speaker of Modern French but so might 16th century 
>>French which does get the Middle French tag; elsewhere, in some texts from 
>>France, 17th century French appears more like the modern variety; likewise 
>>Shakespeare's 16th-17th century English is modern, in fact, as I 
>>understand things, his use of English based on Scots dialect made Modern 
>>English what it is; but it does vary a bit from English used today).
>
>It would seem difficult (not to say "impossible") for a system of short 
>alphabetic codes to accurately reflect subtle historical nuances like this. 
>  It's understood that such nuances exist.
>
>>On the same issue, what is going to happen with Arabic, when you get the 
>>new subtags, will people still be able to use ar with a country code to 
>>indicate the language?  Or is the new subtag to be the only option?  I am 
>>not sure which should be the case myself as the dialects are quite 
>>different, though most written Arabic is not spoken but standard so these 
>>new codes should probably only apply for spoken materials or phonetic 
>>transcriptions for the most part.
>
>In RFC 4646bis, "ar" will continue to mean Arabic generally (not 
>necessarily restricted to "standard Arabic"), while primary-extended pairs 
>like "ar-arz" or "ar-abh" will reflect the dialects coded in ISO 639-3.
>
>>Of course, having a century tag would not solve everything for languages 
>>that vary over time:
>
>Certainly not; language is more complex than that.
>
>>On this note, I'd like to know how to apply for a variant (not a language) 
>>subtag, 17c, if I may do so.
>>Hope I may.
>
>See RFC 4646, section 3.5, "Registration Procedure for Subtags."  Make sure 
>you understand the syntactical restrictions, which are there for a reason 
>-- to allow the different types of subtags to be identified based on 
>structure and position within the tag.  Also bear in mind that not all 
>requests are automatically approved.
>
>http://www.rfc-editor.org/rfc/rfc4646.txt
>
>--
>Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
>http://users.adelphia.net/~dewell/
>http://www1.ietf.org/html.charters/ltru-charter.html
>http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
