Swiss German, spoken

JFC (Jefsey) Morfin jefsey at jefsey.com
Wed Jun 15 15:37:09 CEST 2005


At 06:56 15/06/2005, Peter Constable wrote:
> > From: JFC (Jefsey) Morfin [mailto:jefsey at jefsey.com]
> > >Incorrect. Issues discussed on this list relate to registration of
> > >tags "for the identification of languages" -- that is, tags to be
> > >used as metadata elements to declare the linguistic properties of
> > >content in Internet and other protocols and applications. There is
> > >nothing stated anywhere that these tags necessarily apply only to
> > >text content.
> >
> > Except that to register a language you must provide printed
> > references....
>
>The relevant field on the form is:
>"Reference to published description of the language (book or article)"

Agreed. This is why I speak of a "gray area" and why my point of view seems 
to disagree with this list's understanding. It is true that everything I 
need can be done with RFC 3066, and that it could still be done under the 
Draft (with x-tags). But the de facto accumulation of "possibilities" and 
attitudes makes this less and less _perceived_ as consensually correct.

The increased support for written forms (with particular interest in 
scripts) should be accompanied by the same concern for the other vectors of 
language. This is one of the reasons for the experimentation we are 
engaging in. This experimentation will be carried out in due respect of the 
conditions expressed in the ICANN ICP-3 document, adding some constraints 
we discovered while experimenting, and will be part of the Draft I will 
present in Luxembourg. The target is to document that, and how, tags may 
cover the full scope of the language issues as you express them further on. 
This could be addressed today; it will call for more effort, wasted time 
and a possible split. But at the same time the experimentation will 
demonstrate it.

>Note that the reference is to a *description* of the language, not
>necessarily a work in the language itself. Thus, one could provide
>references to descriptions of (say) American Sign Language, which is not
>commonly written.

OK. The problem is that many linguistic variations and even evolutions are 
nowadays supported by media and not documented as such. I will take an old 
example, outside the current debate and so (I hope) neutral. Forty years 
ago "Le Cid" (one of the most famous plays of Pierre Corneille, in 
alexandrine lines) was rewritten in "Papaouette", a language of a _quarter_ 
of Algiers. Such an exercise is common, but this one was famous and quite 
funny (it turns out that plenty of words and expressions are 
self-understandable, so you easily understand 75% of the play). There is no 
printed version (or it was handwritten and lost). There is therefore no 
problem for someone to publish a multi-version CD of the play, taking the 
reference "Comédie Française" recording and adding multiple spoken 
versions, in Papaouette and in other similar "languages". We all wrote our 
own "locale" version (I wrote thousands of funny lines for a college 
version); the publisher would have to offer a menu. It is likely that such 
a publication would raise interest among the "Pieds Noirs" (French people 
of North Africa) and in local communities, as would several other 
publications, plays, etc., reviving languages which are privately still 
much in use but no longer have geographical roots. I wrote a few songs in 
the Academy language; I had to "translate" them to obtain the agreement of 
the famous authors whose music I used and of SACEM (I published a record of 
Academy songs). There is therefore a need for these true languages to be 
identified (Papaouette is a mix of French, Spanish, Arabic, Jewish, 
Italian, etc. words and grammatical constructs; professional languages are 
sometimes very complex and rooted in several other languages).

Documented sources on Papaouette (apart from the press, etc.) do not exist. 
No linguist has published on it. If this happens in a linguistically 
sophisticated country with very structured support for the dominant 
language, I suppose it can happen in many other places and in many other 
diasporas. We have a published dictionary of the Academy language, but its 
main purpose is to document words, locutions and constructs that entered 
the French language and to document some history of words over 175 years.

This is why I disagree with "documenting". The real documentation is that 
one person claims a language name and that others put a meaning on it. 
Until recently no one ever registered the names England, Italy or Germany. 
Yet there are millions of people who have used them for centuries. And when 
they were "registered" they were actually just "recorded".

>Several years ago on the IETF-languages list, there was some discussion
>of what kinds of materials could be referred to. I had just the opposite
>concern: someone might need a tag for a lesser-known language and not be
>able to provide references to a description of the language. There was
>consensus that references could be to a *description* of the language,
>or to a work *in* the language.

Yes, but today it could also be a recording, and that description should 
come only from the one registering. There should be no filtering by 
"experts": experts should assist each registration and then advise on its 
use.

I should be able to register "jfc" to support my Franglish.
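Under RFC 3066 as published, such a personal tag would not be "jfc" itself but a private-use tag with the reserved "x" primary subtag, e.g. "x-jfc" ("x-jfc" here is purely hypothetical, taken from the sentence above). A minimal sketch of the RFC 3066 syntax rules, assuming only what the RFC states (subtags of 1-8 letters or digits, "x" marking private use):

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, then zero or more
# subtags of 1-8 letters/digits, separated by hyphens.  A primary
# subtag of "x" marks the whole tag as private use, so a tag such as
# the hypothetical "x-jfc" needs no registration at all.
TAG_RE = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")

def is_wellformed(tag: str) -> bool:
    """Check RFC 3066 well-formedness (syntax only, not registration)."""
    return TAG_RE.match(tag) is not None

def is_private_use(tag: str) -> bool:
    """Private-use tags have 'x' as their primary subtag."""
    lowered = tag.lower()
    return lowered == "x" or lowered.startswith("x-")

print(is_wellformed("x-jfc"), is_private_use("x-jfc"))    # True True
print(is_wellformed("i-klingon"), is_private_use("i-klingon"))  # True False
```

Note that well-formedness says nothing about meaning: "x-jfc" is valid only between parties who agree privately on what it denotes, which is exactly the limitation being discussed here.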

> > I note your "declare the linguistic properties of content" which is
> > something I could agree with. But which is not exactly the wording of
> > the document you refer to.
>
>No, it's not the wording of the RFC, but I very much feel the
>appropriate characterization of "language tags" is that they primarily
>function to declare attributes.

The problem, and where I object, is that a declaration is neutral (someone 
declares a tag and describes the tag). Then anyone can freely use the 
resulting list for whatever he wants, for example to classify his library. 
If I want to file French books under Russian when the author is Russian, I 
can. If I explain my rules, this may even be understood by _my_ users, 
having read my explanation.

Describing a page by using the tag is no longer neutral: some criteria must 
be used to determine whether this really is the language. This can be 
subjective (you say this is in English because "I speak English and I wrote 
for English readers", but there are still many questions). Or it can be 
documented, with rules establishing what an English text looks like and 
filtering being performed.

The real problem comes when RFC 3066 (and its further variations) starts 
using "defining". I know that English does not aim to be very precise in 
logic, being much more precise at conveying a feeling, an understanding; 
but these are standards texts. The single use of "defining" opens the way 
to language norming: the filtering is no longer done to know, but to 
decide. The second problem is that in our current world we need that layer 
for machine processing. If you do not document the process you refer to, 
you create a de facto "default" process of reference, which in the mind of 
the public will be the market-dominant one. And you run into many other 
problems.
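The machine-processing layer in question does exist in RFC 3066 itself, in a deliberately minimal form: section 2.5 defines when a "language-range" (as used, for instance, in HTTP Accept-Language) matches a tag. A sketch of that rule, assuming nothing beyond the RFC's own wording:

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066 section 2.5: a language-range matches a language tag if
    it exactly equals the tag, or if it exactly equals a prefix of the
    tag such that the first character following the prefix is '-'.
    Comparison is case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")

print(range_matches("fr", "fr-CA"))   # True: "fr" is a prefix of "fr-CA"
print(range_matches("fr-CA", "fr"))   # False: the range is longer than the tag
```

This rule only compares strings; it decides nothing about whether the content really is in that language, which is precisely where the "knowing" versus "deciding" distinction above applies.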

>(This distinguishes them, in my mind,
>from locale identifiers, which primarily function as API parameters used
>to tailor culture-dependent processes.)

IMHO this is a gray area. You are fully right if you come from "inside" one 
of the technical processes, like Unicode, which progressively builds up 
towards more comprehensive support of things like APIs. Now, consider that 
you come from the users' side (my whole reasoning is based on a 
user-centric network architecture, so please differentiate between what you 
call an "end user" and what I call a "user": you consider "clients" of an 
application, I consider a founding architectural concept with every right 
and every need).

Users come in relational communities, and the network is there to serve 
these relations. Relational community networks use protocols; 
person-to-person protocols are named "languages" in English (in French we 
differentiate "langues" (languages) and "langages" (which include 
everything else that is multimodal)). These protocols depend on factors 
external to the community (what I name "referents") and internal factors 
(what I name "contexts"). Depending on the kind of relation, the historical 
situation, the matter, etc., referents or contexts may take priority. This 
is why a name can span several languages: this is the ambition of famous 
trademarks.

So APIs are also transient, and so are "locales".

>  I'd include linguistic
>attributes, in the primary sense of that term, but also include
>attributes related to the written form -- script, orthography, spelling,
>transcription, transliteration -- in the case of textual content. But,
>not all content need be textual, the system should facilitate tagging of
>linguistic content regardless of the mode of expression.

Yes. Again we are in full agreement; this is just a question of degree. 
Coming from a textual world, from SIL implementing that textual world in 
the real world, using the Unicode character set, etc., you are more 
text-oriented. But do you think SIL would proceed the same way today? They 
would use recorders everywhere, speech analysers, maybe synthetic voice, 
etc. Obviously text gives a solid framework, but only to start with. It was 
adequate for ISO 639-1 and -2, probably much more complex for ISO 639-3, 
and quite out of context for most of ISO 639-6.
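To make the escalation across ISO 639 parts concrete, here is an illustrative fragment using codes from the published ISO 639-1 and 639-2 lists (639-2 has both bibliographic "B" and terminology "T" codes); ISO 639-3, then in draft, was to extend this to three-letter codes for thousands of individual languages:

```python
# Illustrative only: how the same language is coded at different
# ISO 639 granularities.  Values are taken from the published
# ISO 639-1 and ISO 639-2 code lists.
ISO639 = {
    "French": {"639-1": "fr", "639-2/B": "fre", "639-2/T": "fra"},
    "German": {"639-1": "de", "639-2/B": "ger", "639-2/T": "deu"},
}

for name, codes in ISO639.items():
    print(name, codes)
```

Even at this small scale, one language already carries three distinct codes; a spoken variety like Papaouette, absent from every list, shows where the text-anchored framework runs out.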

This is why it is urgent to have a strong, stable framework, to make sure 
that all the converging descriptions of attributes, which are themselves 
related to other converging attributes in many other areas, fall into a 
stable, structured framework. The only one we have today is ISO 11179. I do 
not necessarily like all of it, and it is quite complex and unfinished. But 
I think we are better off sharing the same problems as everyone else 
(including major government administrations all over the world, with 
budgets to correct mistakes) than inventing our own, possibly leading to 
confusion or to major delays.

When this list approves a tag (since it wants to discuss tags, not merely 
advise on them), it has absolutely no idea what the implications may be ten 
years from now, and where.

> > Due to the impact of ISO documents in the langtag registration process
> > and of their parallel evolution agreed by everyone (even if the nature
> > of the evolution may be different depending on the person) it is
> > advisable to read ISO 639-1, -2, the drafts of -3, -4, -5, -6 you
> > might find, ISO 15924 and ISO 3166. For those wanting to understand
> > the possible future conflicts concerning the registrations discussed
> > here they should consult ISO 11179 (scalability, updates, nature of
> > the documented information, etc.).
>
>It certainly isn't a bad idea to be familiar with the 639, 15924 and
>3166 standards. For 639, there's no particular point going looking for
>parts 4, 5 or 6 at this time since there isn't a complete working draft
>of any of them, and there is no immediate plan to have any of them
>impinge on RFC 3066 or some successor thereof.

I accept that (with the restriction above) from the narrow point of view of 
registering "xxx" or "i-xxx" tags under RFC 3066. I disagree about the 
Draft and its successors. ISO 639-3 will obey ISO 639-4 rules (or ISO 639-3 
will be delayed). A retrofit which may be acceptable at the ISO 639-3 
concept layer may not be acceptable at the IANA layer. I also want to point 
out that our experimentation, which takes parts 4, 5 and 6 into account, 
shows that their work addresses currently well-identified needs. What Karen 
describes shows in addition that she needs to build on them for her own 
standardisation process (that work will obviously be contingent on the 
final documents as published, but this is true for every standard which may 
have a further revision).

>ISO 11179 takes rather a deeper level of interest and commitment. It's a
>six-part compendium on metadata elements and registries and metamodels
>for metadata elements. The IANA registry for language tags which is the
>focus of this list has never been considered an implementation of this
>ISO standard, and knowledge of this ISO standard is not a prerequisite
>to making useful contributions to the work of this list. Familiarity
>with ISO 11179 certainly wouldn't get in the way of contributing to this
>list -- unless one begins to behave as though others on this list are or
>ought to be familiar with it as well.

Correct. But responsible participation in the debate of this list calls for 
participants to understand that their propositions (again, I regret that 
way of understanding the role of this list) may have an impact on a large 
number of registries made collateral to the ISO tables they consider, due 
to ISO 11179. I do not want to tell my story again, but had I thought 
better about the implications of our 1984 consensus, which permitted RFC 
920 and the whole naming system, and had numeric addressing been added to 
the agreement, we would all have been on IPv6 for ten years. It is 
responsible behaviour to try to have a reasonable understanding of the 
possible implications of what one does.

I submit that a reasonable understanding means knowing that this exists and 
that it may have an impact, and that banning these considerations is a 
mistake. When you write a Draft, you are not necessarily an expert in 
security; yet the Security section helps you remember that there are 
security aspects to what you specify. Your short description is a 
beginning, but people should also be aware that their own registry is 
qualified by (even if it is not subject to) ISO 11179, and that the 
internationally understood terms relating to registries are defined there, 
as are issues like versioning and updates. Disputing or mending ISO 3166 
updates outside of the ISO 11179 context is therefore a dead end if the 
result is to stay consistent with the tables, libraries, caches and 
locales, insofar as they are ISO-consistent.

jfc


