Swiss German, spoken
JFC (Jefsey) Morfin
jefsey at jefsey.com
Wed Jun 15 15:37:09 CEST 2005
At 06:56 15/06/2005, Peter Constable wrote:
> > From: JFC (Jefsey) Morfin [mailto:jefsey at jefsey.com]
> > >Incorrect. Issues discussed on this list relate to registration of
> > >"for the identification of languages" -- that is, tags to be used as
> > >metadata elements to declare the linguistic properties of content in
> > >Internet and other protocols and applications. There is nothing
> > >anywhere that these tags necessarily apply only to text content.
> > Except that to register a language you must provide printed references
>The relevant field on the form is:
>"Reference to published description of the language (book or article)"
Agreed. This is why I speak of a "gray area" and why my point of view seems
to disagree with this list's understanding. It is true that everything I
need can be done with RFC 3066, and that it would still be possible with
the Draft (x-tags). But the de facto accumulation of "possibilities" and
attitudes makes it less and less _perceived_ as possible.
The increased support of written forms (the particular interest in
scripts) should be accompanied by the same concern for the other vectors of
language. This is one of the reasons for the experimentation we are
engaging in. This experimentation will be carried out in due respect of the
conditions expressed in the ICANN ICP-3 document, adding some constraints
we discovered while experimenting, and will be part of the Draft I will
present in Luxembourg. The target is to document that, and how, tags may
cover the full scope of the language issues as you express them further on.
This could be addressed today; postponing it will call for more effort,
wasted time and a possible split. But at the same time the experimentation
will demonstrate it.
>Note that the reference is to a *description* of the language, not
>necessarily a work in the language itself. Thus, one could provide
>references to descriptions of (say) American Sign Language, which is not
OK. The problem is that many linguistic variations and even evolutions are
nowadays supported by media and not documented as such. I will take an old
example, outside the current debate and therefore (I hope) neutral. Forty
years ago "Le Cid" (one of the most famous plays of Pierre Corneille, in
alexandrine lines) was rewritten in "Papaouette", the language of a
_quarter_ of Algiers. Such an exercise is common, but this one was famous
and quite funny (it turns out that plenty of words and expressions are
self-explanatory, so you easily understand 75% of the play). There is no
printed version (or it was handwritten and lost). There is therefore
nothing to stop someone publishing a multi-version CD of the play, with
the reference "Comédie Française" text, and adding multiple spoken
versions in Papaouette and in other similar "languages". We all wrote our
own "locale" versions (I wrote thousands of funny lines for a college
version). The publisher will have to make a menu. It is likely that such a
publication would raise interest among the "Pieds Noirs" (French people of
North Africa) and in local communities, as would several other
publications, plays, etc., reviving languages which are privately still
very much in use though they no longer have geographical roots. I wrote a
few songs in Academy language; I had to "translate" them to obtain the
agreement of the famous authors whose music I used, and of SACEM (I
published a record of Academy songs). There is therefore a need for these
true languages to be identified (Papaouette is a mix of French, Spanish,
Arabic, Jewish, Italian, etc. words and grammatical constructs;
professional languages are sometimes very complex and root in several
other languages).
Documented sources on Papaouette - apart from the press, etc. - do not
exist. No linguist has published on it. If this can happen in a
linguistically sophisticated country with very structured support of the
dominant language, I suppose it can happen in many other places and in
many other diasporas. We have a published dictionary of the Academy
Language, but its main purpose is to document words, locutions and
constructs that entered the French language, and to document some history
of words over 175 years.
This is why I disagree with "documenting". The documentation is that one
person claims a language name and that others put a meaning on it. Until
recently no one ever registered the names England, Italy or Germany, yet
millions of people have used them for centuries. And when they were
"registered" they were actually just "recorded".
>Several years ago on the IETF-languages list, there was some discussion
>of what kinds of materials could be referred to. I had just the opposite
>concern: someone might need a tag for a lesser-known language and not be
>able to provide references to a description of the language. There was
>consensus that references could be to a *description* of the language,
>or to a work *in* the language.
Yes, but today it should be possible for that reference to be a recording,
and the description should come only from the one registering. There
should be no filtering by "experts"; experts should help every registrant
and then advise on the use of the tags.
I should be able to register "jfc" to support my Franglish.
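As it happens, RFC 3066 already reserves a branch for exactly this kind of private convention: tags under the "x-" prefix need no registry action at all. A minimal sketch of the tag syntax, in Python; the helper name and the tag "x-jfc" are mine, not anything registered:

```python
import re

# RFC 3066 syntax: Language-Tag = Primary-subtag *( "-" Subtag )
# where Primary-subtag = 1*8ALPHA and Subtag = 1*8(ALPHA / DIGIT).
TAG_RE = re.compile(r"^[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*$")

def is_well_formed(tag: str) -> bool:
    """Syntactic check only: says nothing about registration or meaning."""
    return TAG_RE.match(tag) is not None

# "x-jfc" is a private-use tag, syntactically valid with no registry
# action, while a bare "jfc" would sit in the ISO 639-2 code space.
print(is_well_formed("x-jfc"))  # True
print(is_well_formed("x-"))     # False: empty subtag
```

Well-formedness is the only property the syntax itself guarantees; whether a tag is registered, and what it means, is exactly the question debated in this thread.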
> > I note your "declare the linguistic properties of content" which is
> > something I could agree with. But which is not exactly the wording of
> > the document you refer to.
>No, it's not the wording of the RFC, but I very much feel the
>appropriate characterization of "language tags" is that they primarily
>function to declare attributes.
The problem - the point where I object - is that a declaration is neutral
(someone declares a tag and describes it). Then anyone can freely use the
resulting list for whatever he wants, for example to classify his library.
If I want to shelve French books under Russian when the author is Russian,
I can. If I explain my rules, this may even be understood by _my_ users,
who have read my explanation.
Describing a page by using the tag is no longer neutral: some criteria
must be used to determine whether this really is the language. This can be
subjective ("you say this is in English because I speak English and I
wrote for English readers" - but there are still many questions). Or it
can be documented, with rules establishing what an English text looks like
and filtering done against them.
The real problem comes when RFC 3066 (and its further variations) starts
using "defining". Again, I know that English does not aim at being very
precise in logic, and is much more precise in conveying a feeling, an
understanding. But these are standards texts. The mere use of "defining"
opens the way to language norming: the filtering is no longer meant to
know, but to decide. The second problem is that in our current world we
need that layer for machine processing. If you do not document the process
you refer to, you create a de facto "default" process of reference, which
in the public mind will be the market-dominant one. And then you run into
many other problems.
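For that machine-processing layer, the only matching process RFC 3066 itself documents (section 2.5, the rule behind HTTP Accept-Language) is a simple prefix rule: a language-range matches a tag when it equals the tag, or is a prefix of it cut at a "-" boundary. A hedged sketch, with a function name of my own choosing:

```python
def matches(language_range: str, tag: str) -> bool:
    """RFC 3066 section 2.5 matching: case-insensitive equality, or the
    range is a prefix of the tag ending at a "-" boundary."""
    language_range, tag = language_range.lower(), tag.lower()
    return tag == language_range or tag.startswith(language_range + "-")

print(matches("fr", "fr-CA"))  # True: "fr" matches any French variant
print(matches("fr-CA", "fr"))  # False: a range never matches a shorter tag
print(matches("fr", "frc"))    # False: the cut must fall on a "-"
```

Note that this rule is purely mechanical; it says nothing about whether the content really is in the language the tag declares, which is the point being argued above.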
>(This distinguishes them, in my mind,
>from locale identifiers, which primarily function as API parameters used
>to tailor culture-dependent processes.)
IMHO a gray area. You are fully right if you come from "inside" one of the
technical processes, like Unicode, which progressively builds up towards a
more comprehensive support of several things, like APIs. Now consider that
you come from the users' side (my whole reasoning is based on a
user-centric network architecture, so please differentiate what you call
an "end user" from what I call a "user": you consider "clients" of an
application, I consider a founding architectural concept with every right
and every need).
Users come in relational communities, and the network is there to serve
these relations. Relational community networks use protocols -
person-to-person protocols are named "languages" in English (in French we
differentiate "langues" (languages) from "langages", which include
everything else that is multimodal). These protocols depend on factors
external to the community (what I name "referents") and on internal
factors (what I name "contexts"). Depending on the kind of relation, the
historical situation, the matter, etc., referents and contexts may take
priority. This is why a name can span several languages: that is the
ambition of the famous trademarks.
So APIs are also transient, and so are "locales".
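The distinction Peter draws can be made concrete with a small sketch (Python standard library only, using the always-available "C" locale; the `document` structure is purely illustrative): a language tag travels with the content as declarative metadata, whereas a locale identifier is fed into an API to tailor a culture-dependent process.

```python
import locale

# A language tag declares a linguistic attribute of some content:
document = {"body": "Bonjour tout le monde", "language": "fr"}

# A locale identifier is an API parameter that tailors processing;
# here it drives culture-dependent number formatting:
locale.setlocale(locale.LC_NUMERIC, "C")
print(locale.format_string("%.2f", 1234.5))  # -> 1234.50
```

The tag changes nothing in how the content is processed; the locale changes the processing itself.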
> I'd include linguistic
>attributes, in the primary sense of that term, but also include
>attributes related to the written form -- script, orthography, spelling,
>transcription, transliteration -- in the case of textual content. But,
>not all content need be textual, the system should facilitate tagging of
>linguistic content regardless of the mode of expression.
Yes. Again we are in full agreement; this is just a question of degree.
Coming from a textual world, coming from SIL implementing that textual
world in the real world, using the Unicode character set, etc., you are
more text-oriented. But do you think SIL would proceed the same way today?
They would use recorders everywhere, speech analysers, perhaps synthetic
voice, etc. Obviously text gives a solid framework, but only as a starting
point. It was adequate for ISO 639-1 and -2, is probably much more complex
for ISO 639-3, and is quite out of context for most of ISO 639-6.
This is why it is urgent to have a strong, stable framework to make sure
that all the converging descriptions of attributes, which are themselves
related to other converging attributes in many other areas, fall into a
stable, structured whole. The only such framework we have today is ISO
11179. I do not necessarily like all of it, and it is quite complex and
unfinished. But I think we are better off sharing the same problems as
everyone else (including major government administrations all over the
world, with budgets to correct mistakes) than inventing our own and
possibly causing confusion or major delays.
When this list approves a tag (since it wants to discuss tags, not to
advise on them) it has absolutely no idea of what the implications may be
ten years from now, or where.
> > Due to the impact of ISO documents in the langtag registration process
> > of their parallel evolution agreed by everyone (even if the nature of
> > evolution may be different depending on the person) it is advisable to
> > read
> > ISO 639-1, -2, the drafts of -3, -4, -5, -6 you might find, ISO 15924
> > ISO 3166. For those wanting to understand the possible future
> > concerning the registrations discussed here they should consult ISO
> > (scalability, updates, nature of the documented information, etc.).
>It certainly isn't a bad idea to be familiar with the 639, 15924 and
>3166 standards. For 639, there's no particular point going looking for
>parts 4, 5 or 6 at this time since there isn't a complete working draft
>of any of them, and there is no immediate plan to have any of them
>impinge on RFC 3066 or some successor thereof.
I accept that (with the restriction above) from the narrow point of view
of registering "xxx" or "i-xxx" tags under RFC 3066. I disagree about the
Draft and its successors. ISO 639-3 will have to obey ISO 639-4 rules (or
you will delay ISO 639-3). A retrofit which may be acceptable at the ISO
639-3 concept layer may not be acceptable at the IANA layer. I also want
to point out that our experimentation - which takes parts 4, 5 and 6 into
account - shows that their work addresses currently well-identified needs.
What Karen describes shows in addition that she needs, for her own
standardisation process, to build on them (this work will obviously be
contingent on the final documents as they are published, but this is true
of every standard which may have a further evolution).
>ISO 11179 takes rather a deeper level of interest and commitment. It's a
>six-part compendium on metadata elements and registries and metamodels
>for metadata elements. The IANA registry for language tags which is the
>focus of this list has never been considered an implementation of this
>ISO standard, and knowledge of this ISO standard is not a prerequisite
>to making useful contributions to the work of this list. Familiarity
>with ISO 11179 certainly wouldn't get in the way of contributing to this
>list -- unless one begins to behave as though others on this list are or
>ought to be familiar with it as well.
Correct. But responsible participation in the debate on this list calls
for participants to understand that their propositions (again, I regret
that way of understanding the role of this list) may have an impact, due
to ISO 11179, on a large number of registries made collateral to the ISO
tables they consider. I do not want to tell my story again, but had I
thought better about the implications of our 1984 consensus which
permitted RFC 920 and the whole naming system, and had we added numeric
addressing to the agreement, we would all have been on IPv6 for ten years.
Behaving in a responsible manner means trying to have a reasonable
understanding of the possible implications of what one does.
I submit that a reasonable understanding means knowing that this exists
and may have an impact, and that banning these considerations is a
mistake. When you write a Draft you are not necessarily an expert in
security; yet the Security section helps you remember that there are
security aspects to what you specify. Your short description is a
beginning, but people should also be aware that their own registry is
qualified by (even if it is not subject to) ISO 11179, that the
internationally understood terms relating to registries are defined
there, and that issues like versioning and update are documented there.
Disputing or mending ISO 3166 updates outside of the ISO 11179 context is
therefore a dead end if the result is to stay consistent with the tables,
libraries, caches and locales, as far as they are