Message-Id: <6.2.1.2.2.20050706232617.05468ae0@mail.afrac.org>
Date: Thu, 07 Jul 2005 03:36:04 +0200
To: "Dylan N. Pierce" <dylanpierce@megared.net.mx>, ltru@ietf.org
From: r&d afrac <rd@afrac.org>
Subject: Re: [Ltru] Private Use Tags
In-Reply-To: <42CC0B62.9030802@megared.net.mx>
References: <42CB03D4.20801@megared.net.mx>
	<6.2.1.2.2.20050706001224.05117b90@mail.afrac.org>
	<42CC0B62.9030802@megared.net.mx>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Cc: 
Precedence: list
Sender: ltru-bounces@lists.ietf.org
Errors-To: ltru-bounces@lists.ietf.org

Dear Dylan,
thank you for your response. I think we agree on many things. But, as a 
programmer, you know that between a concept and a program there must be 
analysis and development. There is also an added complexity: we are the 
IETF, not Open Source or Unicode. This means that we have to deal with 
networks, which means asynchronous interactions in real time with unknown 
final agents. Simple rules drawn from experience help with this 
complexity; the present IETF Chair summarized them in RFC 1958. It is 
really worth reading, trying to think that way and, if possible, improving 
on it. Obviously you add your own experience. I see you are following this.

Three rules are basic for me: (a) everything can change, except the rule 
which says so; (b) KISS: keep it simple, stable, stupid, etc.; (c) 
scalability, which means that you must be able to apply a rule everywhere 
it can apply and it will still work. A corollary is that you should not 
solve in two different ways two things which may become similar. In a 
nutshell this means open consistency: open, because everything is 
possible; consistent, because one single idea should apply, since 
sometime, somewhere you will have a conflict. This is basically what you 
say throughout.

There is however a difference between what you say about programming and 
what I say about architecture: just think a step further, because here we 
are setting norms. So you _are_ to ask "why?", and not "are you sure?" but 
"if we want that, we all have to do it that way for greatest efficiency": 
you are like the client of your client. You build the environment he will 
use to ask you to develop an application.

At 18:48 06/07/2005, Dylan N. Pierce wrote:
>In response to jfc (forgive me if I don't yet understand how to ensure 
>that my e-mail appears in the thread below the appropriate post; I freely 
>admit this is my first time participating in a procedure such as this).
>
>Whenever I read a standard, for better or for worse, the question I am 
>asking myself isn't, "What does this mean?" or "What purpose does this 
>serve?" I am always asking myself, "How do I write a program that does 
>this?" This is why RFC 3066, for all that it's a BCP, simply is 
>inadequate; I am interested enough in this working group to come here and 
>express a solid support for the current direction because having a 
>language tag which is parsable according to constructable rules greatly 
>reduces the amount of work any programmer has to do when developing for 
>compliance.

The problem specific to this group was described by Addison. He says, and 
we all agree, that RFC 3066 is too imprecise. There are two responses to 
that:

1. people like me who say (for my applications) "great, I am free"
2. people like Addison (for his XML or Unicode, etc.) and you, for your 
applications, who say "gee, I am lost".

The role of the Draft is to address (2) while protecting (1). Now, you say: 
(3) "great, but I want more", and the role of the Draft should be to 
provide more. Now, you see we have a problem, because the Draft is expected 
to provide (1), (2) and (3) and provides only (2).

What the Internet community (charter) expects from BCP 47 is to address 
the problems of RFC 3066, which only provided (1). This can be done in two 
ways:

1. by writing a language tag framework for the Internet, within which the 
Draft will support (2) and which will be ready to support as many (3) as 
you need.
2. since much work has been spent on (2), and since x-tags make it 
possible to support other formats and explore (3), by keeping (1) and 
adding (2), simply in changing "replace RFC 3066" to "complement RFC 
3066". Then we can all work freely.
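
To make the role of the x-tags concrete, here is a minimal sketch, not 
taken from RFC 3066 or from the Draft, of how a program might separate the 
private-use part of a tag from the rest. It assumes that subtags are 1 to 
8 ASCII letters or digits and that a lone "x" subtag introduces the 
private-use part; the function name and the sample tags ("formal", 
"klingon") are hypothetical.

```python
import re

# Assumption: each subtag is 1 to 8 ASCII letters or digits, and a lone "x"
# subtag introduces the private-use part of the tag.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_private_use(tag):
    """Return (public_subtags, private_subtags) for a hyphen-separated tag."""
    subtags = tag.split("-")
    if not all(SUBTAG.fullmatch(s) for s in subtags):
        raise ValueError(f"malformed subtag in {tag!r}")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        if i == len(subtags) - 1:
            raise ValueError(f"'x' must be followed by a subtag in {tag!r}")
        # Everything after the first "x" is private use and opaque to us.
        return subtags[:i], subtags[i + 1:]
    return subtags, []


print(split_private_use("en-US"))           # (['en', 'US'], [])
print(split_private_use("es-MX-x-formal"))  # (['es', 'MX'], ['formal'])
print(split_private_use("x-klingon"))       # ([], ['klingon'])
```

A parser built this way can process the public part and carry the private 
part along untouched, which is what keeps (1) available next to (2).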

>As such, I read your first point, regarding characterizing the use of a 
>document, and it tells me something interesting about myself. The fact is, 
>for all its irony, I'm not typically even remotely interested in how a 
>document is used; I tend to focus on how a document is /filed/ by the 
>people who use it.

Then you are more a programmer (an author) than a networker. However, 
please note that you use the words "people who use it". You do not say 
"people it is intended for". So you consider some action by the "non 
authors".

>If a client tells me, "I want a list of all documents sorted 
>alphabetically by the third sentence of the second chapter," I might ask, 
>"Why? Are you sure?" but if the client insists, I'll dutifully begin 
>writing the appropriate algorithms.
>
>I admit that issues of defining "What is a language?" and its 
>philosophical correlates are perhaps of coffee-table interest to me, but 
>not of professional interest as a programmer.

I am not sure of that!!! Identifying a language is a very complex bit of 
programming. The language will be defined by statistical rules, etc. Look 
for example at the spelling correction in Word, which supports 
multilingual text and considers the language of each paragraph. This 
implies a very interesting piece of code to decide that one paragraph is 
French and the next one is English. But, mainly, as a networker you see 
that a language is an interaction protocol. There is no basic conceptual 
difference between http, English and C. But they can be used in many 
different modes and over many different media. Now, as a programmer you 
suddenly become very interested when, because of that, the language is not 
going to be the same depending on the mode or the medium: either the mode 
or the medium is going to interfere with the rendering, or you will be 
able to compact exchanges (which is what you do, for example, with an 
error message).
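
As a toy illustration of such "statistical rules", the sketch below 
guesses the language of each paragraph by counting common stop words. It 
is only an assumption about how this could be approached, not a 
description of what Word does; the word lists and the function name are 
hypothetical, and a real detector would rely on much richer statistics.

```python
# Hypothetical, tiny stop-word lists; a real detector would use far larger
# statistics (character n-grams, word frequencies, etc.).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "est", "de", "un", "une", "que"},
}


def guess_language(paragraph):
    """Return the code whose stop words occur most often, or None."""
    words = [w.strip(".,;:!?\"'()").lower() for w in paragraph.split()]
    scores = {lang: sum(w in stops for w in words)
              for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


text = ("The language of the paragraph is English.\n\n"
        "Le chat est sur la table et il dort.")
for paragraph in text.split("\n\n"):
    print(guess_language(paragraph), "<-", paragraph)
```

Even this crude count tags the first paragraph "en" and the second "fr"; 
short or mixed paragraphs are where the real difficulty starts.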

>Instead I merely want to know how I serve to any random end-user the 
>appropriate document following whatever language /he/ thinks he's 
>speaking. This means I need my language tags to be /descriptive,/ not 
>proscriptive, and they have to be extensible in a logical way.

Descriptive and prescriptive have a meaning only in a context of use: for 
which partner of the exchange, and at which point of the exchange. If I 
describe what is necessary, it becomes prescriptive.

Full agreement with the extensibility requirement.

>For better or for worse, if we are describing human languages, we must 
>deal with the reality that human beings /do/ invent languages.

Yes, all the time. And the worst of it is that they invent them for 
machines too. If I tell you printf("you %s\n", "right"), what is it?

>  It /is/ possible to find websites written in Klingon and poetry written 
> in "Yodish." A bit disconcerting, to be sure, but possible. Certainly, if 
> someone in my living room was trying to speak to me in Klingon, I'd 
> probably request that he get a life, but /personally,/ I can do that. 
> Professionally I can't;

Why? If your customer is Lucas, I bet you will do what he wants and you 
will send your bill. Open consistency, or scalability.

>I can't use the fact that a man who speaks Esperanto, an equally 
>artificial language, is more likely to have a girlfriend than a man who 
>speaks Klingon as a justification for design limitations. Again, human 
>beings /do/ invent languages and any tagging standard which does not 
>account for this reality is inadequate for the task of classifying human 
>languages.

Right!

>Effectively, I look at the language tag in this fashion: if, for any two 
>random given people, I must use two different syntaxes to say the same 
>thing, then I need a different tag. en-US and en-GB are different not just 
>because the British like throwing in superfluous u's, but also because the 
>word "fanny" gets me in more trouble in one place than in the other.

Right! But this may also be true of people who have lived through 
something good or bad together that a word alludes to.

>Same with the word "mantequilla" in es-MX versus es-AR. Anyone interested 
>in providing global content must be able to navigate these differences 
>/and/ similar unpredictable differences which arise in the future. What 
>happens, for example, when significant language-use differences are based 
>on social class in the exact same region in the exact same tongue?

100% agreement.

>No existing tag accounts for this; can we be sure that 35 possibilities 
>for extension protect us against future social and creative inventions?

We can be sure they do not. And why would pentatridecimal be a limit (why 
35, BTW?).

>Extensibility and modularity must be incorporated into the system from its 
>inception or the system will fail. Human creativity and pop culture move 
>altogether faster than specification revision committees.

100% agreement.
But remember you are a programmer. So you must consider how to implement 
it. The work we carry out is the following:

1. two or more people in dialogue (or polylogue) together establish a 
space of exchanges. This space has common references, which we name 
referents (never mind at this stage what they are).
2. in so doing they establish relations. Each relation has its own 
context, which alters (enriches, reduces, modifies, etc.) the referent.
3. the exchanges are carried out using person-to-person 
interintelligibility protocols named "languages", supported by end-to-end 
interoperation.
4. the referents and contexts are supported by "CRCs" (common reference 
centers) where all the protocol elements (which extend much further than 
the pure language discussed here, and will include the DNS, the LDAP 
directories being used, etc.) can be looked up, for example through the 
DNS, OPES, etc.
5. one of the first things people will do is negotiate the environment of 
the exchange and of the relation. For example, they will start by 
negotiating the protocol: http or ftp? English, French or Spanish? Which 
is the most adequate for the general exchange and for specific relations? 
You see that, for example, I use English here, but French would be better 
for me and I could try Spanish with you. At a given time you negotiate a 
Spanish context when you quote "mantequilla" (to me, as a French speaker, 
es-MX and es-AR are the same). You see that, depending on the language we 
have negotiated at a given time in the exchange or relation, a Web site we 
quote will be different, and the result of a call to an LDAP directory may 
be different if it is multilingual.
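
The negotiation in point 5 can be sketched in a few lines. This is only an 
illustration under stated assumptions: the function name is hypothetical, 
and the fallback rule (try the tag as given, then drop trailing subtags 
one at a time) is one simple choice, not something RFC 3066 or the Draft 
mandates.

```python
def negotiate(preferred, available):
    """Pick the first available tag matching the preferences, in order.

    Each preferred tag is tried as given, then with trailing subtags
    removed one at a time ("es-MX" -> "es").
    """
    by_lower = {tag.lower(): tag for tag in available}
    for wanted in preferred:
        subtags = wanted.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
    return None


# French preferred, English acceptable, but only English and Spanish offered.
print(negotiate(["fr", "en"], ["en", "es"]))     # en
# Mexican Spanish requested from a source that only has generic Spanish.
print(negotiate(["es-MX"], ["en", "es", "fr"]))  # es
# Nothing in common: the caller has to decide what to do.
print(negotiate(["tlh"], ["en", "es", "fr"]))    # None
```

Note that the result depends on what both sides bring to the exchange, 
which is the point made above: the same request can yield a different Web 
page or directory answer depending on the language negotiated.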

>Ultimately, these tags are not, and /cannot/ be, proscriptive for how 
>people are /allowed/ to classify languages; you can't program humans like 
>a computer. Instead, they need to be descriptive of how human beings /use/ 
>language--"use" it here in both senses, of how they actually speak, write 
>or signal it, and also in how they already classify it for their own 
>purposes of transmission, and that descriptiveness must share the same 
>capacity for growth as the objects it describes. In other words, the tags 
>used to describe languages must themselves be like languages: if the 
>language changes from region to region, so must the tags. If languages 
>divide or combine over time, so must the tags. And if languages can spring 
>wholesale from the minds of hack science-fiction screenwriters, so must 
>the tags.

Right. And the way this is done on a distributed network is subsidiarity. 
This means that one (or several?) semantics are to be defined (for mutual 
understanding, parsing, filtering, etc.) - as this Draft does for one - 
and people can use them the way they want. When you registered "megared" 
or "dylanpierce" the rule was first come, first served. You were free.

>Further, if languages can be analyzed for factors important to one 
>organization but irrelevant to another, so must the tags. The 
>reading-level example, for instance, is intrinsically part of how language 
>is used within a culture; the very educational institutions teaching the 
>language divide material in this fashion. The regional press example 
>points to how material is requested and provided, still however analyzed 
>based on sheer linguistic--word choice and level of abstractness--factors. 
>And since it would be daunting for a registration body to make any attempt 
>at trying to track and describe the myriad of human possibilities for 
>interpreting a language, best we put that charge directly in the hands of 
>the people who do it for a living.

"who do it for a living" or "who live with it"?

>Certainly this means that corporations will also use their namespace for 
>less germane reasons. And fortunately, parsing agents can ignore their 
>tags and still remain completely in compliance: a small price to pay for 
>effectively making the entire world a de facto but organized registration 
>authority for how language is used worldwide.

People empowerment is usually the best solution when you deal with 
relations between people. However, you easily run into conflict because 
people tend to want to do the same thing. You usually have three ways to 
address this: the centralised hierarchy, which quickly leads to control 
(this is the system the Draft creates with its strict reference to ISO); 
the decentralised system, which needs some kind of technical oligarchy and 
also has some loose ends (this is the current RFC 3066 system); and the 
distributed system, where you do not have hierarchical registration trees 
but forests (everyone can create their own registry).

Centralisation simplifies things a great deal, and it is a temptation. But 
it does not work because, as you describe it, the world is centralised, 
decentralised and distributed at the same time.

>I've been informed both here and privately that perhaps a more appropriate 
>approach would be to wait until this document becomes a standard and then 
>propose organizational namespace as a new Internet Draft.

This is not possible as long as this Draft wants to replace RFC 3066. 
There is no problem if it complements it. Then there are two solutions:

- either your proposal is general and you propose a framework for Internet 
language identification which will replace RFC 3066 as BCP 47;
- or your proposal only specifies one additional semantic, in parallel 
with the current Draft semantic, and you refer to RFC 3066.

>  Certainly, you guys know better than I do what we're up against and I'll 
> defer to your best judgment.

It is up to you to decide. If the Draft replaces RFC 3066, your extensions 
will have to obey the Draft, not RFC 3066. Look at what that implies. 
IMHO, if you want to introduce "p-", you are far better off doing it now 
than starting a new document, simply because they want the document to go 
through, and if your "p-" gathers support you could block a document that 
lacks it. Afterwards, everyone will tell you "wait for experience", "we 
have decided no", etc.

We have the IDN experience. IDNA (Internationalizing Domain Names in 
Applications) was accepted by WG members on the grounds that it would be 
experimental and that if it did not work we could change it. It does not 
work. After two years M$ still does not intend to implement it, and China 
has split from the main Internet with its own co-root in order to correct 
the issue. IMHO (and we dispute that, just as we disputed over IDNs) this 
Draft, if accepted, will do the same. So it is very important to give it 
all possible flexibility, to give it a chance.

>But the entire reason I so strongly support this project is because we 
>undeniably /need/ a parsable internationalization architecture (as a 
>programmer, /I/ need it, and I have yet to speak to a colleague who accuses 
>RFC 3066 of being sufficient) and it needs to speak to all the ways in 
>which languages are used, distributed, selected, and even (perhaps I'm 
>making enemies here) invented.

Full agreement.
But I do not think anyone will say that RFC 3066 is sufficient; RFC 3066 
is simply less restrictive. Actually, what is wrong is the "langtag" in 
itself: we should instead support all the tags people really need, without 
tying them into rigid supertags.

>Extensibility can be anything from a lifesaver to a mere buzzword. For any 
>descriptor to be extensible in a way which has value, it must be 
>extensible in exactly the same ways in which the object it describes is 
>extensible. If it is not, time and human creativity will obsolete it.

Amen.
jfc

