Message-Id: <6.2.1.2.2.20050519013832.04254060@mail.jefsey.com>
Date: Thu, 19 May 2005 03:03:10 +0200
To: "Peter Constable" <petercon@microsoft.com>, <unicode@unicode.org>
From: "JFC (Jefsey) Morfin" <jefsey@jefsey.com>
Subject: RE: what is Latn?
In-Reply-To: <F8ACB1B494D9734783AAB114D0CE68FE05FCC277@RED-MSG-52.redmon
 d.corp.microsoft.com>
References: <F8ACB1B494D9734783AAB114D0CE68FE05FCC277@RED-MSG-52.redmond.corp.microsoft.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"; format=flowed
Content-Transfer-Encoding: 8bit
Sender: unicode-bounce@unicode.org
Errors-To: unicode-bounce@unicode.org
Precedence: bulk

At 17:56 18/05/2005, Peter Constable wrote:
> > But when you have orthogonal things to relate, as you want to in
> > several documents, you need to have a relational system.
>
>I don't disagree; I was only objecting to the critique that ISO 15924 is
>faulty because it isn't relational in the sense you refer to.

???
Here I am lost. I do not see how ISO 15924 could be faulty. It is a list.
There are billions of lists. They are what they are: lists. They can be
piles of names, numbers, poems, etc.

Now, the problem arises when you want to use some of their items without
having first given them a meaning, that is, "item=definition".

> > As a computer you go by binary stuff. If a computer is to relate French
> > and Latn, it must have binary element it can compare using a program.
> >
> > Now, a person with a bit of logic will do the same.
> >
> > As long as you do not tell me what is in Latn, I cannot tell you if it
> > Latn applies to French.
>
>If you want to dumb down to a level of not assuming anything that's not
>stated explicitly, then you are right. I doubt there are many here who
>operate that way on a regular basis.

This is the difference between poets and programmers. We both live in the
same world, but the poet believes his dream is enough; the programmer knows
that he has to declare things before using them - and that gives him a
lot of possibilities. You are a smart guy, but you are being used: your
document quoted in ISO 639-4 is used to support an erroneous proposition
that I do not think you would really support if you analysed it.

Please consider carefully what happens in real life. Your case and mine.

1. Your case. You think you can assume things which have not been stated
explicitly. I have no problem with that, but in programming (or physics, or
mathematics) this is called a constant. This means it is not even a default;
it means that this is something you assume (even if you do not realise it)
to be universal, built in. This is fine for a few universal constants like
the speed of light, etc. But others are actually common understandings,
i.e. part of a culture. The more of them there are, the less likely it is
that everyone assumes all of them in the same way. You say I do not need to
define "Latn". Maybe that holds when you are talking with Mark, Phillip and
Michael. But if I ask the Unicode list "what is Latn" (this is what I did),
the responses are numerous and confused. You believed you could assume
there is only one response (subjective, precise, intuitive, etc., I do not
know), but one obvious response. There is none: there is a controversy.
Otherwise the thread would be closed. The result is that you are mired in
trying to define the single meaning you assumed.

2. My case. I know that Michael did his homework and that Latn is a good
name for a script. I know that a script is a "set of graphic characters used
for the written form of one or several languages" (even if I have some
problems with that definition). This does not tell me what the set is for
"Latn". So I can ask the Unicode list about Latn and French, and get from
some people the components which should be in the set; there are quite
knowledgeable people here. What is interesting is that I can build the list
as a partition of the ISO 10646 global character set (actually I cannot, but
nearly). This is interesting because I can now work on several defined
alternatives. Where you fought to define a concept, I have different named
lists, perfectly clear to others, stable and workable. I can give them
variant numbers, discuss them, tune them ... and say whether or not one of
them supports French, and whether the same or other ones support other
languages. To do that I am not going to argue for hours about the
particular case of that letter in that language, etc. I am going to look at
the norms associated with that language and at the associated alphabet. If
it matches one of the variants, that variant is correct.
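
A minimal sketch of the procedure described in point 2, in Python. The
variant names (Latn-1, Latn-2, Latn-3), their contents and the lower-case
French alphabet listed here are illustrative assumptions only; none of them
is defined by ISO 15924, ISO 10646 or anyone in this thread:

```python
import unicodedata


def letters(first, last):
    """Alphabetic characters in the code point range first..last (inclusive)."""
    return {chr(cp) for cp in range(first, last + 1) if chr(cp).isalpha()}


# Hypothetical variants of "Latn", each declared as an explicit set of
# ISO 10646 code points. Names and boundaries are assumptions for discussion.
LATN_VARIANTS = {
    "Latn-1": letters(0x0041, 0x007A),                            # A-Z, a-z
    "Latn-2": letters(0x0041, 0x007A) | letters(0x00C0, 0x00FF),  # + Latin-1 Supplement letters
    "Latn-3": letters(0x0041, 0x007A) | letters(0x00C0, 0x017F),  # + Latin Extended-A
}

# Illustrative lower-case alphabet taken as the "norm" for French.
# NFC normalisation makes sure each accented letter is a single code point.
FRENCH_ALPHABET = set(
    unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzàâæçéèêëîïôœùûüÿ")
)


def supporting_variants(alphabet, variants):
    """Names of the variants whose character set contains the whole alphabet."""
    return sorted(name for name, chars in variants.items() if alphabet <= chars)


def missing(alphabet, chars):
    """Characters of the alphabet that a given variant does not contain."""
    return sorted(alphabet - chars)


if __name__ == "__main__":
    print(supporting_variants(FRENCH_ALPHABET, LATN_VARIANTS))
    for name, chars in LATN_VARIANTS.items():
        print(name, "lacks:", "".join(missing(FRENCH_ALPHABET, chars)))
```

Under these assumed sets only the widest variant covers the alphabet: the
ligature U+0153 lies outside the Latin-1 Supplement range, so "Latn-2" fails
on that single letter. The point is only that once each variant is declared
as a set, "does it support French?" becomes a subset test rather than an
argument.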

You are going to ask me "which norms?" This is the main problem in the
whole Davis Addison / Constable logic (quotes of the ISO 639-4 draft): you
also assume that you do not need to consider the norms (except
orthography (why?)). It is not because they may not be documented that
norms do not exist (grammar, semantics, styles, level of complexity, etc.).
I accept this is less worked out in English than in French (actually it is
quite worked out in English/American, but you do not realise it: consider
the language obligations the DoD puts on its contractors, consider the very
concept of "Basic English") - but you confused a language with a norm set
with "computer languages".

(By the way, I cannot find the English mathematical word equivalent to
"normé", meaning something whose norms have been declared or identified -
not decided/invented).

The very key element that is missing, and that (for the time being) kills
all the logic of your ISO/IETF proposition, is the way people speak: you
forget their best common relational practices, the normative rule set
associated with a language, which structures its reference system (I
abbreviate this as "referent"). This is what permits a computer to
understand, correct, and speak languages. Then you have the "style", the way
people use the language, to fully qualify it - with possible
iterative/reciprocal influences.

Now, I fully accept that you can document a language/page using
intuitive/subjective/assumed descriptors only, but only for humans, who will
post-assume (most probably with occasional misunderstandings) what you meant
to say - what you have yourself assumed. But you cannot for applications.
(And the risk of confusions/conflicts will probably be very high if those
humans are from a different culture.) This is why there is a major
difference between an informative and a normative description, the very
first point to discuss about terms and definitions/purposes. It is the very
first question to raise on point ISO 639-4/4.8 and in the xx.txt Drafts.

Even your "end users" (whom I understand as "out of the reach of SDOs"),
and maybe they first, can understand that: I tend to observe that it is more
difficult for experts, who are more involved in their stuff. But this is a
discussion we already had.

> > And please do not quote Unicode Character Set as a middle reference.
> > It is not an ISO Standard, and it does not fully support French.
>
>Not fully support French? Funny thing, then, that no national body of a
>country with a significant Francophone population has been bringing
>their request to WG2, and that no member of the Unicode Consortium
>selling products to Francophone markets has been requesting changes
>needed to meet the requirements of those markets. I'm curious to know
>what the lacuna might be.

Funny thing that you are so unaware of Microsoft products and clients. You
did not know about non-ASCII programming environments, and now this. Please
ask your people from Word about the compromise they found to support
question marks at the end of a sentence, obliging all of us to rephrase if
Word wants to put one at the beginning of the next line :-) Better than the
Unicode gaps, but not perfect.

The story about the horses' asses, and the comment about a different origin,
were interesting: the negative commenter did not realise that his more
modern origin was the ponies' asses in British mines. Unicode or ISO or
ECMA, etc. is not the origin: the origin is the characters and the people.
But the story also shows the hysteresis. We will not rebuild Unicode; it is
a step forward which will stay. But there will be other steps.

You give the response yourself: "members of the Unicode Consortium
selling". We take Unicode as a good commercial effort by a private company
cartel with due commercial motivations. Even if the IAB lacks understanding
about languages, they understand R&D funding and results. They have
perfectly characterised Unicode (like most of the current efforts) in
RFC 3869. We/I share that view.

What is interesting, however, is that the work I do on CRCs shows us that we
can canonically, in a language- and koine/autonym-independent way, use
Unicode and correct its gaps ... provided a few more common-sense practices
and concepts are included in ISO 639-4, the IETF Drafts and global network
culture. Please consider OSI if you know it. It was the second generation of
international networking: it was specified in four or six languages, and the
technology was totally multilingual as it was bit-oriented. What OSI did, we
should be able to do better.

Take care.
jfc