support of metadata

Mon Sep 14 09:42:36 CEST 2009

Hello John, Jean-Michel, others,

On 2009/09/14 11:39, John C Klensin wrote:
>
> --On Monday, September 14, 2009 02:11 +0200 jean-michel bernier
> de portzamparc<jmabdp at gmail.com>  wrote:
>
>> Dear colleagues,
>> among the points we introduced during the WG/LC that have not
>> been addressed yet is the end-to-end support of script
>> oriented metadata (one example being the French majuscules).
>> Metadata can be supported either:

>> - implicitly through an unlikely sequence of PVALID codes (ex:
>> FE73-0061 ... 007A)
>
> Since there is no prohibition on such strings, nothing prevents
> you from using them and interpreting them in a special way,
> assuming that FE73 is not problematic from a Bidi standpoint
> (while it is identified as an "Arabic" character, the code point
> does not appear in Arabic-Shaping.txt, which drives Bidi).

Where in Bidi does it say so? The Bidi document refers to Bidi 
properties, and these are defined in UnicodeData.txt 
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt). There, U+FE73 
is AL (Arabic Letter), which means that the above won't work exactly as 
proposed. Of course, there are ample other characters in Unicode that 
may be suited to misuse for the above-mentioned purpose.
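
For anyone who wants to check the property themselves, here is a minimal 
sketch in Python. It relies on the standard unicodedata module, which is 
built from UnicodeData.txt. The helper names bidi_classes and 
mixes_rtl_and_ltr are made up for illustration, and the second one is only 
a rough approximation, not the full test from the Bidi document.

```python
import unicodedata


def bidi_classes(label):
    """Return (code point, Bidi class) pairs for each character in label."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in label]


def mixes_rtl_and_ltr(label):
    """Rough check only: True if the label contains both a right-to-left
    character (class R or AL) and a left-to-right character (class L).
    This is a simplification, not the complete Bidi rule."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return bool(classes & {"R", "AL"}) and "L" in classes


# The sequence from the proposal: U+FE73 followed by ASCII letters.
label = "\ufe73az"
print(bidi_classes(label))
print(mixes_rtl_and_ltr(label))
```

U+FE73 comes out as AL while the ASCII letters are L, so such a label mixes 
right-to-left and left-to-right characters.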

> However, most applications globally will interpret them as valid
> labels, many or most applications will warn against them as
> mixing scripts, and attempts to use specific characters as
> metadata indicators will not work satisfactorily except in your
> particular applications.

Yes indeed.

>> If the WG documents remain unchanged in terms of French
>> majuscules support, the support of the two will be offered as
>> a response to the "+" entry. Ex. http://+Etat.fr.

While I'm writing this mail, here are some comments on majuscules that I 
have been thinking about for quite a while.

On careful reading, the French article 
http://fr.wikipedia.org/wiki/Majuscule and the English counterpart at 
http://en.wikipedia.org/wiki/Majuscule aren't very different at all. Not 
only French, but a wide range of (if not all) European languages know a 
difference between 'majuscules' and 'capitales', and good orthography 
and typography are impossible without these concepts, even if they may be 
less explicitly distinguished in other languages than in French.

The reason why this distinction hasn't made it into character encoding 
is in part historical (owing less to computers than to typewriters), but a 
big part of it, in my opinion, has to be attributed to the fact that a 
large majority of the population everywhere around the world thinks 
primarily visually. I.e. most people want an upper case letter when they 
want an upper case letter and a lower case letter when they want a lower 
case letter, and to a first approximation, they don't care whether 
something is a 'majuscule' or a 'capitale' because they both look the 
same. Trying to teach everybody to always be aware of the difference and 
press the right shift key would simply be impossible. 
That's not only the case for this specific difference, but is also a 
widely reported phenomenon on other levels, such as document appearance 
vs. document structure (think "nicely structured, valid (X)HTML" vs. "it 
has to look the same on every browser").

This may be difficult to understand for people who think mainly 
logically rather than visually. I suggest they take a Myers-Briggs test 
and compare their result with the percentages for each type.

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp