support of metadata
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Mon Sep 14 09:42:36 CEST 2009
Hello John, Jean-Michel, others,
On 2009/09/14 11:39, John C Klensin wrote:
> --On Monday, September 14, 2009 02:11 +0200 jean-michel bernier
> de portzamparc<jmabdp at gmail.com> wrote:
>> Dear colleagues,
>> among the points we introduced during the WG/LC that have not
>> been addressed yet is the end to end support of script
>> oriented metadata (one example being the French majuscules).
>> Metadata can be supported either:
>> - implicitly through an unlikely sequence of PVALID codes (ex:
>> FE73-0061 ... 007A)
> Since there is no prohibition on such strings, nothing prevents
> you from using them and interpreting them in a special way,
> assuming that FE73 is not problematic from a Bidi standpoint
> (while it is identified as an "Arabic" character, the code point
> does not appear in Arabic-Shaping.txt, which drives Bidi).
Where in Bidi does it say so? The Bidi document refers to Bidi
properties, and these are defined in UnicodeData.txt
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt). There, U+FE73
is AL (Arabic Letter), which means that the above won't work exactly as
proposed. Of course, there are plenty of other characters in Unicode that
may be suited to misuse for the above-mentioned purpose.
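For anyone who wants to check such properties without reading
UnicodeData.txt by hand, here is a minimal sketch in Python. It assumes
that the standard unicodedata module, which is generated from
UnicodeData.txt for whatever Unicode version the interpreter ships with,
is an acceptable stand-in for the file itself. The helper name bidi_class
is my own invention for this illustration, and the three code points are
simply the ones from the example above.

```python
import unicodedata


def bidi_class(code_point: int) -> str:
    """Return the Bidi class recorded for a code point (e.g. 'L', 'R', 'AL')."""
    return unicodedata.bidirectional(chr(code_point))


# Code points taken from the example sequence FE73-0061 ... 007A.
for cp in (0xFE73, 0x0061, 0x007A):
    # Fall back to a placeholder in case the local database has no name.
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"U+{cp:04X} {name}: {bidi_class(cp)}")
```

The point is only that U+FE73 comes out with a different Bidi class than
the Latin letters that follow it, so the label mixes directionalities.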
> However, most applications globally will interpret them as valid
> labels, many or most applications will warn against them as
> mixing scripts, and attempts to use specific characters as
> metadata indicators will not work satisfactorily except in your
> particular applications.
>> If the WG documents remain unchanged in terms of French
>> majuscules support, the support of the two will be offered as
>> a response to the "+" entry. Ex. http://+Etat.fr.
While I'm writing this mail, here are some comments on majuscules that I
have been thinking about for quite a while.
On careful reading, the French article
http://fr.wikipedia.org/wiki/Majuscule and the English counterpart at
http://en.wikipedia.org/wiki/Majuscule aren't too different at all. Not
only French, but a wide range of (if not all) European languages know a
difference between 'majuscules' and 'capitales', and good orthography
and typography are impossible without these concepts, even if they may be
less explicitly distinguished in other languages than in French.
The reason why this distinction hasn't made it into character encoding
is in part historical (fewer computers than typewriters), but a big part
of it, in my opinion, has to be attributed to the fact that a large
majority of the population everywhere around the world thinks primarily
visually. I.e. most people everywhere around the world want an upper
case letter when they want an upper case letter and a lower case letter
when they want a lower case letter, and to a first approximation, they
don't care whether something is a 'majuscule' or a 'capitale' because
they both look the same. Trying to teach everybody to always be aware of
the difference and press the right shift key would simply be impossible.
That's not only the case for this specific difference, but is also a
widely reported phenomenon on other levels, such as document appearance
vs. document structure (think "nicely structured, valid (X)HTML" vs. "it
has to look the same on every browser").
This may be difficult to understand for people who think mainly
logically rather than visually. I suggest they take a Myers-Briggs test
and compare their result with the percentages for each type.
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp