Not Folding Case (was: Re: Eszett ( was AW: Esszett, Final Sigma, ZWJ and ZWNJ)

JFC Morfin jefsey at jefsey.com
Wed Feb 25 19:30:55 CET 2009


Dear Vint,

I cannot avoid characterizing the attached mails as "ad linguam" and 
"ad homines". I hope they stop. Happily, Vaggelis just raised 
equivalent problems in the Greek language and I hope, therefore, that 
we may start working on his problem first, giving me time in turn to 
poll French people on what they want to do. Let me recall that the 
question is not for this WG to decide what French speakers should 
do, but rather how French speakers will adapt the Internet to their 
individual needs and to their language if these are not supported.

This being said, these mails show that half-knowledge is dangerous 
and worse than ignorance. They illustrate why Unicode is not yet 
appropriate for supporting the Intersem (Semantic and Multilingual 
International Network). Unicode only considers upper case, lower case 
and title case, assuming that big caps are the same as small caps. In 
French, at least (and, I understand, in some other languages such as 
Greek), this is not the case. This confusion is also seen in the 
English language, which translates the French "majuscule" as 
"capital", while the French word for upper case is "capitale". Likewise, 
Unicode and keyboards do not support the breves and macrons (macrons 
are now well accepted in press titles).

Hence, the confusion brought about by Martin. For example, the French 
Postal Service asks people to use big "capitale" letters, which 
do not have accents. But in good orthography, the first letter may be 
a "majuscule" and in that case should have an accent in non-postal 
usage. This should be considered in the RFCs on civic addressing.

Hence, the confusion brought about by Mark. The reference he makes to 
"Le Monde" is a well-known case that demonstrates exactly the 
opposite. The "majuscule" is actually the "S" in "séries", and the rest of 
the word is in small caps (which are capitalized lower case). In such 
a case, an accent is accepted even in French-speaking countries (e.g. 
Switzerland) where "majuscules" are orthographically not accented.

Hence, the confusion brought about by Kenneth. He also confuses 
"majuscules" and "capitales". "Majuscules" (and only they) must always 
be in upper case. In proper language, they SHOULD be 
accented. However, when this is not possible (keyboard, font, 
handwriting) they MUST at least be in upper case.

I note here that these usages, which carry precise semantics, are not 
2,500 years old as in Greek, but they are certainly more than 250 
years old. The question is: "is IDNA going to respect the RFC and 
experts' fancies more than people's actual usage?"

The involvement of Michel Suignard and the scornful style of these 
three mails in turn lead me to think that, unfortunately, their 
position does not stem from an error about multilinguistics, a 
misguided network architecture, or protocol confusion, but rather, 
clearly, from economic intelligence. I have nothing to object to in 
that case, which I consider a full illustration of IAB's RFC 3869. It 
is only something that will make users more cautious about IDNA and, 
therefore, another difficulty to be addressed eventually.

Multilinguistics is not a discipline that is easily applied!
jfc

At 06:02 25/02/2009, Martin Duerst wrote:
>This is a well-known phenomenon. French people are told in school that
>there are no accents on upper-case letters, and then live on with
>that belief. Thereafter, they regularly see upper-case letters with
>accents, but they don't realize that they might have to change their
>beliefs because reading these upper-case letters with accents happens
>unconsciously. Some even might claim that something is wrong when
>somebody shows them an example of an upper-case letter with an
>accent.
>
>The best way to confirm that accents can and do appear on upper-case
>letters is to check the "Petit Robert", the most widely used
>French-French dictionary.
>
>Regards,    Martin.
>
>P.S.: Effects such as the above are one reason why Internationalization
>       is difficult. It's not enough to ask a few native speakers/writers,
>       one has to find out who the real experts are and confirm things
>       in the field with an open eye.
>
>P.P.S.: The above is not just hearsay, I had several such experiences
>         with relatives of mine.
>
>
>At 06:22 09/02/25, Mark Davis wrote:
> >Moreover, the supposedly required deaccenting of uppercase French
> >appears to be a canard. Not only is it contested by
> >internationalization experts like Michel Suignard, but even casual
> >browsing will find respectable usage of uppercase with accents such as:
> >
> >http://www.lemonde.fr/
> >
> >e.g. on that page "LE MONDE DES SÉRIES", plus the tabs.
> >
> >Mark
> >
> >
> >On Tue, Feb 24, 2009 at 13:12, Kenneth Whistler <kenw at sybase.com> wrote:
> >>jfc said:
> >>
> >>> May I add that French supports calls for upper cases NOT to be folded
> >>> but to be supported as characters in their own right.
> >>> This means that "ecole.fr", "école.fr" and "Ecole.fr" are to be three
> >>> different domain names.
> >>
> >>which is just silly. The implication of that is that the
> >>following would also be different domain names:
> >>
> >>eCole.fr
> >>ecOle.fr
> >>ecoLe.fr
> >>ecolE.fr
> >>ECole.fr
> >>EcOle.fr
> >>EcoLe.fr
> >>EcolE.fr
> >>ECOle.fr
> >>ECoLe.fr
> >>EColE.fr
> >>
> >>etc., etc., for 32 different strings, before even starting
> >>to consider the accent folding issues.
> >>
> >>This is incompatible both with existing ASCII domain name
> >>usage *and* with IDNA 2003 domain name usage. And it
> >>would result in a combinatorial bundling nightmare requiring
> >>2^n items be bundled for every n Latin (or Greek or Cyrillic)
> >>letter in a domain name.
> >>
> >>And no, you cannot get away with claiming this would only
> >>apply to the first letter of a domain name, because there
> >>is no mechanism in IDNA for parsing out words in domain
> >>name labels, viz.:
> >>
> >>http://dangerecole.blogspot.com/
> >>
> >>as opposed to:
> >>
> >>http://www.ecoleprinceton.org/
> >>
> >>or
> >>
> >>http://www.ecolephilippegaulier.com/
> >>
> >>--Ken
> >>
> >>_______________________________________________
> >>Idna-update mailing list
> >>Idna-update at alvestrand.no
> >>http://www.alvestrand.no/mailman/listinfo/idna-update
> >
>
>
>#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>#-#-#  http://www.sw.it.aoyama.ac.jp     mailto:duerst at it.aoyama.ac.jp
>
>
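For reference, the count of case variants in Ken's quoted message above
can be reproduced with a short sketch. It assumes a plain ASCII label;
the helper name case_variants is hypothetical and is not part of IDNA or
of any library. It enumerates every upper/lower spelling of a label and
shows that ordinary case folding collapses them all to a single name.

```python
from itertools import product


def case_variants(label: str) -> list[str]:
    """Return every upper/lower-case spelling of an ASCII label.

    Assumption: the label is ASCII, so each letter has exactly two cases.
    """
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in label
    ]
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("ecole")
print(len(variants))                 # number of spellings: 2**5 for five letters
print({v.lower() for v in variants}) # what remains once case is folded
```

Without folding, a registry would have to bundle every one of these
spellings per label; with folding, they are one and the same name.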