[Idna-arabicscript] Re: Punycode Mixed-case annotation

Mark Davis mark.davis at icu-project.org
Fri Jul 3 01:47:42 CEST 2009


What Jefsey suggests is all nonsense.

Here is what is happening. Basically, one can make a distinction between
capital letters that are required linguistically (the *L* at the start of
the sentence below and the *M* at the start of the proper name *Marcel* in
the example below) and those that just happen to have a capital form
(because the sentence is set in all caps). The former in French are called
*majuscules*, and the latter *capitales*. From Wikipedia:

The sentence « LONGTEMPS MARCEL S’EST COUCHÉ DE BONNE HEURE » is written in
capitales, but only the first and the tenth letters are majuscules. This is
easier to see if the sentence is written in small capitals:
« Longtemps Marcel s’est couché de bonne heure ».

However, that distinction is not captured in Unicode, nor in ASCII, nor in
any other character encodings that I know of, *nor should it be*. There are
many distinctions in the *usage* of characters that are not, and should not
be, represented in the encoding. One could just as well argue that the
distinction between the pronunciation of "o" in "rove", "move", and "love"
needs to be in the encoding, or that the difference between the "." in
"1.2", "etc.", or "." at the end of a sentence needs to be in the encoding.
That would end up with scads of identical characters that people would not
distinguish when keying, could not distinguish in display, are not in any
existing data, could not be depended on in processing, but would be just a
marvelous opportunity for spoofing!
Nor, of course, should anyone think of trying to capture this distinction in
IDNA.
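
As a rough illustration of the point above, here is a small sketch using only
Python's standard library. It is not taken from the IDNA drafts: the helper
name to_a_label is made up for this example, and Python's built-in "idna"
codec implements the older IDNA2003 rules (RFC 3490), so treat the mapping
behaviour as an assumption about that codec rather than about the protocol
under discussion.

```python
import unicodedata

all_caps = "LONGTEMPS MARCEL"
sentence_case = "Longtemps Marcel"

# The "majuscule" L (sentence case) and the "capitale" L (all caps)
# are the same code point; the encoding records no difference.
assert sentence_case[0] == all_caps[0]
print(f"U+{ord(all_caps[0]):04X}", unicodedata.name(all_caps[0]))


def to_a_label(label: str) -> str:
    """Hypothetical helper: run one label through Python's IDNA2003 codec."""
    return label.encode("idna").decode("ascii")


# An all-ASCII label such as "Ecole" passes through without an "xn--" prefix,
# while labels containing non-ASCII letters are mapped and Punycode-encoded.
for label in ["Ecole", "ecole", "école", "École"]:
    print(label, "->", to_a_label(label))
```

Under this codec, "Ecole" comes back as plain ASCII, and "École" and "école"
end up as the same "xn--" label, so nothing in the encoded form could carry a
majuscule/capitale distinction.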

Mark

On Wed, Jul 1, 2009 at 06:13, jefsey <jefsey at jefsey.com> wrote:

> At 14:38 01/07/2009, Vint Cerf wrote:
>
>> Jefsey, maybe my email rendering code is broken but the string I am
>> seeing in your message looks like a purely ascii string making it a
>> conventional ascii domain name not requiring any special treatment.
>>
>
> Vint,
> The string you see on your screen is ASCII - the string I enter is not.
> This is because Unicode is being used and Unicode does not properly support
> the French language orthotypography.
>
> In "Ecole.fra" the semantics demand that "E" is not an ASCII "E" but a
> French "E" majuscule. I am sure Unicode has no problem in supporting them,
> or TLD Tables can be used. "Ecole.fra" is a user domain name which may have
> nothing to do (except phishing) with "école.fra", "ecole.fra" and
> "École.fra".
>
> This means there are two problems:
>
> - they are to be able to resolve different IP addresses.
> - they also are to be able to resolve the same IP address.
>
> However, the decision is only with the registrant. So, the simplest is to
> consider each usage domain name separately. If some are to resolve the same
> IP, the registrant will only register several domain names.
>
> jfc
>
>
>
>  On Jul 1, 2009, at 8:19 AM, JFC Morfin wrote:
>>
>>  At 04:18 01/07/2009, Vint Cerf wrote:
>>>
>>>> jefsey, it cannot be an A-label since that is defined to include
>>>> "xn--" as a prefix.
>>>>
>>>
>>> Dear Vint,
>>> This is the conflict we have and the resulting confusion.
>>>
>>> - usage considers what is entered in usage applications (from the
>>> entire Unicode set [actually it needs much more (e.g. Natal from M$)]
>>> _including_ ASCII)
>>> - I consider what the DNS is asked to resolve after an algorithm
>>> (including null and punycode ones) has been applied.
>>> - you consider complex U-/A-... labels which do not overlap.
>>> "Ecole.fra" is not supported by this label list.
>>>
>>> "Ecole.fra",
>>> 1) it is a U-label
>>> 2) but you do not want to support it as xn--cole-abc.fra. You keep
>>> it as a non-punycoded yet A-label without "xn--" prefix.
>>> 3) this confuses Andrew and John (who should not be concerned)
>>> because they try to understand how the DNS and Unicode casefolding
>>> could support the orthogonal User's casefolding (i.e.
>>> orthotypography).
>>>
>>> I see no other solution than to carry the mapping:
>>> - at the user application layer (outside of the Internet pile) and
>>> the Internet should remain transparent to "xn--" labels.
>>> - through a different presentation, and casefolding mapping etc.
>>> using "xs--" labels.
>>>
>>> Neither DNS nor Unicode casefolding are concerned. Only usage's side
>>> application (this is why the architecture is named IDNA): usage
>>> casefolding is to be decided and managed by the user through his
>>> application (if he wants to have it), depending on his own context,
>>> language, culture, job, kind of application, etc. and if the other
>>> end supports additional usage side features. This permits an
>>> application to application relation, where entropic actions (as
>>> required by the DNS resolution) can be restored through a
>>> negentropic process supported by metadata exchanges. As Elisabeth
>>> explained it.
>>>
>>> This being said, if you find a case (so far there is none) where an
>>> entropic mapping (casefolding or other) is universally accepted by
>>> users - or restored through a negentropic process such as what we
>>> call "duplex entities" -, supported by every encryption system,
>>> transparent to other technologies as well (netneutrality), etc. it
>>> could very well be investigated at protocol level (i.e within the
>>> Internet pile). Otherwise mapping can only be processed outside of
>>> the Internet pile. This is exactly the target and Charter of IDNA as
>>> I read and respect it.
>>>
>>> jfc
>>>
>>>
>>>  On Jun 30, 2009, at 8:42 PM, JFC Morfin wrote:
>>>>
>>>>  At 17:12 30/06/2009, Andrew Sullivan wrote:
>>>>>
>>>>>> and, in particular, if we want to be sensitive to such upper and
>>>>>> lower case when the data comes back to us from the DNS resolver
>>>>>> level.
>>>>>>
>>>>>
>>>>> Please, let us not confuse:
>>>>>
>>>>> 1) the DNS layer, where you run the game the way you want according
>>>>> to your published rules, and the Application Domain Names.
>>>>> 2) French majuscules and upper cases.
>>>>>
>>>>> The U-label, A-label, etc. wording IMHO helps that confusion. Is
>>>>> Ecole.fra a U-label or an A-label? All I know is that it is an
>>>>> application domain name (like in "IDNA"). Also, that if the proposed
>>>>> text does not actually document the "ecole.fra, école.fra,
>>>>> Ecole.fra" case, this case will have to be documented during the
>>>>> IETF/LC.
>>>>>
>>>>> Maybe that is the best thing to do, so that the whole IETF community
>>>>> may decide what concerns the whole community?
>>>>> jfc
>>>>>
>>>>> _______________________________________________
>>>>> Idna-arabicscript mailing list
>>>>> Arabic Script IDN Working Group (ASIWG)
>>>>> Idna-arabicscript at lists.irnic.ir
>>>>> http://lists.irnic.ir/mailman/listinfo/idna-arabicscript
>>>>>
>>>>