Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

JFC Morfin jefsey at
Tue Jan 22 23:36:34 CET 2008

At 19:59 22/01/2008, John C Klensin wrote:
>When we can avoid it, I find it helpful to avoid thinking about
>and debating individual characters.  Instead, let's focus on

Dear John,
your analysis seems correct except on one point that Michael
pointed out. You speak of "characters" but do not define what a
"character" is. It seems it can be:
- either a visual item (Michael),
- or a registered DNS item (you),
- or a Unicode code point (IDNA).

Unless you say:
- what a character is,
- at what layer language (and therefore semantic) issues are dealt with,
we will remain in confusion, with different forms of layer violation
depending on who is speaking.

As far as I am concerned:

1. a "character" is a set of visual graphics that are all registered
in the DNS in the same way.
     - The way they are displayed as initial, middle or last character,
in upper case, small capitals or lower case, is irrelevant.
     - The script they belong to is irrelevant.

2. language-related issues are semantic and do not belong to the
layer of IETF responsibility. However, nothing must prevent them from
being restored at the application layer, so that the distinctions made
by Michael can be respected (Word is able to restore upper case at the
beginning of a sentence, etc.). The IETF does not deal with artists,
graphic designers, lawyers, etc. but with computers, which in turn
deal with them.

3. because ccTLD tables can include characters using the same sign as
other tables, they are a working basis, but the semiological sign code
is not their concatenation (we would meet the same problem as with Unicode).

4. there is a possibility of retaining most of Unicode at the price of
complexity. It is to use classes (which can be IDNA classes), to be
identified in one way or another, with class-local rules. This makes
IDNA more complex, but possibly faster to implement.

5. every solution must be fool-proof and phishing-proof at every DNS level.

This means that the way people/word processors write/print/display 
the characters is orthogonal to domain name labels.
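
To illustrate that orthogonality with the sigma case from the subject
line, here is a minimal sketch. It assumes that the "registered form"
of a label can be approximated by Unicode normalization plus full case
folding, as exposed by Python's str.casefold(). This is an illustration
only, not the IDNA mapping itself, and the helper name registered_form
is hypothetical.

```python
import unicodedata


def registered_form(label: str) -> str:
    """Hypothetical helper: reduce a displayed label to one comparison form.

    Assumption: NFC normalization plus Unicode full case folding stands in
    for whatever mapping the registry would really apply.
    """
    # Normalize first so composed and decomposed input compare equal.
    folded = unicodedata.normalize("NFC", label).casefold()
    # Case folding can leave text denormalized, so normalize once more.
    return unicodedata.normalize("NFC", folded)


# Four ways of writing the same Greek word: all upper case, capitalized
# with final sigma, lower case with final sigma, lower case with medial sigma.
displayed = ["ΟΔΟΣ", "Οδος", "οδος", "οδοσ"]

# Every displayed variant collapses to a single registered form.
forms = {registered_form(s) for s in displayed}
print(len(forms))
```

Under that assumption the upper case sigma, the final sigma and the
medial sigma all end up in the same class, and the choice between them
is left to whatever displays the label.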

