looking up domain names with unassigned code points

Mon May 12 21:06:48 CEST 2008

At 17:07 12/05/2008, John C Klensin wrote:
> > I do not think this creates any problem as long as users can
> > filter in the point-code they do not want to accept in their
> > private environment ?
>
>I think we need to be very careful here.   I agree with what you
>are saying as I understand it, but I also believe that there are
>ways of reading the above that would get us into trouble.

Who is "us"? This sounds IETF- and network-centric. If we want to
understand one another clearly, let me recall that I am
"people-centric", as the WSIS consensus is: a step further than ISOC,
which is "user-centric", a position that still implies some dependence
on the network. I think it is important not to waste time.

>So...
>
>Q: Would it be reasonable for a user to set up a sort of
>whitelist of domains to be accepted, with all others being
>rejected or producing warnings?

I spoke of code points. It is as simple as: why should I accept a URL
I cannot print or read?

>A: Yes.  Whether it would be a good idea or not would depend on
>the user and usage patterns, but, if a user wanted to do it, I
>don't think we should try to interfere.

Moreover, you cannot prevent it. I am surprised that no one has yet
discussed OPES in relation to IDNs.

>Note that this really
>has nothing to do with the script in which those domain names
>are written or even whether they are LDH or IDNs.  I would also
>suppose that the idea would be much more useful on the basis of
>domain reputation than on the basis of lexical analysis, but, if
>the user is creating explicit lists, there is no need for anyone
>else to be concerned about the basis being used.
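A minimal Python sketch of the explicit whitelist described above, where names on the list are accepted and all others are either flagged or rejected. It assumes the user stores entries in lowercase ASCII (A-label) form; the function name `classify_domain` and the sample entries are hypothetical, and no IDNA mapping is performed on the input.

```python
def classify_domain(name: str, whitelist: set[str], on_miss: str = "warn") -> str:
    """Return "accept" for whitelisted names, otherwise the on_miss policy."""
    if on_miss not in ("warn", "reject"):
        raise ValueError("on_miss must be 'warn' or 'reject'")
    # Compare in lowercase and ignore a trailing root dot; whitelist entries
    # are assumed to be stored the same way (lowercase, A-label form).
    key = name.rstrip(".").lower()
    return "accept" if key in whitelist else on_miss


# Hypothetical user whitelist; the entries are illustrative only.
my_whitelist = {"example.com", "xn--bcher-kva.example"}
for candidate in ("Example.COM.", "unknown.test"):
    print(candidate, classify_domain(candidate, my_whitelist))
```

As the quoted text says, the basis for building the list (reputation, lexical rules, or anything else) is the user's own business; the check itself stays trivial.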

You are engaged in a complex project, caring about domain names, etc.
I just load, in seconds, the DISALLOWED list I want.
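A sketch of such a user-loaded code-point filter, assuming a hypothetical list format of one hex code point or `start..end` range per line (in the style of Unicode data files, `#` starting a comment). Labels carrying the `xn--` prefix are decoded with Python's built-in `punycode` codec only; this is not full IDNA validation, and a malformed label raises `UnicodeError`, which a caller may simply treat as a rejection. All function names are illustrative.

```python
def parse_disallowed(text: str) -> list[tuple[int, int]]:
    """Parse lines such as '0400..04FF' or '2028' (hex, '#' starts a comment)."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        lo, _, hi = line.partition("..")
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges


def to_unicode_label(label: str) -> str:
    """Decode an 'xn--' label with the Punycode codec; return others unchanged."""
    if label[:4].lower() == "xn--":
        return label[4:].encode("ascii").decode("punycode")
    return label


def disallowed_code_points(name: str, ranges: list[tuple[int, int]]) -> list[int]:
    """List the code points in name that fall inside any disallowed range."""
    hits = []
    for label in name.rstrip(".").split("."):
        for ch in to_unicode_label(label):
            if any(lo <= ord(ch) <= hi for lo, hi in ranges):
                hits.append(ord(ch))
    return hits


user_list = parse_disallowed("0400..04FF  # Cyrillic block\n2028  # LINE SEPARATOR\n")
print(disallowed_code_points("xn--bcher-kva.example", user_list))  # []
```

Swapping the filter means swapping the text file; nothing in the lookup path has to change.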

>Q: Would it be reasonable for a user to set up some sort of
>algorithm or collection of rules to effectively perform
>whitelist selection?
>
>A: Sure.  And if that algorithm includes rejecting IDNs in
>scripts that the user doesn't read, I don't see any problem with
>it as long as the user is aware that there is no necessary
>relationship between the character set / language/ script of the
>content reached through a domain name and the script of the
>domain name itself.  To avoid getting tangled up in a different
>misunderstanding, it is important to remember that all
>standard-conforming domain names are based on Unicode, so there
>is no question of character set and that domain names do not
>have language bindings except heuristically and possibly at
>registration time.
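A rough sketch of the rule-based variant, rejecting names whose labels use scripts the user does not read. The Python standard library does not expose the Unicode Script property, so this swaps in a labelled heuristic: the first word of each character's Unicode name ("LATIN", "CYRILLIC", "CJK", ...). Combining marks and many symbols will be misclassified, and, as noted above, the script of a name says nothing about the content it leads to. Function names are hypothetical.

```python
import unicodedata


def to_unicode_label(label: str) -> str:
    """Decode an 'xn--' label with the Punycode codec; return others unchanged."""
    if label[:4].lower() == "xn--":
        return label[4:].encode("ascii").decode("punycode")
    return label


def rough_scripts(name: str) -> set[str]:
    """Approximate the scripts in a domain name from Unicode character names.

    Heuristic only: the first word of a character's name stands in for the
    Unicode Script property. ASCII characters are grouped as "ASCII".
    """
    scripts = set()
    for label in name.rstrip(".").split("."):
        for ch in to_unicode_label(label):
            if ch.isascii():
                scripts.add("ASCII")
            else:
                scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def readable_by_user(name: str, readable: set[str]) -> bool:
    """True if every (approximate) script in name is one the user reads."""
    return rough_scripts(name) <= readable


print(readable_by_user("xn--bcher-kva.example", {"ASCII", "LATIN"}))  # True
```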

There is no standard yet, and there will not be one for a long time,
but there may be local laws. There is an International Standard
(which is a confusing term): ISO 10646. The only requirement is that
the DNS receives LDH ASCII values. This is not nitpicking: your only
chance to enforce what you propose is that every user, every country
and every hacker fully adheres to it as the very best anyone can
think of.

>The DNS is clearly designed to be one of the former.   IDNs
>should not change that.  If two different people register the
>same label with an "xn--" prefix in different zones and do so
>with different assumptions about what U-label it will be mapped
>into (if it is mapped at all), then the DNS still works because
>the respective FQDNs are still unique.  But IDNs essentially
>stop working because, for IDNs to be viable, the mappings, in
>both directions, between A-labels and U-labels must be
>consistent and predictable _and_, given the way the DNS is
>constructed, the mappings to be used must not depend on the zone
>(or DNS hierarchy) in which the label is embedded.
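To make the per-label point concrete, here is a sketch of the A-label/U-label conversion using only Punycode (RFC 3492) plus the "xn--" prefix, via Python's built-in `punycode` codec. Full IDNA processing (RFC 3490 and its successors) adds mapping and validation steps that are deliberately omitted here, so this shows the shape of a zone-independent mapping, not a conforming implementation; the function names are hypothetical.

```python
def to_a_label(u_label: str) -> str:
    """Map one label to its ASCII form: 'xn--' + Punycode if it is non-ASCII."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Map one 'xn--' label back to Unicode; other labels pass through."""
    if a_label[:4].lower() == "xn--":
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label


def to_ascii_name(name: str) -> str:
    # Each label is converted on its own, so the result for a given label
    # cannot depend on the zone (parent labels) it is embedded in.
    return ".".join(to_a_label(label) for label in name.split("."))


def round_trips(u_label: str) -> bool:
    """Check that U-label -> A-label -> U-label gives back the same string."""
    return to_u_label(to_a_label(u_label)) == u_label


print(to_ascii_name("bücher.example.com"))  # xn--bcher-kva.example.com
print(to_ascii_name("bücher.example.org"))  # xn--bcher-kva.example.org
```

The same U-label yields the same A-label under both parents, which is the consistency property the quoted paragraph requires in both directions.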

To obtain that, the process must be end to end at the network level,
which IDNA does not intend to be: IDNA fakes a presentation layer at
the user application level but does not implement an Internet
presentation layer.

>Confusion between what a user can filter and local decisions
>about how strings should be interpreted or mapped between
>A-labels and U-labels, or between universally-interpretable
>identifiers and personal (or local) aliases gets us into a lot
>of trouble, IMO.

An interesting discussion, but not the point I raised. If we analyse it:
- you place yourself at the networked user application level, to
organise the way users may or may not accept "domain names" and what
that may imply;
- I start at the character filter level, to keep my
screen/printer/machine tidy.

At the end of the day, one comes back to the same problem. Without a
presentation layer, you can only work at the user application level.
That can work as long as everyone agrees. But the problem here is that
the target is to support a dynamic diversity.

This is why, IMHO, the target should be to document a Punycode-based
mechanism that everyone can easily adapt/extend, so that it becomes
part of ISO 10646/Unicode and it is more convenient for everyone to
use the same one, like ASCII, TCP/IP, the DNS, etc.: with _no_
built-in constraints, but with all the constraints easily loadable by
the user if he wants them. That is, not saying "this is the way it
must work", but "if you want this, do it that way". A Multilingual
Internet extension of RFC 1958.

jfc