Eszett (was Implementation questions)

John C Klensin klensin at jck.com
Wed Dec 24 21:32:21 CET 2008



--On Wednesday, 24 December, 2008 18:50 +0100 JFC Morfin
<jefsey at jefsey.com> wrote:

> At 14:16 24/12/2008, John C Klensin wrote:
>> Certainly, if I were a user registering a label that might be
>> construed as being in German, I would try to do that
>> regardless of what the relevant registry required.   But, if
>> the registry did not exclude mixed-script registrations, I
>> might also register a label containing β (U+03B2) any time
>> I intended ß (U+00DF) to appear and vice versa, simply
>> because, in some fonts, they look a little too much alike.
> 
> John,
> you put yourself in a very comfortable situation.

Comfortable?  Or uncomfortable?

> More pragmatically let us imagine that I am another German who
> has interests in the same name appearance and I register the
> other name. The question now is who is legally (UDRP, ccTLD
> rules) legitimate? If a rule favors me, you will legitimately
> sue the ccTLD Manager since the actual underlying ASCII
> strings are actually different.

First of all, my comment about what I would do as a potential
registrant is independent of the advice I would give registries.
I have always advised registries to avoid mixed-script labels
unless special circumstances arise and to use variant techniques
to restrict registrations or separate ownership of
easily-confused labels.   Sometimes registries take that type of
advice and sometimes they do not and I recognize that there are
legitimate reasons for not doing so.
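
As an illustration of the first part of that advice, a zone could refuse labels that mix scripts. That rule would catch a label using β (U+03B2) among Latin letters where ß (U+00DF) was intended. The following is a minimal Python sketch, not any registry's actual implementation. `rough_script` is a hypothetical helper that guesses a character's script from the first word of its Unicode character name. This is a crude stand-in for the Unicode Script property, which Python's standard library does not expose.

```python
import unicodedata

# Characters treated as script-neutral in LDH-style labels (an assumption).
COMMON = set("0123456789-")

def rough_script(ch: str) -> str:
    """Guess a character's script from the first word of its Unicode name
    (e.g. 'LATIN', 'GREEK', 'CYRILLIC'). Heuristic only: this is not the
    Unicode Script property."""
    if ch in COMMON:
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(label: str) -> set:
    """Return the set of (approximate) scripts used by a label."""
    return {rough_script(ch) for ch in label} - {"COMMON"}

def is_mixed_script(label: str) -> bool:
    """True if the label draws letters from more than one script."""
    return len(scripts_in(label)) > 1

print(is_mixed_script("straße"))  # False: all Latin
print(is_mixed_script("straβe"))  # True: Latin plus Greek beta
```

A real registry would use proper Script property data and its own list of permitted scripts. The point here is only that the check is cheap.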

And, as usual, this is not just about TLD registries and other
zones that are subject to global policies.   Any zone
administrator is going to need to balance things that people
want to do (whether you consider it legitimate or not) against
maximizing user protection.  If the latter were our only
objective, we would stick with ASCII domain names (which I would
certainly not advocate).

From that perspective, whatever leverage the UDRP, local rules
and regulations, etc., provide is very much part of the system.
Many decisions about what to register are ultimately up to the
registry. If the registry perceives a risk in getting it wrong,
that is not necessarily a bad thing.

> You may remember that during the joint ITU/UNESCO meeting in
> Geneva I questioned the WIPO on such situations (as well as
> on babel-names [protected ASCII labels]). The response (after
> a few coffees) was that they respected IPR in ASCII and in
> Unicode, but were unable to decide when there was conflict
> between the two.

There is no more reason to expect conflicts between ASCII and
Unicode than within Unicode itself: there are conflicts between
(non-ASCII) scripts within Unicode as well.  To take a handy
European example, there is more overlap between Greek and
Cyrillic than between either and ASCII.
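
A few characters make the Greek/Cyrillic point concrete. The short Python sketch below prints the Unicode names of capital letters that look alike in many fonts. The character pairs are a hand-picked illustration, not a complete or authoritative confusables list. The first group pairs Greek and Cyrillic letters that have no ASCII lookalike. The second group shows letters shared by all three.

```python
import unicodedata

# Hand-picked capital letters that look alike in many common fonts.
# These Greek/Cyrillic pairs have no ASCII lookalike...
GREEK_CYRILLIC_ONLY = [
    ("\u0393", "\u0413"),  # Greek Gamma / Cyrillic Ghe
    ("\u03a0", "\u041f"),  # Greek Pi / Cyrillic Pe
    ("\u03a6", "\u0424"),  # Greek Phi / Cyrillic Ef
]
# ...while these triples also collide with an ASCII letter.
WITH_ASCII_TOO = [
    ("\u0391", "\u0410", "A"),  # Greek Alpha / Cyrillic A / Latin A
    ("\u039f", "\u041e", "O"),  # Greek Omicron / Cyrillic O / Latin O
]

def describe(chars):
    """Return 'U+XXXX NAME' for each character, showing that lookalikes
    are distinct code points belonging to different scripts."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in chars]

for group in GREEK_CYRILLIC_ONLY + WITH_ASCII_TOO:
    print(" | ".join(describe(group)))
```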

> What do you think the ccTLD Manager can do, when all such
> additional hassle and costs are to be supported by zero added
> revenue? He can only forget about any IETF rule, respecting
> the rule of the code. This is exactly what ".su" documented.
> As long as the IDNS is only to be supported at user
> application layer and not at core presentation layer (i.e.
> equal to the DNS which actually use the "default" presentation
> layer)  the problem will stay with us.

I don't understand what you are suggesting here.  Nothing
requires a ccTLD Manager to support IDNs if they don't consider
them either profitable or part of their mission.  Nothing
external to a country imposes a "zero added revenue" constraint.
Even with variants, there are some approaches that involve
little additional hassle or cost (e.g., prohibit the
registration of a confusing name by anyone if the name which
might be confused with it is already registered) even though the
bundling approaches do introduce some complexity.
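
The low-cost blocking approach (refuse a confusable name to everyone once the name it could be confused with is registered) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any registry's practice. `CONFUSABLE`, `skeleton`, and `Registry` are hypothetical names. The three-entry table stands in for whatever variant table a registry would actually define.

```python
# Hypothetical confusable table, for illustration only. A real registry
# would derive this from its own variant policy, not from this sample.
CONFUSABLE = {
    "\u03b2": "\u00df",  # Greek small beta  -> Latin small sharp s
    "\u0413": "\u0393",  # Cyrillic Ghe      -> Greek Gamma
    "\u041f": "\u03a0",  # Cyrillic Pe       -> Greek Pi
}

def skeleton(label: str) -> str:
    """Fold each character to one representative of its confusable class."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in label)

class Registry:
    """Toy zone: a label is refused if its skeleton matches the skeleton
    of a label that is already registered (blocking, not bundling)."""

    def __init__(self) -> None:
        self._by_skeleton = {}  # skeleton -> label actually registered

    def register(self, label: str) -> bool:
        key = skeleton(label)
        if key in self._by_skeleton:
            return False  # an identical or confusable label is taken
        self._by_skeleton[key] = label
        return True

zone = Registry()
print(zone.register("straße"))  # True
print(zone.register("straβe"))  # False: blocked as confusable
```

A bundling approach would instead let the holder of "straße" also claim "straβe". That requires tracking ownership of whole variant sets rather than a single lookup.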

> Then, what is the solution?
> 
> The solution is here now. Let us use a multilingual search engine
> which will answer when you enter keywords in different
> languages (with eszett). So you can use American ASCII to
> resolve Chinese _and_ Russian sites in a virtual thematic
> global network. This works today on privately licensed
> machines, this will work on multitier services. The
> Multilingual and Semantic Internet does exist today. On a
> limited linguistic extension paying basis. The targeted number
> of languages was documented in the same Geneva meeting: 150
> languages maximum. The SES (Search Engine System) is a common
> people oriented simple and advertising related alternative to
> DNS. It will be far easier to implement commercially driven
> semantic resolution.

As you probably know, that 150 number has been widely disputed,
partially on a cultural preservation basis.  But, be that as it
may...

> I have no objection to SES IDNA compatibility and common
> language support issues to be over discussed in this WG. But
> at this time the IETF IDNA2008 LC should have been completed
> so the various ML-DNS projects would start being discussed
> among users.

As you know, I believe that, for many purposes, search engines,
intentionally-populated directories, and alternate naming
systems will, in the long term, be more important than IDNs.
That doesn't imply that IDNs are not important or do not have a
role.  But, again, the decision is ultimately up to the
registries.  Perhaps some registries will decide that the search
engines, etc., will be important enough that they should just
avoid registration of IDNs.   If they do, I hope no one tries to
make global rules that _require_ that they support IDNs.  In a
strange way, that is another reason for not mapping Eszett:
since the mapping can produce an all-ASCII string, a zone that
decided not to provide IDN support could find apparent IDNs in
its zone anyway, with consequent (however small) increases in
support costs.   But, if a zone does decide to support IDNs
and still believes that other navigation methods will be
important, then I hope that whatever we do with IDNA2008 will
move things toward labels that are as unambiguous as possible,
including especially having fully reversible mappings between
A-labels and U-labels (and hence no labels in URIs or IRIs that
are in native character form but are neither LDH-labels nor
U-labels).
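
The Eszett point can be seen with Python's standard codecs. The built-in `idna` codec implements IDNA2003 (RFC 3490), whose Nameprep step maps ß to "ss". The result is an all-ASCII label that cannot be turned back into the original. Keeping ß and Punycode-encoding the label behind an `xn--` prefix is roughly what IDNA2008 ultimately does for a label it accepts. That produces an A-label that round-trips to the U-label. This is a sketch only: `to_a_label` and `to_u_label` are hypothetical helpers, and they perform none of the IDNA2008 validity checks.

```python
def idna2003_to_ascii(label: str) -> bytes:
    # Python's built-in "idna" codec follows IDNA2003 (RFC 3490); its
    # Nameprep step maps U+00DF (sharp s) to "ss".
    return label.encode("idna")

def to_a_label(u_label: str) -> str:
    # Hypothetical helper: keep U+00DF and Punycode-encode the label.
    # No IDNA2008 validity checks are performed here.
    if u_label.isascii():
        return u_label  # an all-ASCII (LDH) label needs no encoding
    return "xn--" + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    # Hypothetical helper: reverse to_a_label().
    if not a_label.startswith("xn--"):
        return a_label
    return a_label[4:].encode("ascii").decode("punycode")

mapped = idna2003_to_ascii("straße")
print(mapped, mapped.decode("idna"))   # b'strasse' strasse
a_label = to_a_label("straße")
print(a_label, to_u_label(a_label))    # xn--strae-oqa straße
```

The IDNA2003 result is indistinguishable from a plain ASCII registration of "strasse", which is how a zone without IDN support could end up holding apparent IDNs.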

     john



More information about the Idna-update mailing list