Stop me if I've misunderstood...

Elisabeth Blanconil eblanconil at gmail.com
Fri Jul 10 20:15:06 CEST 2009


Dear Mark,
this IS the real problem: not whether Google works well or not, but
what your plea suggests.

Let us consider your own experience. Your document "Globalization:
resistance is futile"
(ftp://ftp.software.ibm.com/software/globalization/documents/globalization.pdf)
has been, and still is, very inspiring for many. This is precisely our
main current point of concern with the Intersem: in our "anthrobotmix
society", to keep man as the master and machines as slaves.

This is because we think that the response to your Google difficulty
is a full "semiotic unicode", i.e. one including the standardisation of
all the concepts (this is partly what the long-standing metadata effort
supported by ISO 11179 etc. attempts). Instead of indexing by URLs we
index meanings, progressively building an "idea grid", just as Unicode
has a "character grid".

Actually we believe that Google can only survive that way in the
medium term. This is our concern because your approach is
centralized, while humanity is distributed. It is in that way that
we are your deadly competitor. And we fully understand your opposition
to Jefsey, Louis, Xavier, Rémy, etc.

However, we really respect what you achieved, and when we see you tied
by the details of your own technology (as Google is here with the
outdated concept of the URL) in becoming the Queen of Borgs, we
worry about the semantic addressing system: how to make sure it cannot
be used in its turn as a domination tool. This is something that
you alone, with your Unicode leadership experience, can tell us.

But at the same time, I feel we discuss "futile" issues such as
associating a page with a URL. In _your_ existing world the same URL
may lead us to many different pages, depending not only on the place we
call from but also on langtags and language filtering. Last point: do
you mean that you keep the same indexing if the content of a page
changes?

Frankly, if you could forget Google for a minute and consider Unicode,
and how Unicode could possibly help IDNA with a bold, new and adapted
effort based upon your personal experience, I am sure it would help
a lot.

Cheers!
Elisabeth Blanconil.


2009/7/10 Mark Davis ⌛ <mark at macchiato.com>:
> I'd like to echo what Shawn says. Having an effectively indeterminate
> mapping is nasty for browser vendors, and for search engines, and email
> clients, and for others -- and ultimately that translates into being
> problematic for end users. (By "effectively indeterminate", I mean that a
> lookup of href='http://übergrößen.de' may go to different places, depending
> on the version of IDNA, and href='http://Übergrößen.de' may or may not work
> at all, depending on the version of IDNA.) He's talked about the browser
> side; look at some other cases.
>
> At Google, we associate a URL with content. Internally, we depend heavily on
> that association, and build up indices on that basis. When
> href='http://Übergrößen.de' goes to a different place than
> href='http://übergrössen.de' (on different browsers or different
> combinations of browsers), that complicates our internals considerably. It
> also makes presentation to users more difficult; when URLs are displayed we
> don't know which form to provide; doing that on the basis of the user's
> browser version is unpleasant, and if the version is older, we'd have no
> choice but to present the punycode form for the new mapping.
>
> It is, of course, technically feasible to associate one URL with two
> different pages, but the complications of doing this with indices that span
> you-can't-even-believe-how-many pages are pretty daunting. And the security
> implications are problematic; what if the URL leads to a spoof site on some
> browsers, but to a legitimate site on others?
>
> And take email clients. Most good ones nowadays parse content and turn URLs
> into links. What should they do with http://Übergrößen.de? Which link did
> the originator of the email mean? What browser were they working on? That is
> impossible to know.
>
> Moreover, I don't see this changing. All the vendors will need to support
> IDNA2003 mappings; otherwise users will complain (as Shawn says) that
> the vendor's products are broken. (This WG is insulated from those user
> complaints.) Because the vendors will all need to support both forms, people
> can and will go on generating content that uses the IDNA2003 mappings.
> Because people will continue to generate content that uses those mappings,
> the vendors will need to keep supporting them.
>
> The worst result would be that we come out with IDNA2008, and it gets
> ignored by some major players because of these incompatibilities. Look at
> XML 1.1; it had significant advantages over XML 1.0, and a few minor
> incompatibilities with XML 1.0 (minor at least in the eyes of its working
> group), but those "minor" incompatibilities stopped it dead in the water.
>
> ===
>
> What has become clear to me is that the problems with mapping that inspired
> the original design of IDNA2008 were primarily problems on the registration
> side. When I look back over the email, all of them seem to be associated
> with complaints from registrars (it's a bit difficult to make out, since
> there is no citation of data as to the magnitude of problems, just anecdotes
> without specifics, but this seems to be the case). And I think we have
> developed a consensus that forbidding a mapping on the registration side is
> an improvement, and that forbidding a mapping on the registration side is
> feasible to put into place.
>
> Unfortunately, however, that principle has carried over to the lookup side,
> where changing the mapping from IDNA2003 represents a real problem for
> vendors and their users. As I said, I think we can accommodate mapping
> changes for extremely infrequent characters that appear vital for some
> communities: the ZWJ/ZWNJ (with appropriate CONTEXT restrictions because of
> the spoofing opportunities represented by these). But there is little reason
> to change the other mappings from IDNA2003, and a lot of downside to not
> having them be obligatory in the lookup transformation from text to
> punycode.
>
> Mark
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
>
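
To make the "effectively indeterminate" mapping described in the quoted
message concrete, here is a minimal Python sketch. It assumes Python's
built-in "idna" codec, which follows the IDNA2003 rules (Nameprep folds
case and maps ß to ss). It only approximates a lookup that performs no
mapping, by Punycode-encoding the lowercased label as-is; a real IDNA2008
implementation also validates code points, which is skipped here. Both
helper names are hypothetical.

```python
# Sketch: how one typed host name can resolve to two different DNS names.
# Assumption: the built-in "idna" codec implements IDNA2003 (RFC 3490).
# Assumption: a lookup without mapping is approximated by Punycode-encoding
# the lowercased label directly, with no IDNA2008 validity checks.


def to_ascii_idna2003(host: str) -> str:
    """Map and encode a host name using the IDNA2003 rules."""
    return host.encode("idna").decode("ascii")


def to_ascii_unmapped(host: str) -> str:
    """Hypothetical helper: encode each label without the IDNA2003 mapping."""
    labels = []
    for label in host.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            # str.lower() leaves ß untouched, so it survives into the label.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    for typed in ("übergrößen.de", "Übergrößen.de", "übergrössen.de"):
        print(typed, "->", to_ascii_idna2003(typed), "|", to_ascii_unmapped(typed))
```

Under these assumptions, the IDNA2003 column is identical for all three
spellings, while the unmapped column gives the ß spelling a different
ASCII name from the ss spelling; that difference is the case where the
same href may lead to different places.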

