Mappings - some examples

Erik van der Poel erikv at google.com
Wed Dec 2 06:31:54 CET 2009


Hello Georg,

Pagerank is computed from a link graph, and each link is represented
by a single directed edge. We do not want to add complexity by making
a single link yield two or more directed edges. By the way, you say
"both possibly meant documents", but a single domain name is made up
of labels owned by different owners, so if there is an Eszett in one
label and another Eszett in another label, you potentially have to try
four different variants.
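
As a rough illustration of that count (a sketch only, not a description of how any crawler works), the variants can be enumerated per label. The function name `eszett_variants` and the host `straße.gießen.example` are made up for this example, and the sketch assumes each label is either left as-is or has every ß replaced by ss:

```python
from itertools import product
from typing import List


def eszett_variants(host: str) -> List[str]:
    """Return every ß/ss combination of a host name, decided label by label.

    Assumption for this sketch: a label is treated as a unit, so it either
    keeps all of its ß characters or has all of them replaced by "ss".
    """
    per_label = []
    for label in host.split("."):
        if "ß" in label:
            # Two candidates for this label: as written, and the "ss" form.
            per_label.append((label, label.replace("ß", "ss")))
        else:
            per_label.append((label,))
    # Cartesian product over the per-label choices.
    return [".".join(choice) for choice in product(*per_label)]


if __name__ == "__main__":
    # Hypothetical host with an Eszett in two different labels.
    for variant in eszett_variants("straße.gießen.example"):
        print(variant)
```

With an Eszett in two labels this yields 2 x 2 = 4 candidate hosts, so a single link would need up to four edges in the graph instead of one.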

On the serving side, we cannot serve Punycode that encodes Eszett
because MSIE 7 and 8 reject such URLs. So we'd have to do something
special, such as warn the user, remove the link, or whatever.
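
To make the two possible targets concrete, here is a minimal sketch using only Python's standard codecs. The built-in "idna" codec implements IDNA2003, which maps ß to ss, while the raw "punycode" codec keeps the ß. The helper names `idna2003_host` and `eszett_preserving_host` are invented for this example, and the second helper is only an approximation of an IDNA2008-style encoding (it does no validation or case handling):

```python
def idna2003_host(host: str) -> str:
    """Encode a host with Python's built-in IDNA2003 codec (ß is mapped to ss)."""
    return host.encode("idna").decode("ascii")


def eszett_preserving_host(host: str) -> str:
    """Encode each non-ASCII label with raw Punycode, keeping ß as a distinct
    character. Illustrative only: no IDNA2008 validity checks are performed."""
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    print(idna2003_host("schloß.de"))           # schloss.de
    print(eszett_preserving_host("schloß.de"))  # an xn-- label that still encodes ß
```

The same href therefore resolves to two different ASCII host names depending on which rule the client applies, and the second form is the one that MSIE 7 and 8 reject.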

By the way, we wouldn't make such changes just because some
IETF/W3C/Unicode document recommended it. We only emulate the major
browser(s), and only when they gain market share and stick to the same
behavior for a long time. For example, if MSIE 9 decided to change the
way links are followed (resolved) and then they gained market share
but decided to deploy a patch that reverts to the old link behavior,
we would not implement the new (abandoned) link behavior.

Erik

On Tue, Dec 1, 2009 at 7:50 PM, Georg Ochsner <g.ochsner at revolistic.com> wrote:
> Am I understanding right, that this would lead to inaccurate scoring
> (pageranks, link popularity etc.)? But it would not lead to sites not being
> indexed (you can index both possibly meant documents) and also not to links
> on the result pages that lead different users to different documents,
> because the links are in punycode anyway?
>
>
>
> Best
> Georg
>
>
>
>
>
> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On
> Behalf Of Mark Davis
> Sent: Wednesday, 2 December 2009 01:47
>
> To: Georg Ochsner
> Cc: Alexander Mayrhofer; Patrik Fältström; Michael Everson; IDNA update
> work; Andreas Stötzner
> Subject: Re: Mappings - some examples
>
>
>
> When you build an index, you follow links as if the user clicked on them.
> When the choice is determinate, this is not a problem. So consider
> href='schloß.de'. If all browsers go to schloss.de (or fail), your choice is
> easy. That is the current situation. During the transition period (of
> years), the choice is hard, because the answer is not determinate.
>
> Mark
>
> On Tue, Dec 1, 2009 at 12:22, Georg Ochsner <g.ochsner at revolistic.com>
> wrote:
>
> Hello Mark,
>
>
>
> I don’t understand why it would cause problems for search engines. What
> would they be, in more detail? In the organic Google results you can link to
> the punycode version for the new ß hosts. And you can still link to ASCII ss
> domains as well. I think this is very clean actually; really everybody will
> get to the same hosts from the same search results. Or do you mean crawling
> or duplicate content?
>
>
>
> Thanks
>
> Georg
>
>
>
>
>
>
>
> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On
> Behalf Of Mark Davis
> Sent: Tuesday, 1 December 2009 21:03
>
> To: Georg Ochsner
> Cc: Alexander Mayrhofer; Patrik Fältström; Michael Everson; IDNA update
> work; Andreas Stötzner
> Subject: Re: Mappings - some examples
>
>
>
> I think this is somewhat like the TRANSITIONAL strategy that I suggested.
> The intermediate browser feature you suggest wouldn't work, however.
>
> A UI would be really ugly in practice. The vendors have found that simple
> icons don't alert people - what you have to do is go to a special
> intermediate page with strong warnings, and have a Continue and Back button.
>
> If you are on an old browser, of course, you'd still go to 'ss'.
>
> And it won't work with search engines (we don't have little men in our
> servers at Google who can decide what's real and what's not).
>
> And it won't work for email (without some really ugly bounceback).
>
> Mark
>
> On Tue, Dec 1, 2009 at 06:45, Georg Ochsner <g.ochsner at revolistic.com>
> wrote:
>
> Hello Mark,
>
> what about a fourth scenario?
>
> 1. IDNA2008 clearly says that ß is PVALID and shall not be mapped to ss
> anymore.
> 2. Browsers get an additional feature. It informs users when typing a
> hostname with ß that previously they would have been redirected to the ss
> sibling. And that they now need to use ss in order to get to the old
> destination.
>
> 3. Registries wait until IDNA2008 is deployed, let’s say more widely than
> IDNA2003 is today. (That is after 6 years, and with xx% IE6s still out there.)
> 4. Registries inform the registrants and offer a grandfathering registration
> (like the .mx registry recently did) that allows all registrants of domain
> names with ss to register the ß sibling if they like.
> 5. Registration is opened to the public and people realize that "machines"
> now distinguish ß from ss like humans do.
>
> Please add more useful steps between 1. and 5. :-)
>
> Best regards
> Georg
>
>
>
> -----Original Message-----
> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On
> Behalf Of Mark Davis
> Sent: Monday, 30 November 2009 19:01
> To: Georg Ochsner
> Cc: Alexander Mayrhofer; Patrik Fältström; Michael Everson; IDNA update
> work; Andreas Stötzner
> Subject: Re: Mappings - some examples
>
> Even for English, IDNA does not permit all valid words: "Joe's Bar" will not
> work, because of both the space and the apostrophe and "Joe'sBar" because of
> the apostrophe. IDNA2008 explicitly does not permit all sequences that would
> be valid words in all languages; nor could it do otherwise.
>
> There are two compatibility problems:
>
> 1. Existing web pages and other documents that contain ß and expect to go to
> location X and not Y.
> 2. There is still an existing body of billions of browsers that will take
> years to disappear (as Erik points out, something like 20% of browsers are
> still IE6).
>
> Let's suppose that IDNA2008 allows ß, and that the newer browsers use it
> (and not a compatibility scheme like http://unicode.org/reports/tr46/). The
> only real purpose to allowing ß is so that you can distinguish from ss. But
> what happens when Herr Stosser gets stosser.at and Herr Stoßer gets
> stoßer.at?
>
> Initially, 100% of all browsers will go to the ss form. Nobody will go to
> Stoßer's site; his email won't work, etc.: href="stoßer.at" goes directly to
> "stosser.at". Even on Stoßer's site, absolute intrapage links will go to the
> 'wrong' place. After a while, some newer browsers will take a page's
> href="stoßer.at" and go to stoßer.at, while all the other browsers will go
> to "stosser.at". The same goes for email. So access to all of those
> links (and mail, etc.) will be unreliable, and the subject of security and
> compatibility problems. As a result, practically, people would be unable to
> use href="stoßer.at" in their web pages or in email until essentially all
> existing browsers were supplanted, which will be maybe 5 years down the
> line. And during that time, these will also bollix up all the search
> engines, since indexing assumes that links don't have ambiguous targets.
>
> And that assumes that nobody does use a compatibility mechanism. Another,
> more likely, alternative based on our conversations with vendors is that
> people disregard that part of IDNA2008, and use a compatibility mechanism
> like http://unicode.org/reports/tr46/; that allows the browsers, emailers,
> search engines and others to keep functioning correctly - there is no
> transition. The downside is that you can't register both stosser.at and
> stoßer.at. So of the thousand Herr Stoßers, instead of one of them getting
> that name, none of them do; it's like "Joe'sBar". And, of course, there is
> the disadvantage of having UTS46 have to exist in the first place.
>
> The third alternative is the really awful one: the "let a thousand flowers
> bloom" scenario. In this scenario, there is no single compatibility
> mechanism like http://unicode.org/reports/tr46/; instead, we get different
> variants from different vendors, and the situation is chaotic. That is the
> scenario that the Unicode Consortium's members are really concerned with.
> While the ideal solution would be an IDNA2008 that maintained compatibility
> with IDNA2003, the second best solution is only one compatibility mechanism;
> not dozens.
>
> Mark
>
> On Mon, Nov 30, 2009 at 07:55, Georg Ochsner <g.ochsner at revolistic.com>
> wrote:
>> -----Original Message-----
>> From: idna-update-bounces at alvestrand.no [mailto:idna-update-
>> bounces at alvestrand.no] On Behalf Of Alexander Mayrhofer
>> Sent: Monday, 30 November 2009 15:57
>
>> > I would though be more "on your side" if the number of domain
>> > names that contained ß were, say, 100 times higher than today
>> > in published documents. Because then people would be TOLD to
>> > type in something (ß) that mapped to something else (ss) that
>> > was registered. That, I claim, is not the case. At least not
>> > "heavily".
>>
>> I understand that. And I'm saying that the potential of around 500
>> useful "ß" registrations (based on looking through our inventory of 900k
>> domains) is by far not worth the effort.
> In several talks with people from the Austrian registry I've now heard this
> argument. But I think this decision should not be made depending on
> commercial factors. IDNA is in my eyes not a question of return on
> investment but about the native use of language in domain names around the
> globe. Once more, the Austrian registry can still refuse to have sharp s
> registered within their namespace, but maybe other registries pay more
> attention to the language aspect than to commercial calculations.
>
> Best regards
> Georg
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update

