Standards and localization (was Dot-mapping)

Tue Dec 11 23:45:49 CET 2007

I guess we could treat dots differently from upper/lower case and
NFKC, but it might complicate the specs. How about a 3-tiered
approach:

(1) protocol (John's document: no non-ASCII dots, no non-ASCII
upper-case, no NFKC)
(2) IDNA2003-like mappings (only the IDNA2003 dots, Unicode 5.0
upper-case and NFKC)
(3) UI guidelines (purely informative)
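
To make tier (2) concrete, here is a rough Python sketch of what such a mapping step might look like. It is only an illustration: the function name tier2_map is made up, str.lower() stands in for the real case-mapping tables, and the result depends on whatever Unicode version the Python runtime ships, not specifically Unicode 5.0. The three dot code points are the ones IDNA2003 (RFC 3490) recognizes as label separators besides U+002E.

```python
import unicodedata

# Non-ASCII dots that IDNA2003 treats as label separators:
# U+3002 ideographic full stop, U+FF0E fullwidth full stop,
# U+FF61 halfwidth ideographic full stop.
IDNA2003_DOTS = "\u3002\uff0e\uff61"
_DOT_TABLE = {ord(c): "." for c in IDNA2003_DOTS}


def tier2_map(text: str) -> str:
    """Hypothetical tier-2 mapping: dots, then lower-case, then NFKC.

    This is an approximation, not Nameprep: str.lower() is used in
    place of the IDNA2003 case-folding tables.
    """
    text = text.translate(_DOT_TABLE)   # map the IDNA2003 dots to U+002E
    text = text.lower()                 # map upper-case to lower-case
    return unicodedata.normalize("NFKC", text)  # compatibility normalization


if __name__ == "__main__":
    print(tier2_map("www\u3002Example\uff0eCOM"))
```

Under this split, the tier (1) protocol would only ever see the output of such a function, never the raw user input.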

Then the HTML spec can directly or indirectly reference (2). I think
that this would be good because the URLs inside HTML are "protocol
text", while the text between <p> and </p> (paragraph) is
"non-protocol text".

Spec (2) can be written in such a way that it can evolve with Unicode
itself, similar to Patrik's document, which has rules that can be used
to generate the tables from Unicode 6.0, 7.0, etc.
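
As a sketch of the "rules that generate tables" idea, the snippet below derives a mapping table from whatever Unicode database the running Python provides, so the same rule yields a new table when the Unicode version changes. The rule used here (lower-case, then NFKC) and the name build_mapping_table are assumptions for illustration, not the rules in Patrik's document.

```python
import unicodedata


def build_mapping_table() -> dict:
    """Derive {code point: mapped string} from the runtime's Unicode data.

    Illustrative rule only: a code point gets an entry when
    lower-casing followed by NFKC changes it.
    """
    table = {}
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, which are not characters
        ch = chr(cp)
        mapped = unicodedata.normalize("NFKC", ch.lower())
        if mapped != ch:
            table[cp] = mapped
    return table


TABLE = build_mapping_table()

if __name__ == "__main__":
    # The table size varies with the Unicode version of the runtime.
    print("Unicode", unicodedata.unidata_version, "entries:", len(TABLE))
```

Nothing in the rule names a Unicode version; only the generated table changes from release to release.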

Erik

On Dec 11, 2007 8:59 AM, Gervase Markham <gerv at mozilla.org> wrote:
> Erik van der Poel wrote:
> > Maybe I should not have focussed on the spoofing examples in my
> > previous email. This is not only a security issue. It is an
> > interoperability issue too. We have a number of possibilities for
> > IDNA200X:
> >
> > (1) make the mappings (dots, case, nfkc) part of the protocol
> > (2) make them a normative reference
> > (3) make them an informative reference
> > (4) don't reference them at all
>
> Although surely it's possible to make this decision differently for the
> case of dots and for everything else?
>
> As John says, dot is special, because it's the delimiter.
>
> Could IDNA200x specify a list of dot-like codepoints which MUST be
> mapped to dot, but not say anything about case and so on?
>
> I must confess that my attention to IDN topics has wandered of late, so
> in diving back in, I want to issue a pre-emptive apology if I suggest
> something which has already been rejected for good reason.
>
> Gerv