NFKC and dots
Erik van der Poel
erikv at google.com
Sun Mar 2 17:50:05 CET 2008
Thanks for the info. After I sent that email, I discussed it with some
of the ICU folks, and they also said that one way to do this would be
to perform NFKC on the entire domain name before splitting it into
labels. Mark's pre-processing draft says something similar:
Actually, I've been meaning to gather folks who are interested in HTML
and IDNA so that we can discuss this pre-processing spec. However, I
do not want to distract the nascent working group, which probably
wants to focus on the on-the-wire specs (IDNA200X, 4 drafts: issues,
protocol, tables and bidi).
Should we set up a separate mailing list for those that wish to
discuss HTML and IDNA?
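
For anyone who wants to see the effect discussed above, here is a minimal sketch in Python. It only uses the standard unicodedata module and only splits on U+002E, so it is a simplification and not a stringprep/IDNA implementation; the function names and the sample host name are made up for illustration.

```python
import unicodedata

def split_then_normalize(name):
    # Split into labels on U+002E first, then apply NFKC to each label.
    return [unicodedata.normalize("NFKC", label) for label in name.split(".")]

def normalize_then_split(name):
    # Apply NFKC to the whole name first, then split into labels on U+002E.
    return unicodedata.normalize("NFKC", name).split(".")

# Hypothetical host name containing U+2024 (ONE DOT LEADER).
name = "www\u2024example.com"

print(split_then_normalize(name))  # ['www.example', 'com']
print(normalize_then_split(name))  # ['www', 'example', 'com']
```

With the first ordering, NFKC leaves a U+002E inside a label; with the second, the U+002E produced from U+2024 acts as a label separator, which matches the browser behavior described in the quoted message below.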
On Mon, Jan 7, 2008 at 10:49 AM, Shawn Steele
<Shawn.Steele at microsoft.com> wrote:
> I haven't been paying much attention to this alias, so I may have missed some background information.
> Erik said:
> > Speaking of U+2024 and where in the protocol stack to handle things, I
> > just discovered that MSIE 7 and Firefox 2 both perform NFKC on this
> > character, to yield U+002E (.). After that, they divide the host name
> > into labels *again*, so the new U+002E becomes a new label separator.
> Actually that's not quite right. The Windows APIs, IE7 & .Net use a special "normalization" to do the NFKC, stringprep, etc. all at once on the entire name. If the ASCII function was called, we do the ACE encoding and check the labels at that time.
> Since I haven't been following the whole discussion, it's unclear to me what is expected from U+2024. Clearly U+002E can't be part of a label, so our behavior (& Firefox's, I guess) seems sensible.
> Since our API returns a full string, the caller still has to figure out what the labels are if they care. In most cases I think we just assume that the whole string'll be passed to the DNS system to figure out.
> - Shawn