U-labels, NFC, and symmetry
Mark Davis ☕
mark at macchiato.com
Fri Apr 15 21:35:52 CEST 2011
> However, I don't understand your argument about comparisons.
> You need to compare and need to compare frequently. Both NFD
> and NFC are canonical forms. Comparing a pair of NFC-strings is
> no more expensive or complex than comparing a pair of
> NFD-strings (actually, because of the length issue, the NFC
> comparisons might be a tad cheaper, but the difference is
> presumably similar).
That is correct. The only difference would be taking externally supplied
strings and either testing them for being in a particular normalization
form, or converting them into a particular normalization form.
In general, I agree that NFC is a better format in terms of compatibility.
NFD is a better internal format for some forms of processing, because it has
a smaller data-table footprint for the implementation, and is a bit faster.
Here is a simple example for an ASCII string (on my laptop, in Java, so
caveats apply). When a string actually has to be converted from NFD to NFC,
or vice versa, the time for that conversion goes up.
isNFD: 21ns (-57.8%)
toNFD: 84ns (+0%)
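
For readers who want to see the test-versus-convert distinction in code, here is a minimal sketch using the JDK's java.text.Normalizer. It is not the benchmark that produced the numbers above; the class name NormalizationCheck and the helpers ensureForm and matchesStored are hypothetical names chosen for illustration. The idea is to run the cheaper check first and pay for a conversion only when the input is not already in the target form.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
final class NormalizationCheck {

    // Returns the input unchanged when it is already in the requested form,
    // and performs a full conversion only otherwise.
    static String ensureForm(String input, Normalizer.Form form) {
        return Normalizer.isNormalized(input, form)
                ? input
                : Normalizer.normalize(input, form);
    }

    // Compares a stored string (assumed to be NFD already) with external input.
    static boolean matchesStored(String storedNfd, String input) {
        return storedNfd.equals(ensureForm(input, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String ascii = "example";          // ASCII is unchanged by NFC and NFD
        String composed = "caf\u00E9";     // e-acute as a single code point (NFC)
        String decomposed = "cafe\u0301";  // e followed by a combining acute (NFD)

        System.out.println(ensureForm(ascii, Normalizer.Form.NFD).equals(ascii));
        System.out.println(ensureForm(composed, Normalizer.Form.NFD).equals(decomposed));
        System.out.println(matchesStored(decomposed, composed));
    }
}
```

The same pattern works with Normalizer.Form.NFC when the stored strings are kept in NFC; only the form constant changes.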
> If you have a stored string in NFD, an
> input string needs to be converted to NFD to be compared with
> it. If you have a stored string in NFC, an input string needs to
> be converted to NFC to be compared with it. No significant
> difference there. I do remember Mark Davis telling the IDNABIS
> WG that testing for NFC conformance was appreciably less costly
> than converting to NFC. I don't know if the same relationship
> would hold for NFD but, again, if the string to be compared to a
> stored one comes from a keyboard, it is more likely to be in NFC
> form than in NFD form (if an operating system decides to
> normalize keyboard input before delivering it to an application,
> all bets are off).
> So, while at least some of the particular concerns that drove
> the NFC decision for IDNA don't apply to XMPP, it still seems to
> me that you haven't made any real case for NFD rather than NFC.
> If the choice really is arbitrary, then being different is not
> an advantage.