U-labels, NFC, and symmetry

Bjoern Hoehrmann derhoermi at gmx.net
Fri Apr 15 22:47:22 CEST 2011


* Mark Davis wrote:
>NFD is a better internal format for some forms of processing, because it has
>a smaller data-table footprint for the implementation, and is a bit faster.
>Here is a simple example for an ASCII string (on my laptop, in Java, so
>caveats apply). Where you are converting an NFD string to NFC, or vice
>versa, then the times for that conversion go up.
>
>   isNFC: 50ns
>   toNFC: 84ns
>   isNFD: 21ns (-57.8%)
>   toNFD: 84ns (+0%)

The set of strings that are in NFC and the set of strings that are in
NFD are both regular languages, so you can implement isNFC and isNFD as
deterministic finite automata, which would be pretty much optimal in
instruction count. The only performance differences would then come
from cache and memory behaviour (you can do little tricks beyond that,
but you'd be in hardware-dependent optimization land in any case), so I
think the figures are a bit misleading.
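
As a rough illustration of the automaton view, here is a minimal Python
sketch of an NFD check that makes one left-to-right pass and keeps only
the previous character's canonical combining class as state. The name
is_nfd_scan is made up for this example, and it looks things up in
Python's unicodedata tables instead of a precompiled transition table,
so it says nothing about the timings quoted above.

```python
import unicodedata

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllables


def is_nfd_scan(text: str) -> bool:
    """Single pass over text; the only state is the previous ccc value."""
    last_ccc = 0
    for ch in text:
        # Hangul syllables decompose algorithmically, so they are never NFD.
        if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
            return False
        # A canonical decomposition (no "<tag>" prefix) also rules out NFD.
        mapping = unicodedata.decomposition(ch)
        if mapping and not mapping.startswith("<"):
            return False
        # Combining marks must appear in non-decreasing ccc order.
        ccc = unicodedata.combining(ch)
        if ccc != 0 and last_ccc > ccc:
            return False
        last_ccc = ccc
    return True
```

With this sketch, is_nfd_scan("e\u0301") is true and
is_nfd_scan("\u00e9") is false. Since the state never holds more than
one combining class value, the check needs only a bounded number of
states, which is the point about regular languages.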

I don't think this is particularly relevant to Peter Saint-Andre's
concern, though. If you are primarily interested in comparisons of few,
short strings that repeat a lot, as you would be when, say, routing, it
may be wiser to simply cache frequently used strings together with
their properly normalized forms. Alternatively, XMPP could mandate a
particular form, for example: always send the U-label and only do
binary comparisons. That would remove all the pressure from the wire
protocol implementations.
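
The caching idea could look roughly like the following Python sketch.
The names canonical_form and same_identifier and the cache size are
assumptions made for this example, and plain NFC stands in for whatever
form a protocol would actually mandate (producing a real U-label takes
more than NFC).

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=4096)  # hypothetical size; tune to the working set
def canonical_form(text: str) -> str:
    """Normalize once per distinct input; repeats are served from cache."""
    return unicodedata.normalize("NFC", text)


def same_identifier(a: str, b: str) -> bool:
    """Binary comparison of the cached normalized forms."""
    return canonical_form(a).encode("utf-8") == canonical_form(b).encode("utf-8")
```

If senders were instead required to always put the mandated form on the
wire, the receiver could skip canonical_form entirely and compare the
received bytes directly.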

I suppose it would be easier for us to make recommendations if we knew
more about the underlying performance and interoperability concerns,
instead of discussing the properties of NFC and NFD without a good
understanding of those concerns.
-- 
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

