Perl Unicode libraries (was: Re: Casefolding Sigma)

Patrik Fältström patrik at frobbit.se
Wed Jan 30 22:38:59 CET 2008


On 29 jan 2008, at 21.48, Kenneth Whistler wrote:

> I don't personally know of any problems with Unicode::Normalize
> or Unicode::UCD in Perl, but if you use those library modules,
> you need to be aware of the version-dependence and its
> relation to Unicode versions. See, for example:
>
> http://perldoc.perl.org/Unicode/Normalize.html

Correct. I have updated the modules so that they really use Unicode
5.0.0 data, even with my older version of Perl.
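
A minimal sketch of how one might confirm which Unicode version a given
Perl installation actually uses. It relies only on
Unicode::UCD::UnicodeVersion(), which reports the version of the Unicode
database shipped with that perl; the variable name is illustrative.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Version of the Unicode Character Database this perl was built with,
# as a string of the form "major.minor.update".
my $unicode_version = Unicode::UCD::UnicodeVersion();

# $] is the numeric version of the running perl interpreter.
print "perl $] uses Unicode $unicode_version\n";
```

If the reported version is older than the one you need, the modules or
their data files have to be updated separately from perl itself.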

The problem with Perl is unfortunately that its representation of
Unicode code points is weird. Perl tries to "guess" what encoding a
string is in, and it does so at every operation that is performed, just
like the automatic casting that always happens. For example, it guesses
whether UTF-8 or UTF-16 is used for the encoding of a string. Forcing
one or the other is not easy.
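
One way to avoid relying on any guessing is to decode and encode
explicitly at the boundaries with the Encode module. The following is
only a sketch under that assumption, not a description of how my own
code does it; the sample string and variable names are made up for
illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw octets as they might arrive from a file or socket:
# the name "Fältström" encoded as UTF-8.
my $bytes = "F\xC3\xA4ltstr\xC3\xB6m";

# Decode explicitly. FB_CROAK makes malformed input die instead of
# being silently repaired; LEAVE_SRC keeps $bytes unmodified.
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Encode explicitly again when the string leaves the program.
my $utf16 = encode('UTF-16BE', $chars);
my $utf8  = encode('UTF-8',    $chars);

printf "%d octets in, %d characters, %d UTF-16BE octets out\n",
    length($bytes), length($chars), length($utf16);
```

With this approach, everything between decode() and encode() is a
string of code points, and the choice of UTF-8 or UTF-16 is made only
where octets enter or leave the program.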

So, unfortunately, Perl is easy to use, but imprecise.

I think I have most things under control though.

But thanks for the pointer, Ken!

    Patrik


