Unicode properties

Mark Davis mark.davis at icu-project.org
Tue Jan 15 23:49:46 CET 2008


Thanks, I hope so.

All,

I updated it a bit to add some other functions that should be useful for
this group (and others). In particular, you can now use regex in the
property values. So the following gives you all the characters X where
toNFKC(X) contains an ASCII period (a topic of recent interest).

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:toNFKC=/\./:]
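
For anyone who wants to check the same kind of query offline, here is a
rough sketch in Python. It is not how the online utility is implemented;
it only assumes the standard unicodedata module, whose Unicode version
depends on the Python build and may differ from the utility's, so the
resulting list may not match exactly. The function name
chars_matching_nfkc is made up for illustration.

```python
import re
import sys
import unicodedata


def chars_matching_nfkc(pattern):
    """Return every character X for which `pattern` is found in toNFKC(X).

    Hypothetical helper: a brute-force scan over all code points, meant
    to approximate a query like [:toNFKC=/\\./:].
    """
    regex = re.compile(pattern)
    matches = []
    for cp in range(sys.maxunicode + 1):
        # Skip surrogate code points; they are not characters.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        # search() rather than fullmatch(): the period only has to
        # appear somewhere in the normalized form.
        if regex.search(unicodedata.normalize("NFKC", ch)):
            matches.append(ch)
    return matches


if __name__ == "__main__":
    for ch in chars_matching_nfkc(r"\."):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

The scan includes the ASCII period itself along with compatibility
characters such as U+2024 ONE DOT LEADER and U+FF0E FULLWIDTH FULL STOP.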

For more information, please read
http://unicode.org/cldr/utility/index.jsp. I'd welcome any feedback on
improving the clarity of that explanation, as well as suggestions for
useful further enhancements.

Mark

On Jan 12, 2008 1:23 AM, Harald Alvestrand <harald at alvestrand.no> wrote:

> Mark Davis wrote:
> > BTW, over the holidays I updated http://unicode.org/cldr/utility/ so
> > that it now has the idna2003 properties. For example:
> >
> > |[:idna=output:]|
> > <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:idna=output:%5D>
> > The set of all characters allowed in the output of IDNA.
> >
> > These can be used for quick comparisons of the idnabis tables. For
> > example, to get the idna characters that were allowed, but would be
> > excluded by the various idnabis table steps, one can go to
> > http://unicode.org/cldr/utility/list-unicodeset.jsp and put in the
> > regex-like expression:
> >
> > [[:idna=output:]
> >   -[[:L:][:Nd:][:Mn:][:Mc:]
> >   -[:^isCaseFolded:]
> >   -[:NFKC_QuickCheck=NO:]
> >   -[:Default_Ignorable_Code_Point:]]
> >  -[-A-Z]]
> >
> > (Perl syntax works if you like that better.)
> >
> > Comments are welcome.
> This seems very useful - it allows us very quickly to see which
> characters we're talking about.
>
> Thanks!
>
>


-- 
Mark