What rules have been used for the current list of codepoints?
Mark Davis
mark.davis at icu-project.org
Thu Dec 14 03:10:44 CET 2006
I put the results of a generation for comparison on
http://macchiato.com/idn/UnicodePropertyResults.html
A few notes:
- We've been forgetting to remove default-ignorable-code-points, so I
added an exclusion. It only affects variation selectors.
- We probably want to remove Runic (Runr) as a historic script
(although I haven't done so yet).
- I used the block notation instead of the raw ranges that Ken has.
- This is from a program I used for testing properties, so the lines
at the top, like the following, are actually executed to produce the
results:
- Let $baseId = [$gc:Lu $gc:Ll $gc:Lt $gc:Lo $gc:Lm $gc:Mc
$gc:Mn $gc:Nd]
- # this sets a variable $baseId to the union of a number of
property-based sets based on general category values.
- The ## comments outline what is done to get the different
results. I currently generate three lists:
- The base list, by range
- Then taking out the historic scripts and symbol ranges that
Ken recommended, by range
- A detailed version of the second, but skipping the big
alphabets.
- The output uses the standard Unicode data file format:
0030..0039;Zyyy #Nd[10] (0..9) DIGIT ZERO..DIGIT NINE
<range> ; <script> # <general category> [<range count>] (<character(s)>) <name(s)>
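
For readers who want to approximate the $baseId definition from the notes above without the original test program, here is a minimal Python sketch. It is not the program used to generate the results: it relies on Python's unicodedata module, which reflects whatever Unicode version ships with the interpreter, and it does not apply the default-ignorable exclusion or the script removals. The names BASE_CATEGORIES, in_base_id and to_ranges are hypothetical.

```python
import unicodedata

# General category values listed in the $baseId definition.
BASE_CATEGORIES = {"Lu", "Ll", "Lt", "Lo", "Lm", "Mc", "Mn", "Nd"}


def in_base_id(cp):
    """True if the code point's general category is one of the $baseId values."""
    return unicodedata.category(chr(cp)) in BASE_CATEGORIES


def to_ranges(codepoints):
    """Collapse an ascending sequence of code points into inclusive (start, end) ranges."""
    ranges = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp          # extend the current range
        else:
            ranges.append([cp, cp])     # start a new range
    return [tuple(r) for r in ranges]


# Restricting to ASCII keeps the example small; use range(0x110000) for all of Unicode.
ascii_ranges = to_ranges(cp for cp in range(0x80) if in_base_id(cp))
for start, end in ascii_ranges:
    print(f"{start:04X}..{end:04X}")
```

Within ASCII this yields only the digits and the two letter ranges; the full-range result will differ from the 2006 lists wherever the bundled Unicode version differs.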
The result is an HTML file, although one could dump it as text. The
characters themselves are also shown, although you'll only see them
correctly if you have a reasonable collection of fonts; Firefox is better
than IE at falling back to whatever fonts are on your system. It is also
easy to pull the file into Excel (or OpenOffice) for sorting or filtering by
different fields, such as the script field or the general category field.
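
As an alternative to a spreadsheet, the same filtering can be sketched in Python. This is only an illustration based on the one sample line quoted above; the regular expression, the assumption that single code points appear without the ".." part or the bracketed count, and the names LINE_RE and parse_line are all mine, not part of the generating program.

```python
import re
from collections import defaultdict

# <range> ; <script> # <general category> [<range count>] (<character(s)>) <name(s)>
LINE_RE = re.compile(
    r"^(?P<start>[0-9A-Fa-f]{4,6})(?:\.\.(?P<end>[0-9A-Fa-f]{4,6}))?\s*;\s*"
    r"(?P<script>\w+)\s*#\s*(?P<gc>[A-Za-z]{2})\s*(?:\[(?P<count>\d+)\])?\s*"
    r"\((?P<chars>.*?)\)\s*(?P<names>.*)$"
)


def parse_line(line):
    """Parse one output line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    start = int(m.group("start"), 16)
    end = int(m.group("end"), 16) if m.group("end") else start
    # Assumption: when the bracketed count is absent, derive it from the range.
    count = int(m.group("count")) if m.group("count") else end - start + 1
    return {
        "start": start,
        "end": end,
        "script": m.group("script"),
        "gc": m.group("gc"),
        "count": count,
        "chars": m.group("chars"),
        "names": m.group("names"),
    }


sample = "0030..0039;Zyyy #Nd[10] (0..9) DIGIT ZERO..DIGIT NINE"
record = parse_line(sample)

# Group parsed records by the script field, as one might when sorting in a spreadsheet.
by_script = defaultdict(list)
for rec in [record]:
    by_script[rec["script"]].append(rec)
```

Grouping by rec["gc"] instead gives the equivalent view by general category.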
Mark