Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))
Harald Alvestrand
harald at alvestrand.no
Wed Dec 20 00:13:56 CET 2006
--On 19 December 2006 14:04 -0800 Mark Davis <mark.davis at icu-project.org>
wrote:
>
>
>> I take it this means the answer to my question is "no", since the script
>> names in Scripts.txt and the ISO 15924 codes don't match up.
> Each Unicode property name, and property value name may have aliases.
> These aliases, as you would expect, are encapsulated in a
> machine-readable file, such as
> http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt
>
> So, for example, you see there:
>
> sc ; Arab ; Arabic
> sc ; Armn ; Armenian
> sc ; Bali ; Balinese
> sc ; Beng ; Bengali
> ...
>
> The first field, sc, is the short name for the "script" property; Armn is
> the short name for one of its values (which corresponds to the 15924
> code), and Armenian is the long name used in the data file Scripts.txt. If
> you look at the site for the 15924 Registration Authority
> (http://www.unicode.org/iso15924/), you'll find also in the tables such
> as http://www.unicode.org/iso15924/iso15924-codes.html a listing of both
> the long and short value names.
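As a rough illustration of the alias format Mark describes, here is a minimal Python sketch that reads lines shaped like "sc ; Arab ; Arabic" and maps each short (ISO 15924-style) script name to its long name. It assumes semicolon-separated fields with "#" comments; the function name parse_script_aliases and the sample lines are illustrative only, not part of any Unicode tooling.

```python
def parse_script_aliases(lines):
    """Map short script names to long names, using only the "sc"
    (Script property) lines of PropertyValueAliases.txt-style input."""
    short_to_long = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if fields[0] != "sc" or len(fields) < 3:
            continue  # ignore every property other than Script
        short_to_long[fields[1]] = fields[2]
    return short_to_long

# Sample lines copied from the excerpt quoted above.
sample = [
    "# Script (sc)",
    "sc ; Arab ; Arabic",
    "sc ; Armn ; Armenian",
    "sc ; Bali ; Balinese",
    "sc ; Beng ; Bengali",
]
aliases = parse_script_aliases(sample)
long_to_short = {v: k for k, v in aliases.items()}  # reverse lookup
print(aliases["Armn"])            # Armenian
print(long_to_short["Bengali"])   # Beng
```

The reverse dictionary is what lets a long name found in Scripts.txt be turned back into a 15924-style code.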
I have counted these entries. There are 64 "sc" lines in the
PropertyValueAliases file.
I have also counted the number of codes in the 15924 page. There are approximately
120 of them.
Individual entries may match, but the tables as a whole don't.
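Harald's point can be stated as a set comparison: every Unicode script value may have a matching 15924 code, while many 15924 codes have no Unicode counterpart. The sketch below uses small illustrative subsets, not the real tables, to show the distinction between matching entries and matching tables.

```python
# Illustrative subsets only, not the actual contents of either table:
# short names from the "sc" lines, and a handful of ISO 15924 codes.
unicode_sc = {"Arab", "Armn", "Bali", "Beng"}
iso15924 = {"Arab", "Armn", "Bali", "Beng", "Egyp", "Maya"}

matching = unicode_sc & iso15924   # entries present in both tables
iso_only = iso15924 - unicode_sc   # 15924 codes with no "sc" entry

print(len(unicode_sc), len(iso15924))  # 4 6
print(sorted(iso_only))                # ['Egyp', 'Maya']
# Every sc entry matches, yet the two tables are not the same set.
print(matching == unicode_sc, unicode_sc == iso15924)  # True False
```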
BTW:
>>> Is there a list of the Unicode codepoints known to be used in each of
>>> the ISO 15924 script codes?
>>
>>
>> The closest you are going to get to a repertoire partitioning
>> of Unicode into scripts is Scripts.txt, the very file we have
>> been talking about and using for the development of the
>> inclusions file.
I was not asking for a partitioning.
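For reference, this is roughly how Scripts.txt can be turned into code point ranges keyed by 15924-style codes. It is a minimal sketch, assuming the "start..end ; Script # comment" line format and a long-to-short alias map like the one derived from PropertyValueAliases.txt; the helper name parse_scripts and the sample lines are illustrative. Note that because Scripts.txt gives each code point exactly one script value, the result is still a partition, so it does not answer the question of which code points are used in each script.

```python
from collections import defaultdict

def parse_scripts(lines):
    """Collect (start, end) code point ranges per long script name
    from Scripts.txt-style lines such as "0041..005A ; Latin # ..."."""
    ranges = defaultdict(list)
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cps, script = (f.strip() for f in line.split(";"))
        start, _, end = cps.partition("..")  # single code points have no ".."
        ranges[script].append((int(start, 16), int(end or start, 16)))
    return ranges

sample_scripts = [
    "0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR",
    "0531..0556    ; Armenian # L&  [38] ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH",
]
# Assumed alias map (long name -> short name), as in PropertyValueAliases.txt.
long_to_short = {"Latin": "Latn", "Armenian": "Armn"}

by_code = {long_to_short[name]: rs for name, rs in parse_scripts(sample_scripts).items()}

def count(code):
    """Number of code points assigned to a script code."""
    return sum(end - start + 1 for start, end in by_code[code])

print(by_code["Latn"])  # [(65, 90), (170, 170)]
print(count("Armn"))    # 38
```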
Harald