Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Harald Alvestrand harald at alvestrand.no
Wed Dec 20 00:13:56 CET 2006



--On 19 December 2006 14:04 -0800 Mark Davis <mark.davis at icu-project.org> 
wrote:

>
>
>> I take it this means the answer to my question is "no", since the script
>> names in Scripts.txt and the ISO 15924 codes don't match up.

> Each Unicode property name, and property value name may have aliases.
> These aliases, as you would expect, are encapsulated in a
> machine-readable file, such as
> http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt
>
> So, for example, you see there:
>
> sc ; Arab      ; Arabic
> sc ; Armn      ; Armenian
> sc ; Bali      ; Balinese
> sc ; Beng      ; Bengali
> ...
>
> The first field, sc, is the short name for the "script" property; Armn is
> the short name for one of its values (which corresponds to the 15924
> code), and Armenian is the long name used in the data file Script.txt. If
> you look at the site for the 15924 Registration Authority
> (http://www.unicode.org/iso15924/), you'll find also in the tables such
> as http://www.unicode.org/iso15924/iso15924-codes.html a listing of both
> the long and short value names.

I have counted these fields. There are 64 of them in the 
PropertyValueAliases file.

I have also counted the number of codes on the 15924 page. There are
approximately 120 of them.

There may be entries that match. The tables don't.
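
For anyone who wants to repeat the comparison, here is a minimal sketch of
how it could be done. It assumes the "sc ; short ; long" line format quoted
earlier in this message; the function name parse_script_aliases and both
sample data sets are made up for illustration. The ISO 15924 set is a small
hand-picked subset, not the real table, so the numbers it prints are not
the 64 and 120 I counted.

```python
def parse_script_aliases(text):
    """Return {short_name: long_name} for every 'sc' (script) line."""
    aliases = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        # Field 0 is the property short name, field 1 the short value
        # name, field 2 the long value name; extra fields are ignored.
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


# Excerpt in the format quoted above (illustrative sample, not the full file).
SAMPLE_ALIASES = """\
sc ; Arab      ; Arabic
sc ; Armn      ; Armenian
sc ; Bali      ; Balinese
sc ; Beng      ; Bengali
"""

# Hypothetical hand-picked subset of ISO 15924 codes, for illustration only.
SAMPLE_ISO_15924 = {"Arab", "Armn", "Bali", "Beng", "Batk", "Blis"}

aliases = parse_script_aliases(SAMPLE_ALIASES)
only_in_iso = sorted(SAMPLE_ISO_15924 - set(aliases))
only_in_unicode = sorted(set(aliases) - SAMPLE_ISO_15924)

print("script aliases:", len(aliases))
print("in 15924 sample but not in aliases:", only_in_iso)
print("in aliases but not in 15924 sample:", only_in_unicode)
```

Run against the full PropertyValueAliases.txt and the full 15924 code list,
the two difference lists would show exactly which entries fail to match.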

BTW:

>>> Is there a list of the Unicode codepoints known to be used in each of
>>> the ISO 15924 script codes?
>>
>>
>> The closest you are going to get to a repertoire partitioning
>> of Unicode into scripts is Scripts.txt, the very file we have
>> been talking about and using for the development of the
>> inclusions file.

I was not asking for a partitioning.

                       Harald




