Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Mark Davis mark.davis at icu-project.org
Tue Dec 19 23:04:36 CET 2006


> I take it this means the answer to my question is "no", since the script
> names in Scripts.txt and the ISO 15924 codes don't match up.


We need to drag you, kicking and screaming, into ever deeper understanding
of how Unicode works.

Each Unicode property name and each property value name may have aliases.
These aliases, as you would expect, are recorded in a machine-readable file,
such as http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt

So, for example, you see there:

sc ; Arab      ; Arabic
sc ; Armn      ; Armenian
sc ; Bali      ; Balinese
sc ; Beng      ; Bengali
...

The first field, sc, is the short name for the "script" property; Armn is
the short name for one of its values (which corresponds to the ISO 15924
code), and Armenian is the long name used in the data file Scripts.txt. If
you look at the site for the 15924 Registration Authority (
http://www.unicode.org/iso15924/), you'll also find both the long and short
value names listed in tables such as
http://www.unicode.org/iso15924/iso15924-codes.html
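To make the connection concrete, here is a minimal sketch (in Python) of how the aliases can be used to re-key Scripts.txt ranges by their four-letter codes. It rests on some assumptions: the function names are my own, the alias sample is just the four lines quoted above, and the Scripts.txt sample is a few illustrative lines in that file's "range ; long name # comment" layout rather than a verbatim excerpt. A real run would read the full files from the UNIDATA directory.

```python
# Sample lines in the PropertyValueAliases.txt layout quoted above.
ALIASES_SAMPLE = """\
sc ; Arab      ; Arabic
sc ; Armn      ; Armenian
sc ; Bali      ; Balinese
sc ; Beng      ; Bengali
"""

# Illustrative lines in the Scripts.txt layout (assumed sample, not verbatim).
SCRIPTS_SAMPLE = """\
0531..0556    ; Armenian # capital letters
0985..098C    ; Bengali
09B2          ; Bengali
"""


def strip_comment(line):
    """Drop a trailing '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def script_codes_by_long_name(aliases_text):
    """Map long script names (as in Scripts.txt) to short codes, using 'sc' lines."""
    mapping = {}
    for raw in aliases_text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        # Field 0: property short name; field 1: short value; field 2: long value.
        if len(fields) >= 3 and fields[0] == "sc":
            mapping[fields[2]] = fields[1]
    return mapping


def ranges_by_code(scripts_text, long_to_code):
    """Group (start, end) code point ranges by short script code."""
    result = {}
    for raw in scripts_text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        span, long_name = [f.strip() for f in line.split(";")][:2]
        first, _, last = span.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point line
        code = long_to_code.get(long_name)
        if code is None:
            continue  # long name has no alias in the data we loaded
        result.setdefault(code, []).append((start, end))
    return result


codes = script_codes_by_long_name(ALIASES_SAMPLE)
for code, spans in ranges_by_code(SCRIPTS_SAMPLE, codes).items():
    print(code, ["%04X..%04X" % span for span in spans])
```

The point of the sketch is only that the join is mechanical: the long names in Scripts.txt are the same strings as the third field of the "sc" lines, so no hand-maintained table is needed.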

The Unicode script property (2001-02-06) actually predated the first
publication of ISO 15924 (2004-01-09). However, it was done in the knowledge
that 15924 was coming, and the two have been kept in sync since.

Mark

On 12/19/06, Harald Alvestrand <harald at alvestrand.no> wrote:
>
> Thanks for pointing out the relevant TR for the use of script codes, and
> the special status of "Common" and "Inherited". The algorithm grows....
>
> --On 19. desember 2006 12:45 -0800 Kenneth Whistler <kenw at sybase.com>
> wrote:
>
> >> Is there a list of the Unicode codepoints known to be used in each of
> >> the ISO 15924 script codes?
> >
> > That is an ill-formed question. ISO 15924 defines script codes.
> > It does not define repertoires or associate code points with
> > those script codes. So you can't have sets of Unicode code points
> > "in each ISO 15924 script code".
> >
> > The closest you are going to get to a repertoire partitioning
> > of Unicode into scripts is Scripts.txt, the very file we have
> > been talking about and using for the development of the
> > inclusions file.
>
> I take it this means the answer to my question is "no", since the script
> names in Scripts.txt and the ISO 15924 codes don't match up.
>
>             Harald