Last call for ISO 15924-based updates

Fri Mar 13 17:30:32 CET 2009

In coding systems used in information technology, special usage scenarios often call for special-purpose coded entities. For instance, ISO 639 has four such special-purpose entities: mis, mul, und and zxx; ISO/IEC 10646 has many, such as U+FFFD REPLACEMENT CHARACTER (which is similar to ISO 639's und). ISO 15924 has some already: Zxxx (comparable to zxx in ISO 639), Zyyy (comparable to und) and Zzzz (comparable to mis).

Zinh is simply one more special-purpose entity, needed in certain processing contexts relevant to another ISO standard that depends on ISO 15924, namely ISO/IEC 10646.

Peter

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Lang Gérard
Sent: Friday, March 13, 2009 12:14 AM
To: Doug Ewell; ietf-languages at iana.org; John Cowan; Michael Everson
Subject: RE: Last call for ISO 15924-based updates

Dear John Cowan,
Dear Doug Ewell,
Dear Michael Everson,

I did not know how "Zinh" works, and I thank you very much for these explanations.
Having read them, I personally still think that it is not a valid entry for ISO 15924 "Code for the representation of names of scripts".
But if the rules of RFC 4646bis make it mandatory that every ISO 15924 entry be tagged, then, given these explanations, such an entry certainly merits a comment signaling this, and I can now agree with John Cowan's proposal.
Kind regards.
Gérard LANG

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Friday, March 13, 2009 1:46 AM
To: ietf-languages at iana.org
Subject: Re: Last call for ISO 15924-based updates

John Cowan <cowan at ccil dot org> wrote:

> The whole point of the Zinh code is to signal that the diacritic 
> changes its script depending on the diacriticized letter.  The acute 
> accent, for example, has no script of its own; it is understood as a 
> Latin accent when placed on a Latin letter, but as a Greek accent when 
> placed on a Greek letter.

What Gérard may or may not be aware of, and what powers this entire explanation, is that in Unicode, a diacriticized letter may be represented as two encoded characters, one for the base letter and one for the diacritic.  For example, "a with acute" may be encoded as U+00E1, or it may be encoded as U+0061 plus U+0301.  In the second case, the detached acute accent U+0301 would have the "inherited script" nature.

This is different from ISO 8859-1 and most other character encodings, where "a with acute" can only be represented as a single precomposed character; and it explains why the concept of "inherited script" exists.
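
To make the two encodings and the "inherited" behaviour concrete, here is a minimal Python sketch. The normalization calls use the standard unicodedata module. The functions rough_script and resolved_scripts are hypothetical helpers written only for this illustration: Python's standard library does not expose the Unicode Script property, so the sketch guesses the script from the character name and treats every combining mark as Zinh, which is a simplification of the real property data.

```python
import unicodedata

precomposed = "\u00e1"       # U+00E1, "a with acute" as one character
decomposed = "\u0061\u0301"  # U+0061 "a" plus U+0301 combining acute accent

# Canonical decomposition (NFD) and composition (NFC) convert between the two.
assert unicodedata.normalize("NFD", precomposed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed


def rough_script(ch):
    """Hypothetical stand-in for the Unicode Script property.

    Assumption: every combining mark is treated as Zinh, and the script
    of other characters is guessed from the start of the character name.
    """
    if unicodedata.combining(ch):
        return "Zinh"
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "Latn"
    if name.startswith("GREEK"):
        return "Grek"
    return "unknown"  # not handled by this sketch


def resolved_scripts(text):
    """Give each Zinh character the script of the preceding base character."""
    out = []
    base = None
    for ch in text:
        script = rough_script(ch)
        if script == "Zinh":
            # No preceding base character: leave the mark as Zinh.
            out.append(base if base is not None else "Zinh")
        else:
            base = script
            out.append(script)
    return out


print(resolved_scripts("a\u0301"))       # Latin a + combining acute
print(resolved_scripts("\u03b1\u0301"))  # Greek alpha + combining acute
```

The same U+0301 resolves to Latn after a Latin letter and to Grek after a Greek letter, which is the behaviour John describes above.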

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

_______________________________________________
Ietf-languages mailing list
Ietf-languages at alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages