Distinguishing Greek and Greek

Wed Mar 9 21:08:04 CET 2005

On Mar 8, 2005, at 7:25 PM, John Cowan wrote:
>> This is very similar in that respect to Hans vs Hant, which is a choice
>> of which subset of the Han characters encoded in Unicode is used to
>> represent Chinese, and the same reasoning applies.
>
> I agree with Michael that these are not separate characters at the user
> level. This is an orthographical reform, not a change of script.

Hans and Hant are not a change of script, either. The two are subsets  
of Hani, and there is considerable overlap (Lee Collins estimates  
perhaps 60-70%). This is what John Jenkins, who knows quite a bit  
about both polytonic Greek and Han characters, had to say:

> I'd say the situations are analogous myself, as both arose from  
> relatively recent attempts at language reform in pretty much  
> similar ways.  Indeed, I'd say that the rationale for separating  
> polytonic and monotonic Greek is even stronger than for Hans and  
> Hant, because the two Greeks are more clearly separated than the  
> two Chineses.
>
> Yes, Hans and Hant have considerable overlap.  It's even relatively  
> simple to come up with a sentence (e.g., 他是我的朋友) where  
> you can't tell whether it's the one or the other on any basis other  
> than external tagging.  Of course, that's a mildly artificial  
> example.  In real life, you'd generally not manage to go a complete  
> sentence without finding one simplified character along the way --  
> but the point is that they have a huge overlap in Unihan.
>
> FWIW, while there are in Unihan only 2636 characters which can be  
> considered simplified forms, 1901 of those are in IICore; that is,  
> simplified forms are relatively common in actual text. And since  
> 1901 (or even 2636) characters are less than a quarter of the bare  
> minimum for modern communication, the remainder of the characters  
> needed for Hans are naturally enough characters shared with Hant.
>
> (Another measure is that of the 6763 characters from Unihan with  
> GB0 mappings, 4383 also have Big Five mappings.)
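
As a rough check on the figures quoted above, here is a minimal sketch of the arithmetic. The counts are taken as given from the quoted text; the variable and function names are illustrative only. The GB0/Big Five mapping overlap is just one proxy for the Hans/Hant overlap, so this is a sanity check rather than a measurement.

```python
# Counts as quoted in the message above (Unihan figures as of 2005).
GB0_TOTAL = 6763             # Unihan characters with GB0 mappings
GB0_WITH_BIG5 = 4383         # ...of which also have Big Five mappings
SIMPLIFIED_IN_UNIHAN = 2636  # characters that can be considered simplified forms
SIMPLIFIED_IN_IICORE = 1901  # ...of which are in IICore


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


gb0_shared_pct = share(GB0_WITH_BIG5, GB0_TOTAL)
iicore_pct = share(SIMPLIFIED_IN_IICORE, SIMPLIFIED_IN_UNIHAN)

print(f"GB0 characters that also map to Big Five: {gb0_shared_pct:.1f}%")
print(f"Simplified forms that are in IICore:      {iicore_pct:.1f}%")
```

On these numbers roughly 65% of the GB0 repertoire is shared with Big Five, which sits inside the 60-70% overlap estimate mentioned earlier.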

I think the argument for tagging the polytonic/monotonic distinction  
as a script subset should be examined in more detail.

Deborah Goldsmith
Internationalization, Unicode liaison
Apple Computer, Inc.
goldsmit at apple.com