[Suppress-Script] Initial list of 300 languages

Caoimhin O Donnaile caoimhin at smo.uhi.ac.uk
Sat Mar 11 17:41:17 CET 2006


Thanks for your reply on the question of Suppress-Script for Old, 
Middle and Modern Irish.

> 1) It is not wrong to tag a document in Latg or another script variant
> as being in Latn.

OK, but if that is the case, perhaps it should be recorded as a general
rule that script variants are not (very?) significant or relevant
when it comes to deciding on Suppress-Script values?

> 2) The overwhelming majority of the documents that are *relevant* for
> backward compatibility purposes are indeed in Latn (stricto sensu),
> since RFC 1766/3066 tags have been historically applied mostly to
> electronic documents.

This, together with Randy's statement that 'the entire motivation
for "Suppress-Script" is compatibility with information that has already
been tagged', makes it look as if the purpose of Suppress-Script is
purely transitional - just for dealing with existing material in the
interim before the new matching algorithm is widely implemented,
after which it will become redundant.

If this is the case, then Suppress-Script hardly matters either way
for sga, mga or ga, because although there is lots of Irish on the
Internet, I would be very surprised if much of it had been
tagged with "ga" (our own pages at www.smo.uhi.ac.uk/gaeilge are
an exception), never mind ga-IE or anything else.

On the other hand, Michael's messages make it look as if
Suppress-Script is to be used to record the "default script" for
a language - i.e. that it will continue to be relevant in the future.
If this is the case then it would be:
 - completely wrong to say that Latn is the default script for
    sga  or mga;
 - correct to say that Latn is the default script for ga being
    written today (and in the last 40 years);
 - completely wrong to say that Latn is the default script for
    "Irish, 1200 to present" ("Modern Irish"), which is what
    ga represents, since mga stops at 1200.

> In particular, it is not yet possible to write Latg in plain Unicode;
> a font distinction must be made, which can be undone.  (The mere use
> of dotted letters does not make the script Latg by itself; the
> characteristic uncial/insular shapes must also be used.)

I think this is completely wrong.  Font is completely irrelevant
to script.  The "cló Gaelach" (uncial/insular font) has been widely
associated with Latg (dotted consonants), and it is what first catches
the eye of someone who is unfamiliar with it, but it is just as
irrelevant to script as Comic Sans versus Times Roman would be in
English.  It is the dotted consonants which matter for linguistic
processing:  word search; spell-check; sorting order.  Books have been 
written using dotted consonants and Roman font and they would be Latg; 
lots of stuff in the cló Gaelach has 'h's instead of dotted consonants, 
and it would be Latn.  (Much more relevant would be the rather massive
spelling reform which was introduced in the 1950s, at the same time
as the cló Gaelach and dotted consonants fell from official grace,
but I think it is the dotted consonants rather than the spelling
which determine the Latg/Latn distinction.)

I can't comment on the decision as to whether it was worth registering
Latg as a separate script variant of Latn.  I wasn't involved, and
I know very little about other scripts so I wouldn't understand the
philosophy behind script variant decisions.  I merely noticed that Latg
exists as a script code and am clarifying the situation over its use
in Irish.

I would guess that it would be used to tag texts in corpora of Old,
Middle and "Modern" Irish.  The reader could then choose to browse and
search the corpora in Latg (i.e. seeing dotted consonants as in the
original), or in Latn transcription (each dotted consonant replaced by
the plain consonant followed by 'h').
The latter might be easier on the untrained modern eye, but the former
would be more authentic, and would be more effective for searching,
since the dotted-consonant system was actually much more logical.
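
To make the Latg-to-Latn transcription concrete, here is a rough sketch of
how a corpus tool might do it.  It assumes the dotted consonants are stored
as Unicode letters carrying a dot above (precomposed, or as a base letter
plus U+0307); the function name latg_to_latn is made up for illustration
and is not taken from any existing tool.

```python
import re
import unicodedata

# Consonants that take the dot of lenition in Irish (assumed set:
# b, c, d, f, g, m, p, s, t), followed by COMBINING DOT ABOVE (U+0307).
_DOTTED = re.compile("([bcdfgmpstBCDFGMPST])\u0307")


def latg_to_latn(text):
    """Replace each dotted consonant with the plain consonant plus 'h'."""
    # Decompose so that e.g. U+1E03 becomes 'b' + U+0307.
    decomposed = unicodedata.normalize("NFD", text)
    # Swap the combining dot for a following 'h'.
    replaced = _DOTTED.sub(lambda m: m.group(1) + "h", decomposed)
    # Recompose so that accented vowels come back as single characters.
    return unicodedata.normalize("NFC", replaced)


if __name__ == "__main__":
    # "an \u1e03ean \u1e41\u00f3r" is the dotted spelling of "an bhean mhór"
    print(latg_to_latn("an \u1e03ean \u1e41\u00f3r"))
```

Note that this only goes one way.  Going back from Latn to Latg is not a
simple search-and-replace, since not every 'h' after a consonant stands
for a dot, and the sketch ignores the 1950s spelling reform altogether.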

By the way, on the Hebrew question, am I right in thinking that
Western Yiddish (Ethnologue: yih) was normally written in Latn while
Eastern Yiddish (Ethnologue: ydd) was and is normally written in Hebr?
Code "ji" is deprecated and replaced by "yi", which equates with
Ethnologue ydd.  See: http://www.ethnologue.com/14/show_iso639.asp?code=yi
In fact, yi is already in the Language Subtag Registry with
Suppress-Script: Hebr.
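
For what it is worth, here is a small sketch of how I understand a
Suppress-Script entry would be applied to a tag.  The table holds only the
yi/Hebr entry mentioned above; the names SUPPRESS_SCRIPT and
drop_suppressed_script are invented for the example, and the parsing is
deliberately simplified (it assumes the script subtag, if present, comes
directly after the language subtag).

```python
# Hypothetical excerpt of registry data: language subtag -> Suppress-Script.
# Only the "yi" entry comes from the message above.
SUPPRESS_SCRIPT = {"yi": "Hebr"}


def drop_suppressed_script(tag, table=SUPPRESS_SCRIPT):
    """Remove the script subtag if it is the language's Suppress-Script."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # A script subtag is four letters; assume it is in second position.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if table.get(language, "").lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)


if __name__ == "__main__":
    print(drop_suppressed_script("yi-Hebr"))  # script matches, so it is dropped
    print(drop_suppressed_script("yi-Latn"))  # different script, so it is kept
```

With no entry for ga in the table, a tag such as ga-Latg would pass
through unchanged, which is the behaviour I would want for sga and mga
as well.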


