[Suppress-Script] Initial list of 300 languages

Caoimhin O Donnaile caoimhin at smo.uhi.ac.uk
Mon Mar 13 13:12:34 CET 2006


I am still struggling to track down whether the difference between
Latg and Latn is defined as purely a font difference (Cló Gaelach)
or as a character representation difference (dot-above diacritic
on the one hand, 'h'-after on the other).

Michael said:

> The reason the Latg and Latf script codes are in 
> ISO 15924 is explained in ISO 15924. A library 
> may hold an Irish book in the Cló Gaelach or in 
> Cló Romhánach, and may use the codes to indicate 
> what the book is printed in. This is a different 
> order of script representation than a font 
> distinction like Times/Helvetica.

This example doesn't help much in resolving the question.  No Irish
Gaelic books would be republished purely because of the font difference,
nor even to change dot-above diacritics to h's or vice-versa.  The
reason quite a few books from the 1950s or before were subsequently
republished was because of the rather massive spelling reform in
the 1950s, which, to confuse matters, coincided with the sudden fall 
from official grace of both the Cló Gaelach font and the dot-above 
diacritic.  So when the books were republished, all three were changed
at once.

I have read lots of pre-1950s Irish books and have read German books
in Fraktur.  The font differences are stark to someone coming new to
them and make the books look as if they are in some outlandish script,
but they are in fact totally superficial and you don't even notice them
after a few minutes' reading.  The dot-above diacritics instead of
h's take more getting used to, but they cause no bother either after an
hour or two of reading.  The spelling reform makes a far bigger 
difference than either.  The main reason that books were republished
was that teachers were afraid that pupils would pick up old spellings
from them and lose marks in their exams.

> See http://www.unicode.org/iso15924

I have done, and have gained only partial enlightenment.  Neither
Latg nor Latf seems to be defined anywhere, so I have had to read
between the lines of a couple of examples.  The evidence is very limited 
and conflicting.  I get the impression, though, that both Latg and Latf
were inherited by iso15924, without further definition or clarification, 
from categories used by librarians to categorise printed books based on 
purely superficial differences apparent to the untrained eye; and that 
the chances are that this categorisation was based on font rather than
on the presence or absence of diacritics.  If this is the case, and
Latg and Latf are defined purely in terms of font rather than 
characters, then they are irrelevant to modern electronic text 
processing where font can be changed on the fly, and they should
either be given a health warning, "to be used only by librarians
categorising printed materials", or deprecated altogether.

Before looking at the evidence from iso15924, though, I'll look
briefly at some of the parallels and differences between the Latg
and Latf situation.  The parallels are:
 - Both Irish and German have traditionally been written in
    fonts (Cló Gaelach; Fraktur) which look at first sight starkly 
    different from Roman font.
 - Both have alternative character representations, with or without
    diacritics.  In Irish, the lenition sound change on consonants can
    be represented either by a dot-above diacritic or by a following 'h'.
    In German, the umlaut sound change on vowels can be represented
    either by a dieresis diacritic or by a following 'e' ("Müller"
    versus "Mueller").
 - In both Irish and German, the traditional font fell quickly from
    grace and use in the years following the Second World War.

The differences are:
 - The dot-above diacritic fell quickly from grace and use in Irish
    at the same time as the font, whereas the dieresis continues to
    be the predominant method of representing umlaut in German.
 - I have the feeling that the 'e' was used a lot less historically in 
    German than the 'h' was in Irish, but I may be wrong about this.
 - If Latg were used to represent character differences, it would
    denote the *presence* of diacritics; whereas if Latf were used
    to represent character differences, it would denote the *absence*
    of diacritics (since modern German with umlaut represented by
    dieresis is considered to be Latn, and the 'e' is considered
    to be more old-fashioned).
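
To make the font-versus-character distinction concrete, here is a minimal
Python sketch of the two respellings described in the lists above.  The
function names and the sets of affected letters are my own assumptions
for illustration, not anything taken from iso15924.  The point is only
that these conversions change the stored characters, which no change of
font can do.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"
COMBINING_DIAERESIS = "\u0308"

# Assumed letter sets, for illustration only.
LENITABLE = set("bcdfgmpstBCDFGMPST")   # Irish consonants that can take the dot
UMLAUT_VOWELS = set("aouAOU")           # German vowels that can take the dieresis


def _replace_mark(text, mark, bases, replacement):
    """Replace a combining mark that follows one of `bases` with a letter."""
    out = []
    # NFD splits precomposed letters (e.g. U+1E41) into base + combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if ch == mark and out and out[-1] in bases:
            out.append(replacement)
        else:
            out.append(ch)
    # NFC recomposes the marks we left alone (e.g. the acute accent).
    return unicodedata.normalize("NFC", "".join(out))


def dot_to_h(text):
    """Irish: dot-above diacritic -> following 'h'."""
    return _replace_mark(text, COMBINING_DOT_ABOVE, LENITABLE, "h")


def dieresis_to_e(text):
    """German: dieresis diacritic -> following 'e'."""
    return _replace_mark(text, COMBINING_DIAERESIS, UMLAUT_VOWELS, "e")


print(dot_to_h("\u1e41\u00e1\u1e6bair"))   # prints "mháthair"
print(dieresis_to_e("M\u00fcller"))        # prints "Mueller"
```

Going the other way ('h' to dot-above, or 'e' to dieresis) is not a
mechanical substitution, since not every 'h' or 'e' marks lenition or
umlaut, so the sketch deliberately covers only one direction.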

Turning now to look at what evidence as to the definition of Latg
(and Latf) can be gleaned from iso15924:
http://www.unicode.org/iso15924/standard/
4.2 NOTE 1 says:

 "[...] ISO 15924 does not attempt to apply the character-glyph model,
  because it is sometimes important to identify certain script variants
  regardless of the encoding a given text may employ. [...] 
  Identification of such script variants, while outside the scope of 
  ISO/IEC 10646, is relevant to the content of script codes. For 
  example, a user ordering a book through interlibrary loan may prefer, 
  or may wish to exclude, the Gaelic variant of the Latin script for 
  reasons of ease of legibility or familiarity with one of the 
  variants."

This looks like it is saying that the definition of Latg (and Latf)
is based on fonts rather than character representations, even though,
as I have described, it is actually the spelling reform first, and
the dotted consonants second, which would matter to a reader
of Irish Gaelic, and the font itself would hardly matter at all.

Section 4.5.2 gives the following example of a bibliographic record:

  Kroatisch-Deutsch und Deutsch-Kroatisch: mit einem Anhang der 
  wichtigeren Neubildungen des Kroatischen und Deutschen. - Berlin: Axel 
  Juncker, 1941. vi, 302, 314, 32 p.; 15 cm. In Croatian (Latn) and 
  German (Latf).

I expect that a book written in German at that time would have umlaut
represented by diacritics rather than by 'e', which would mean that
Latf must refer purely to the font and not to characters.

On the other hand, section 4.6.1 gives the following example of the use 
of script codes in HTML:

  <META HTTP-EQUIV="Content-Language" CONTENT="ga, ru">
  <META NAME="Content-Script" CONTENT="Latg, Cyrl">

This seems to me to indicate that Latg is intended to convey something
more about the content than purely font information.  Since the font
for any piece of Irish, old or new, can be changed at will these days
by the browser or by stylesheets, specifying the font would be saying
nothing at all about the content.
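
As a rough sketch of how software might read those two META lines, the
following Python uses the standard html.parser module to collect the
language and script values.  The class name is hypothetical, and pairing
the values by position ("ga" with "Latg", "ru" with "Cyrl") is my
assumption; the example in the standard does not say how the two lists
relate.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect comma-separated CONTENT values from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("http-equiv") or a.get("name")
        if key and a.get("content"):
            self.meta[key.lower()] = [v.strip() for v in a["content"].split(",")]


doc = """
<META HTTP-EQUIV="Content-Language" CONTENT="ga, ru">
<META NAME="Content-Script" CONTENT="Latg, Cyrl">
"""

collector = MetaCollector()
collector.feed(doc)

# Assumption: the n-th script code applies to the n-th language code.
pairs = list(zip(collector.meta["content-language"],
                 collector.meta["content-script"]))
print(pairs)   # prints [('ga', 'Latg'), ('ru', 'Cyrl')]
```

Nothing in this metadata tells the program which font to use or which
characters to expect, which is exactly the ambiguity I am trying to
resolve.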


It seems to me that the two methods of writing Irish which have
existed through the centuries, dot-above diacritics on the one
hand and 'h's on the other, do indeed parallel in a small way the
different script methods of writing languages such as Turkish
or Azerbaijani or Korean (although I confess to knowing little
in detail about these).  If Latg does indicate Irish written
with dot-above diacritic, then it could indeed be a useful way of 
tagging Irish content, provided that its use does not cause any other
undue complications.  However, if, as looks increasingly likely,
and as John and Kent have told me, Latg refers purely to font,
then it seems to me to be a completely useless code which might
as well be forgotten.

Caoimhín


More information about the Ietf-languages mailing list