[Suppress-Script] Initial list of 300 languages
Caoimhin O Donnaile
caoimhin at smo.uhi.ac.uk
Mon Mar 13 13:12:34 CET 2006
I am still struggling to track down whether the difference between
Latg and Latn is defined as purely a font difference (Cló Gaelach)
or as a character representation difference (dot-above diacritic
on the one hand, 'h'-after on the other).
Michael said:
> The reason the Latg and Latf script codes are in
> ISO 15924 is explained in ISO 15924. A library
> may hold an Irish book in the Cló Gaelach or in
> Cló Romhánach, and may use the codes to indicate
> what the book is printed in. This is a different
> order of script representation than a font
> distinction like Times/Helvetica.
This example doesn't help much in resolving the question. No Irish
Gaelic books would be republished purely because of the font difference,
nor even to change dot-above diacritics to h's or vice-versa. The
reason quite a few books from the 1950s or before were subsequently
republished was because of the rather massive spelling reform in
the 1950s, which, to confuse matters, coincided with the sudden fall
from official grace of both the Cló Gaelach font and the dot-above
diacritic. So when the books were republished, all three were changed
at once.
I have read lots of pre-1950s Irish books and have read German books
in Fraktur. The font differences are stark to someone coming new to
them and make the books look as if they are in some outlandish script,
but they are in fact totally superficial and you don't even notice them
after a few minutes' reading. The dot-above diacritics instead of
h's take more getting used to, but they cause no bother either after an
hour or two of reading. The spelling reform makes a far bigger
difference than either. The main reason that books were republished
was that teachers were afraid that pupils would pick up old spellings
from them and lose marks in their exams.
> See http://www.unicode.org/iso15924
I have done, and have gained only partial enlightenment. Neither
Latg nor Latf seems to be defined anywhere, so I have had to read
between the lines of a couple of examples. The evidence is very limited
and conflicting. I get the impression, though, that both Latg and Latf
were inherited by iso15924, without further definition or clarification,
from categories used by librarians to categorise printed books based on
purely superficial differences apparent to the untrained eye; and that
the chances are that this categorisation was based on font rather than
on the presence or absence of diacritics. If this is the case, and
Latg and Latf are defined purely in terms of font rather than
characters, then they are irrelevant to modern electronic text
processing where font can be changed on the fly, and they should
either be given a health warning, "to be used only by librarians
categorising printed materials", or deprecated altogether.
Before looking at the evidence from iso15924, though, I'll look
briefly at some of the parallels and differences between the Latg
and Latf situation. The parallels are:
- Both Irish and German have traditionally been written in
fonts (Cló Gaelach; Fraktur) which look at first sight starkly
different from Roman font.
- Both have alternative character representations, with or without
diacritics. In Irish, the lenition sound change on consonants can
be represented either by a dot-above diacritic or by a following 'h'.
In German, the umlaut sound change on vowels can be represented
either by a dieresis diacritic or by a following 'e' ("Müller"
versus "Mueller").
- In both Irish and German, the traditional font fell quickly from
grace and use in the years following the Second World War.
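
To make the character-level alternatives in the second point concrete, here is a minimal Python sketch, not taken from any standard, that rewrites dot-above lenition as a following 'h' and dieresis umlaut as a following 'e'. The function names and the sets of base letters are assumptions made for illustration. It deals only with these two markers and has nothing to do with the 1950s spelling reform.

```python
import unicodedata

DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
DIAERESIS = "\u0308"   # COMBINING DIAERESIS

# Assumed sets of base letters; real texts may need more (e.g. long s).
LENITABLE = set("bcdfgmpstBCDFGMPST")
UMLAUT_VOWELS = set("aouAOU")


def _mark_to_letter(text, mark, bases, letter):
    """Replace combining `mark` after any of `bases` with a plain `letter`."""
    out = []
    # NFD splits precomposed characters into base + combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if ch == mark and out and out[-1] in bases:
            out.append(letter)
        else:
            out.append(ch)
    # NFC recomposes any other accents (e.g. the acute on Irish long vowels).
    return unicodedata.normalize("NFC", "".join(out))


def dots_to_h(text):
    """Irish: dot-above lenition -> following 'h'."""
    return _mark_to_letter(text, DOT_ABOVE, LENITABLE, "h")


def umlaut_to_e(text):
    """German: dieresis umlaut -> following 'e'."""
    return _mark_to_letter(text, DIAERESIS, UMLAUT_VOWELS, "e")


print(dots_to_h("an ḃean"))
print(umlaut_to_e("Müller"))
```

Either conversion changes the characters of the text itself, which is exactly what a change of font does not do.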
The differences are:
- The dot-above diacritic fell quickly from grace and use in Irish
at the same time as the font, whereas the dieresis continues to be
the predominant method of representing umlaut in German.
- I have the feeling that the 'e' was used a lot less historically in
German than the 'h' was in Irish, but I may be wrong about this.
- If Latg were used to represent character differences, it would
denote the *presence* of diacritics; whereas if Latf were used
to represent character differences, it would denote the *absence*
of diacritics (since modern German with umlaut represented by
dieresis is considered to be Latn, and the 'e' is considered
to be more old-fashioned).
Turning now to look at what evidence as to the definition of Latg
(and Latf) can be gleaned from iso15924:
http://www.unicode.org/iso15924/standard/
4.2 NOTE 1 says:
"[...] ISO 15924 does not attempt to apply the character-glyph model,
because it is sometimes important to identify certain script variants
regardless of the encoding a given text may employ. [...]
Identification of such script variants, while outside the scope of
ISO/IEC 10646, is relevant to the content of script codes. For
example, a user ordering a book through interlibrary loan may prefer,
or may wish to exclude, the Gaelic variant of the Latin script for
reasons of ease of legibility or familiarity with one of the
variants."
This looks like it is saying that the definition of Latg (and Latf)
is based on fonts rather than character representations, even though,
as I have described, it is actually the spelling reform first, and
the dotted consonants second, which would matter to a reader
of Irish Gaelic, and the font itself would hardly matter at all.
Section 4.5.2 gives the following example of a bibliographic record:
Kroatisch-Deutsch und Deutsch-Kroatisch: mit einem Anhang der
wichtigeren Neubildungen des Kroatischen und Deutschen. - Berlin: Axel
Juncker, 1941. vi, 302, 314, 32 p.; 15 cm. In Croatian (Latn) and
German (Latf).
I expect that a book written in German at that time would have umlaut
represented by diacritics rather than by 'e', which would mean that
Latf must refer purely to the font and not to characters.
On the other hand, section 4.6.1 gives the following example of the use
of script codes in HTML:
<META HTTP-EQUIV="Content-Language" CONTENT="ga, ru">
<META NAME="Content-Script" CONTENT="Latg, Cyrl">
This seems to me to indicate that Latg is intended to convey something
more about the content than purely font information. Since the font
for any piece of Irish, old or new, can be changed at will these days
by the browser or by stylesheets, specifying the font would be saying
nothing at all about the content.
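
The contrast can be put in code: whether a text uses dotted consonants is a property of its characters and can be tested directly, whereas no test on the characters can reveal which font a reader will see. A minimal Python sketch, with a hypothetical function name and an assumed set of lenitable consonants:

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
# Assumed set of consonants that can carry the lenition dot.
LENITABLE = set("bcdfgmpstBCDFGMPST")


def uses_dotted_lenition(text):
    """Return True if any lenitable consonant carries a dot above."""
    # NFD splits precomposed letters into base + combining mark,
    # so each dotted consonant becomes an adjacent (base, mark) pair.
    decomposed = unicodedata.normalize("NFD", text)
    return any(
        mark == DOT_ABOVE and base in LENITABLE
        for base, mark in zip(decomposed, decomposed[1:])
    )


print(uses_dotted_lenition("an ḃean"))
print(uses_dotted_lenition("an bhean"))
```

The result depends only on the stored characters, so it is the same whether the text is displayed in Cló Gaelach or in a Roman font.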
It seems to me that the two methods of writing Irish which have
existed through the centuries, dot-above diacritics on the one
hand and 'h's on the other, do indeed parallel in a small way the
different script methods of writing languages such as Turkish
or Azerbaijani or Korean (although I confess to knowing little
in detail about these). If Latg does indicate Irish written
with dot-above diacritic, then it could indeed be a useful way of
tagging Irish content, provided that its use does not cause any other
undue complications. However, if, as looks increasingly likely,
and as John and Kent have told me, Latg refers purely to font,
then it seems to me to be a completely useless code which might
as well be forgotten.
Caoimhín