Language and script encoding standards
thomase at email.unc.edu
Fri Jul 14 17:13:37 CEST 2006
Dear list members,
Apologies in advance for this intervention from a new subscriber, but I
am directly involved in some of these matters (details below).
I greatly appreciate Martin's distinction between types of
transliteration -- it is very important that we have a way to indicate
human-use transliterations for a number of reasons (I'll spare you
specific use cases unless requested).
I would like to offer a caveat with regard to his observation concerning
the second class of transliteration:
Martin Duerst wrote:
> b) Transliterations for computers, i.e. to get around limitations
> in encodings or software. Beta coding is clearly such an example.
> These will die out.
Certainly this will eventually be the case, and some of us hope that day
comes sooner rather than later; however, the fact is that there are at
least two factors that will delay the demise of beta code:
1. There are some well-established projects with large beta-coded text
collections and V&V'd tooling for working with those texts (think
search; indexing; linguistic parsing; transformation to other
transliteration schemes, legacy font encodings and Unicode Greek of one
or another composition flavor). There is no guarantee that these
projects have the overhead or funding to do retrospective conversion of
these text bases -- let alone the re-engineering of all their tooling --
in any kind of rapid way. There's just not that much money sloshing
around in humanities tech.
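To make the "composition flavor" point concrete, here is a minimal sketch of what the beta-to-Unicode leg of such tooling involves. It is illustrative only: the letter and diacritic tables cover a tiny subset of TLG beta code, and real converters handle many more rules (final sigma, '*' capitals, punctuation, etc.).

```python
# Minimal sketch of beta code -> Unicode Greek conversion (illustrative only;
# real beta code has many more rules, e.g. final sigma, capitals via '*').
import unicodedata

# Tiny subset of the TLG beta code letter table (assumes lowercase input).
LETTERS = {
    'a': 'α', 'b': 'β', 'g': 'γ', 'd': 'δ', 'e': 'ε', 'h': 'η',
    'i': 'ι', 'k': 'κ', 'l': 'λ', 'm': 'μ', 'n': 'ν', 'o': 'ο',
    'r': 'ρ', 's': 'σ', 't': 'τ', 'u': 'υ', 'w': 'ω',
}
# Beta diacritic symbols -> Unicode combining marks.
DIACRITICS = {
    ')': '\u0313',   # smooth breathing (psili)
    '(': '\u0314',   # rough breathing (dasia)
    '/': '\u0301',   # acute
    '\\': '\u0300',  # grave
    '=': '\u0342',   # circumflex (perispomeni)
    '|': '\u0345',   # iota subscript (ypogegrammeni)
}

def beta_to_unicode(text, form='NFC'):
    """Convert a beta-coded string to Unicode Greek in the requested
    normalization form: 'NFC' for precomposed, 'NFD' for decomposed --
    the two 'composition flavors'."""
    out = []
    for ch in text:
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in DIACRITICS:
            out.append(DIACRITICS[ch])  # combining mark follows its base
        else:
            out.append(ch)              # pass through spaces, punctuation
    return unicodedata.normalize(form, ''.join(out))

print(beta_to_unicode('mh=nin a)/eide'))  # μῆνιν ἄειδε
```

The `form` parameter is where the composition-flavor choice surfaces: the same input yields precomposed code points under NFC and base-plus-combining sequences under NFD, and downstream search and indexing tools must agree on one.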
Now, I'm willing to admit that what goes on behind the scenes in such
systems needn't be foisted upon the wider, web-using public; however, I
can envision scenarios in which such projects might care to share data
with each other (e.g., via a web service) in which beta remains the
carrier for the Greek. Would they not want a way to communicate
explicitly about this fact?
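As one sketch of what such explicit communication might look like, a service could label its payload with a private-use language tag. The tag `grc-Latn-x-beta` below is my own coinage (structurally valid private-use form, but not a registered convention), meant to say "Ancient Greek, Latin script, beta code transliteration":

```python
# Sketch only: labeling a web-service payload that carries beta-coded Greek.
# 'grc-Latn-x-beta' is a hypothetical private-use tag, not a registered form.
BETA_TAG = 'grc-Latn-x-beta'

def make_response_headers(payload_tag):
    """Build HTTP-style headers announcing the transliteration explicitly."""
    return {
        'Content-Type': 'text/plain; charset=us-ascii',
        'Content-Language': payload_tag,
    }

def uses_beta(tag):
    """True if the tag's private-use part marks beta code (assumed convention:
    a 'beta' subtag somewhere after the 'x' singleton)."""
    subtags = tag.lower().split('-')
    return 'x' in subtags and 'beta' in subtags[subtags.index('x') + 1:]

headers = make_response_headers(BETA_TAG)
print(uses_beta(headers['Content-Language']))  # True
```

The point is not this particular tag but that a consuming service could test for the convention mechanically instead of guessing from the bytes.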
2. There is a subset of the user base that would prefer to work with
beta-coded Greek texts natively (I can't quantify the number, but many
are actively productive scholars and some are fully tech-savvy). These
are mostly people for whom one or more of the following is true:
(a) they learned to type and read beta a long time ago and are
completely comfortable and speedily effective with it (more so than,
say, even with a good keyboard interpreter);
(b) they work on older platforms where Unicode input and/or display is
not possible (or a giant pain). They are not
likely to pursue upgrades any time soon, as their funds are limited
and/or they do not see the point: they're focused on achieving certain
research and publishing goals and the methods they currently use are
efficient and effective in achieving those goals. For most, if a clear
case is made for *more* efficiency and effectiveness in achieving those
*specific* personal goals, then the willingness to upgrade and learn new
tricks follows as a matter of course. But that sort of narrowly focused
cost-benefit case can often be difficult to make.
I'd like to make one minor observation here as well:
> With respect to not yet encoded scripts, such as those mentioned in
> the article, there are various cases:
> - Things not really scripts, rejected (e.g. Klingon)
> - Variants that are not considered independent scripts
> - Not yet encoded scripts. In this case, it's much more
> useful to work on documentation and stuff for getting
> the script encoded in Unicode rather than to invent yet
> another transliteration.
There are scholarly communities that will continue to use established
transliteration schemes -- for various considered reasons -- regardless
of the script in question's presence in Unicode.
Etruscan is an example.
I make these comments, not in any attempt to disagree, but rather to
communicate some of the background and facts on the ground in the hopes
that complete and effective solutions can be found. I should also say
that I cannot take credit for authorship of Neel's document, but I have
been involved in some of the CHS Technical Working Group discussions
that lie behind and around it. The issues are directly relevant to my
work in a number of contexts.
Tom Elliott, Ph.D.
Director, Pleiades Project
Ancient World Mapping Center
University of North Carolina at Chapel Hill