Language and script encoding standards

Tom Elliott thomase at email.unc.edu
Fri Jul 14 17:13:37 CEST 2006


Dear list members,

Apologies in advance for this intervention from a new subscriber, but I 
am directly involved in some of these matters (details below).

I greatly appreciate Martin's distinction between types of 
transliteration -- it is very important that we have a way to indicate 
human-use transliterations for a number of reasons (I'll spare you 
specific use cases unless requested).

I would like to offer a caveat with regard to his observation concerning 
the second class of transliteration:

Martin Duerst wrote:
> b) Transliterations for computers, i.e. to get around limitations
>    in encodings or software. Beta coding is clearly such an example.
>    These will die out. 

Certainly this will eventually be the case, and some of us hope that day 
comes sooner rather than later; however, the fact is that there are at 
least two factors that will delay the demise of beta code:

1. There are some well-established projects with large beta-coded text 
collections and verified-and-validated (V&V'd) tooling for working with 
those texts (think search; indexing; linguistic parsing; transformation 
to other transliteration schemes, to legacy font encodings, and to 
Unicode Greek of one or another composition flavor). There is no 
guarantee that these 
projects have the overhead or funding to do retrospective conversion of 
these text bases -- let alone the re-engineering of all their tooling -- 
in any kind of rapid way. There's just not that much money sloshing 
around in humanities tech.

Now, I'm willing to admit that what goes on behind the scenes in such 
systems needn't be foisted upon the wider, web-using public; however, I 
can envision scenarios in which such projects might care to share data 
with each other (e.g., via a web service) with beta remaining the 
carrier for the Greek. Would they not want a way to communicate 
explicitly about this fact? (I sketch what I mean just after point 2 
below.)

2. There is a subset of the user base that would prefer to work with 
beta-coded Greek texts natively (I can't quantify the number, but many 
are actively productive scholars and some are fully tech-savvy). These 
are mostly people for whom one or more of the following is true:

(a) they learned to type and read beta a long time ago and are 
completely comfortable and speedily effective with it (more so than, 
say, even with a good keyboard interpreter);

(b) they work on older platforms where Unicode input and/or display is 
either impossible or a giant pain. They are not likely to pursue 
upgrades any time soon, as their funds are limited and/or they do not 
see the point: they're focused on achieving certain research and 
publishing goals, and the methods they currently use are efficient and 
effective in achieving those goals. For most, if a clear case is made 
for *more* efficiency and effectiveness in achieving those *specific* 
personal goals, then the willingness to upgrade and learn new tricks 
follows as a matter of course. But that sort of narrowly focused case 
can often be difficult to make.
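To make the "composition flavor" and labeling points above a bit more 
concrete, here is a rough, purely illustrative Python sketch -- my own 
toy example, emphatically not any project's actual converter. The 
letter table covers only a handful of characters, and the private-use 
tag "grc-x-beta" at the end is hypothetical, not a registered tag; 
they are there only to show the shape of the problem.

    import unicodedata

    # Toy subset of the beta code letter repertoire (lowercase only).
    LETTERS = {
        "a": "\u03b1", "d": "\u03b4", "e": "\u03b5", "h": "\u03b7",
        "i": "\u03b9", "m": "\u03bc", "n": "\u03bd", "q": "\u03b8",
    }
    # Beta code diacritics follow the letter, as Unicode combining marks do.
    MARKS = {
        ")": "\u0313",   # smooth breathing
        "(": "\u0314",   # rough breathing
        "/": "\u0301",   # acute
        "\\": "\u0300",  # grave
        "=": "\u0342",   # circumflex (perispomeni)
        "|": "\u0345",   # iota subscript
    }

    def beta_to_greek(beta):
        # Character-by-character conversion; a real converter also handles
        # capitals (*), final sigma, punctuation, editorial sigla, etc.
        return "".join(LETTERS.get(c, MARKS.get(c, c)) for c in beta.lower())

    text = beta_to_greek("mh=nin a)/eide qea\\")      # Iliad 1.1
    print(unicodedata.normalize("NFD", text))         # decomposed flavor
    print(unicodedata.normalize("NFC", text))         # precomposed: "μῆνιν ἄειδε θεὰ"

    # Hypothetical labeling of exchanged data: "grc" for the Unicode Greek
    # and a private-use tag such as "grc-x-beta" -- again, not a registered
    # tag, just an illustration -- for text still carried in beta code.
    record = {"grc-x-beta": "mh=nin a)/eide qea\\", "grc": text}

The particular tag is beside the point; what matters is that projects 
exchanging such data would want *some* agreed way to say "this is 
Greek, but it is still in beta."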


I'd like to make one minor observation here as well:

> With respect to not yet encoded scripts, such as those mentioned in
> the article, there are various cases:
> - Things not really scripts, rejected (e.g. Klingon)
> - Variants that are not considered independent scripts
> - Not yet encoded scripts. In this case, it's much more
>   useful to work on documentation and stuff for getting
>   the script encoded in Unicode rather than to invent yet
>   another transliteration.

There are scholarly communities that will continue to use established 
transliteration schemes -- for various considered reasons -- regardless 
of the script in question's presence in Unicode. Etruscan is an 
example.


I make these comments not in any attempt to disagree, but rather to 
communicate some of the background and facts on the ground in the hopes 
that complete and effective solutions can be found. I should also say 
that I cannot take credit for authorship of Neel's document, but I have 
been involved in some of the CHS Technical Working Group discussions 
that lie behind and around it. The issues are directly relevant to my 
work in a number of contexts, including:

http://www.unc.edu/awmc/pleiades.html
http://epidoc.sf.net

Best,
Tom

-- 
Tom Elliott, Ph.D.
Director, Pleiades Project
Ancient World Mapping Center
University of North Carolina at Chapel Hill
http://www.unc.edu/awmc/pleiades.html
