Suggestion: Tag or Sub-tag for Scientific names

Peter_Constable
Mon Feb 3 10:07:28 CET 2003

On 02/03/2003 06:55:35 AM "Jon Hanna" wrote:

>such we have 3 options:

Martin has pointed out a fourth. Summarising:

1. scientific term is not distinguished from lang of matrix text

2. scientific term is tagged as Latin

3. scientific term is tagged as something distinct (a variant of Latin,
"la-sci", or something unique, e.g. "scientrm")

4. the scientific term is tagged in a special way to indicate it is in no
particular language, e.g. xml:lang="" or perhaps a tag for "indeterminate

Of course, there will inevitably be lots of text out there in which 1 is
done, even if that is not considered the best practice. On the other hand,
it probably is not the best practice for reasons already observed: e.g. an
English (or French or Italian...) spell checker (or speech synthesiser...)
is very unlikely to know how to handle that text. This is also an issue for
option 2.
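To make the four options concrete, here is a minimal sketch of what a processor would see under each one. It assumes the terms are marked up in XML with xml:lang; the element names (doc, p, term) and the sample sentences are hypothetical, and "la-sci" is only the illustrative tag mentioned under option 3, not a registered one. The helper applies the usual xml:lang inheritance, so an untagged term (option 1) simply picks up the language of the matrix text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one sentence per tagging option.
# "la-sci" is the illustrative tag from option 3, not a registered tag.
SAMPLE = """
<doc xml:lang="en">
  <p>Option 1: the species <term>Heerz lukenatcha</term>.</p>
  <p>Option 2: the species <term xml:lang="la">Heerz lukenatcha</term>.</p>
  <p>Option 3: the species <term xml:lang="la-sci">Heerz lukenatcha</term>.</p>
  <p>Option 4: the species <term xml:lang="">Heerz lukenatcha</term>.</p>
</doc>
"""

def effective_langs(element, inherited=None):
    """Yield (tag, effective language) pairs, applying xml:lang inheritance."""
    # An explicit xml:lang (even an empty one) overrides the inherited value.
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_langs(child, lang)

root = ET.fromstring(SAMPLE)
term_langs = [lang for tag, lang in effective_langs(root) if tag == "term"]
print(term_langs)  # ['en', 'la', 'la-sci', '']
```

Under option 1 the term is indistinguishable from English text, under option 2 from Latin text; only options 3 and 4 give a process something to key on.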

There have been objections that what we're talking about is not a language
-- it doesn't have verbs, etc. I'm wondering if that isn't a red herring.
Let's consider what the purpose of the tags is: they are used to facilitate
cataloguing and retrieval of content in terms of linguistic varieties, and
they permit tailoring of processes according to linguistic varieties. IOW,
I'm suggesting that perhaps it's distinct IT processing needs that matter
more than whether it's a bona fide language. With that in mind, let me ask
a couple of questions:

- Might there be a user need to search for content of this linguistic type,
i.e. retrieving content by reference to this (presumed) linguistic variety?

- Is tailoring of processes such as spell checking or speech synthesis
required for the terminology in question?
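The second question can be pictured with a small sketch of a process that is tailored by language tag. Everything here is an assumption made for illustration: the tiny word lists stand in for real spell-check dictionaries, and returning None for a tag with no dictionary is just one possible policy (skip the text rather than flag it).

```python
# Hypothetical word lists standing in for real spell-check dictionaries.
DICTIONARIES = {
    "en": {"the", "species", "is", "small"},
    "es": {"la", "especie", "es"},
}

def misspellings(words, lang):
    """Return the words flagged for `lang`, or None if no checker applies."""
    dictionary = DICTIONARIES.get(lang)
    if dictionary is None:
        # No tailored process for this tag: skip instead of guessing.
        return None
    return [w for w in words if w.lower() not in dictionary]

term = ["Heerz", "lukenatcha"]

# Term not distinguished from Spanish matrix text: both words are flagged.
flagged_as_spanish = misspellings(term, "es")
# Term carrying a distinct tag (the illustrative "la-sci"): skipped.
flagged_as_distinct = misspellings(term, "la-sci")
# Term tagged as being in no particular language (xml:lang=""): skipped.
flagged_as_empty = misspellings(term, "")

print(flagged_as_spanish, flagged_as_distinct, flagged_as_empty)
```

With this policy a distinct tag and an empty tag behave the same for spell checking; they differ only if some process is actually tailored to the distinct tag.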

With regard to the second question, I think it's pretty clear that
processes tailored for undisputed natural languages (English, Latin, etc.)
are not going to be able to handle these terms. That may or may not be a
problem: Can we live with "Heerz lukenatcha" showing up as a misspelling by
a Spanish (or whatever) spell-checker? Can we live with a Russian speech
synthesiser generating a bad attempt at pronouncing it appropriately? If
these are acceptable errors, then any of option 1, option 2 or option 4
(assuming we work out what should happen under various processes when
encountering text tagged as xml:lang="" or something comparable) would be
acceptable. If that's not good enough, though, then option 3 seems to be

