Language for taxonomic names, redux

Fri Feb 24 19:45:07 CET 2017

On 24 February 2017 at 09:59, Arthur Reutenauer
<arthur.reutenauer at normalesup.org> wrote:

>   I think it would be useful if, instead of replying to Michael
> tit-for-tat,

He seems to insist that I should reply to his every question, or
opinion, even when others have already done so (and even sometimes
twice in the same email), and that he is the (only) person that I have
to persuade. I've been finding the whole process increasingly
confusing.

> you would list actual use cases where a subtag would help.

I thought I had done so. But, for the avoidance of doubt...

In 2003, I wrote:

     There is currently no language tag to denote the use of
     the scientific names (often erroneously called "Latin
     names") of living things, such as plants and animals
     (e.g. Homo sapiens). While such names are often composed
     of, or derived from, Latin terms, they can also be
     created from "Latinised" words taken from other
     languages, including Greek, English & other Western
     languages, languages local to the habitat of the plant
     or creature described, place names, wordplay, family
     names and even words invented for fiction (e.g.
     characters in Tolkien or Star Trek).

     I propose a tag for such names (which commonly occur in
     the midst of prose written in another language), or,
     alternatively, a sub-tag of the "LA" tag.

     The tag will allow clients to be aware that they should
     NOT translate scientific names when translating the text
     of a document in which they are included; Homo sapiens
     is Homo sapiens in French, German, English or
     Serbo-Croat.

     There is a convention to abbreviate second occurrences
     of such names thus:

           "Homo sapiens has a bigger brain than H. erectus"

     The proposed tag (or sub-tag) will potentially allow the
     second such occurrence to be pronounced in full by
     speech synthesis software, as it would be in normal
     speech:

           "Homo sapiens has a bigger brain than Homo erectus"

     Scientific names are conventionally rendered, on paper
     or screen, in italics (or sometimes underlined); a
     unique tag will potentially allow such rendering to be
     applied automatically by clients (or via style sheets in
     HTML and other mark-up schemes).
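To illustrate the speech-synthesis point: once the names are marked up, expanding abbreviations becomes a mechanical step. The following is a minimal Python sketch. It assumes the taxon names have already been extracted from a document in reading order (for example, the text of every element carrying the proposed tag). The function name expand_taxon_names is hypothetical, and the rule it applies (an abbreviation refers to the most recently mentioned genus with the same initial) is a simplification of the convention, not a formal specification.

```python
import re

# An abbreviated binomial such as "H. erectus": one capital letter, a full
# stop, optional whitespace, then a lower-case specific epithet.
ABBREVIATED = re.compile(r"^([A-Z])\.\s*([a-z]+)$")


def expand_taxon_names(names):
    """Expand abbreviated genus names, e.g. "H. erectus" -> "Homo erectus".

    `names` is the list of taxon names found in a document, in reading
    order (hypothetically, the text of every element carrying the proposed
    tag). An abbreviation is expanded to the most recently seen genus with
    the same initial; if no such genus has been seen, it is left unchanged.
    """
    last_genus_for_initial = {}
    expanded = []
    for name in names:
        match = ABBREVIATED.match(name.strip())
        if match:
            initial, epithet = match.groups()
            genus = last_genus_for_initial.get(initial)
            expanded.append(f"{genus} {epithet}" if genus else name)
        else:
            parts = name.split()
            if parts:
                # Remember the genus (first word) under its initial letter.
                last_genus_for_initial[parts[0][0]] = parts[0]
            expanded.append(name)
    return expanded


print(expand_taxon_names(["Homo sapiens", "H. erectus"]))
# ['Homo sapiens', 'Homo erectus']
```

If two genera with the same initial appear in one passage, the result is ambiguous; this sketch simply picks the later one.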

In my 2008 blog post, I asked:

        what happens when a page like this one includes the scientific
        (or taxonomic) name of a living thing, such as Circus cyaneus
        (the Hen Harrier)? It’s not English, and should not be translated
        into, say, German, as Zirkus cyaneus.

On 21 February, on this list, I said:

       I've set out several additional use-cases previously, including:

          * Pronunciation by aural browsers/assistive technologies
          * Selection for styling by CSS

       That's in addition to the do-not-translate use-case which we were
       discussing.

and, more recently:

      The need is to have a standard method of marking up the language
      of content representing taxonomic names.

Another, secondary, use-case that now occurs to me is to facilitate
machine scraping of documents: "find all the taxon names being
discussed and discard the rest".
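Here is a rough sketch of the scraping use-case in Python, using only the standard library's html.parser. The tag value "la-x-taxon" is a hypothetical placeholder written in private-use syntax. No such subtag is registered; it stands in for whatever tag this discussion might produce. The class and function names are also invented for illustration. The comparison is an exact, case-insensitive match (no BCP 47 prefix matching), and the markup is assumed to be well-formed.

```python
from html.parser import HTMLParser

# HTML elements that never have an end tag, so they are not tracked.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TaxonNameScraper(HTMLParser):
    """Collect the text of elements whose lang attribute equals `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # language tags are case-insensitive
        self.stack = []         # one bool per open element: does it match?
        self.current = []       # text pieces of the name being collected
        self.names = []

    def handle_starttag(self, name, attrs):
        if name in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        matches = lang is not None and lang.lower() == self.tag
        if matches and not any(self.stack):
            self.current = []  # outermost matching element: start a new name
        self.stack.append(matches)

    def handle_endtag(self, name):
        if name in VOID_ELEMENTS or not self.stack:
            return
        matched = self.stack.pop()
        if matched and not any(self.stack):
            # Join the pieces and collapse runs of whitespace.
            self.names.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if any(self.stack):
            self.current.append(data)


def find_taxon_names(html, tag="la-x-taxon"):
    """Return the text of every element tagged `tag` (a hypothetical value)."""
    scraper = TaxonNameScraper(tag)
    scraper.feed(html)
    scraper.close()
    return scraper.names


page = ('<p>We saw <i lang="la-x-taxon">Circus cyaneus</i> '
        '(the Hen Harrier, <span lang="de">Kornweihe</span>) and '
        '<i lang="la-x-taxon">Turdus merula</i>.</p>')
print(find_taxon_names(page))  # ['Circus cyaneus', 'Turdus merula']
```

Inverting the same scan would give a translation tool the spans it should leave untouched.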

Let me now also ask this, so that I have a clearer idea of what this
forum considers a good use-case: What are the use-cases for, say,
en or en-GB? And which of them do not apply to the proposal under
discussion?

> Wikipedia is one of them, and your position, if I’ve understood it
> correctly, is: the subtag can be rolled out instantly in infoboxes by
> changing the template, and it would be up to Wikipedia contributors to
> use it in running text, according to Wikimedia guidelines and
> long-standing practice.

Yes.

> That’s one example, and nobody denies that it
> is a legitimate one.

My reading is that some here do deny that.

>   We need more.  You’ve referred several times to earlier discussions,
> some of which contain concrete suggestions, but if you want to convince
> the contributors to this list you’ll need to actually write them down
> instead of just giving pointers.

I think I've generally been referring to comments made earlier in
/this/ discussion; some now quoted above.

> This is not a peer-reviewed article,
> nobody is going to check your references, and besides your emails don’t
> come with a bibliography at the bottom.  In any case, if all you write
> is “see the points made by ...” said points are not actually part of the
> discussion.

You may be referring to my reference to points made by Gregor Hagedorn in:

   http://mailman.nhm.ku.edu/pipermail/taxacom/2008-June/027271.html

That's a 500+ word post, even without the follow-up discussion. I can
re-post it here, but is that really desirable?

> It would also be useful if you acknowledged that the system for which
> you wish to have a subtag is not in any way a language, variety, or
> dialect.  It has no syntax, no parts of speech except for nouns and
> adjectives, no autonomous pronunciation.  It is a specialised lexicon
> which, as Karl pointed out, is sufficiently “language-like” to warrant
> separate tagging.

I'm not a linguist, but am happy to accept that if it's the consensus here.

> Binomial names currently appear in contexts where (as
> I understand it) language tagging would be appropriate because other,
> similar pieces of content are marked with their own tag.

Yes.

> The previous
> sentence is intentionally vague because I don’t actually know what these
> situations are; this is precisely what you should illustrate with
> concrete examples.  People on this list are receptive to these
> arguments.

Do you mean something like this HTML?

    <p>Jaques shouted <q lang="fr">J'ai trouvé un nid de
     <i lang="??-???">Turdus merula</i>, ici il est!</q>,
     so we all started to run.</p>

>         Best,

I'm very grateful for your clear and helpful response. Thank you.