Language for taxonomic names, redux

Luc Pardon lucp at
Mon Feb 27 21:36:56 CET 2017

On 24-02-17 16:03, Michael Everson wrote:
> I would also like other members of this list to be explicit about their support, misgivings, or disapproval of the scheme. No plus-ones, and if you’re fence-sitting, say that explicitly too. Thanks. 

I had been sitting on the fence in scanning mode, for lack of time to
participate, and am coming down - temporarily - out of respect for
Michael and his request.

I'll do a brain dump and then I'll climb back onto my fence.

Short version: I do support the registration of this tag, as a variant
subtag of "la".

As to the form that it should take: as a layperson, I do like -linnaeus,
because it makes clear (to me) what this is about, but I am a little bit
concerned that specialists may interpret it as referring only to the
original classification itself, and that would kind of invite them to
request other subtags for more recent taxonomies ("-cladistic" comes to
mind).

The proposed -taxon would not have that problem. So either go with that,
or pick -linnaeus and make clear in the description that it is used as a
generic name.

In what follows, I'll use -taxon, but that should be seen as "shorthand
notation" only.
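For what it's worth, both candidate names are at least well-formed as variant subtags. A minimal sketch of the RFC 5646 syntax rule for variants (5 to 8 alphanumerics, or 4 characters starting with a digit); the function name is only for illustration:

```python
import re

# RFC 5646 (BCP 47) variant subtag syntax:
# 5 to 8 alphanumerics, or exactly 4 characters starting with a digit.
VARIANT = re.compile(r"[0-9A-Za-z]{5,8}|[0-9][0-9A-Za-z]{3}")


def is_valid_variant(subtag):
    """Return True if subtag is syntactically a well-formed variant."""
    return VARIANT.fullmatch(subtag) is not None


for candidate in ("taxon", "linnaeus", "cladistic"):
    print(candidate, is_valid_variant(candidate))
```

By the same rule, a nine-letter subtag such as "cladistic" would not be well-formed as spelled.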

Longer version:

1. The number of pages where this would be used ought not to be of
concern _by itself_. We ought to encourage proper tagging, not
discourage it. So if somebody comes here with a request, we must not
turn him away empty-handed.

There are really only two options: either we point him to an existing
tag that fits the documents he wants to tag, or, if there is nothing
suitable in the registry yet, we must add it.

That is for head tags. In the case of embedded tags - which is what is
on the table now - there is a third option, i.e. that we tell him that
there is no need to tag separately, but that would apply only if the
words he wants to tag are actually part of the main language. I'll come
back to that, but it should be clear already that Latin words are not
English words.

I am not advocating that we should run this list as a kind of
"self-service tag factory". We may (and must) be critical, we may (and
must) ask questions, and we don't really want to register "vanity tags".
Even so, if there actually are documents in the "vanity language" out
there, we should not dismiss the request out of hand.

2. Every time a request comes in that is not obviously justified,
somebody brings up the "private tag" sledgehammer to try and chase the
requester away.

Private is private, period. BCP47 is quite clear about the situations
where private-use tags can be used. In my reading, they are really only
justified on intranets, not on publicly accessible documents.

In any case, as Michael pointed out, a private tag is certainly not
appropriate for taxonomic names.

3. I can understand Michael's concern that we may be about to register
something on behalf of taxonomists that will not be used by them anyway,
but I do not share it.

First, see no. 1 above: even if there is only one of them who wants to
apply proper tagging (and there is), we ought to encourage him.

Second: this tagging business is a matter of recommendations, and the
fact that not everybody takes the recommendations to heart should not
prevent us from registering things that we think are valid.

I mean, there are truckloads of English web pages out there that have no
(or no proper) head tag, and there are boatloads of non-English web
pages that have lang=en as their head tag because the author didn't
bother to change the default setting of the tool he used to produce the
HTML. Should we then declare "en" as obsolete because of that? I hope not.

As soon as we register la-taxon, we are actually _recommending_ that
taxons be tagged with it (remember what BCP stands for), and people
ought to comply, but we can't force them to. Nor can the original
requester, and we must not blame him for that - all the more since he
cannot even start to try and convince people to use a tag that isn't
there yet.

4. In the context of Wikipedia guidelines or the lack thereof: the IANA
repository _is_ the guideline for proper tagging. There is no need for
Wikipedia guidelines on top of it. Quite to the contrary, WP ought to
follow suit (if they care about proper tagging, that is).

This being said, there actually _is_ a WP guideline that _does_ require
taxons to be tagged separately.

The {{lang}} template has come up on the list, but not (unless I've
missed it from my fence) the rationale behind it, and that is
accessibility, something that WP cares a lot about.

Their "recommendations to contributors" on
this topic are extensive, but they have a summary page with
"Accessibility dos and don'ts". On the English WP, it's at:

Note that one of the "do's" says "Do encase non-English words or
phrases in {{lang}}."

So there you are: since we agree that "Homo sapiens" is not English
but Latin, it follows that it ought to be encased whenever it is used
in the English WP, as per WP's own guidelines.

The same applies, mutatis mutandis, to the WPs in all other languages.

5. That might seem to fly in the face of another WP guideline, this time
in the styling section.

See the section about how to format scientific names:

That section starts by saying that "Scientific names of organisms are
formatted according to normal taxonomic nomenclature".

The catch is that, at the bottom of the section, it says that "Although
often derived from Latin or Ancient Greek, scientific names are never
marked up with {{lang}} or related templates".

In my opinion, this guideline is overly (and exclusively) focused on
styling, and overlooks the advantages of the {{lang}} template - or was
written before the introduction of it.

In any case, there is definitely an internal conflict here (between the
Accessibility and Formatting WP guidelines), but that is WP's business,
not ours. The proper place to resolve this is the talk page associated
with the Manual of Style guideline, not this list.

If anybody wants to raise the issue over there, he'll probably be
pointed to the phrasing "_derived from_ Latin", meaning that they don't
think it _is_ Latin, so it is "not non-English", so it must not be
encased, so there is no conflict. However, we just determined - or are
about to officially determine - that these words are actually Latin and
not English. That may help solve the issue over at WP.

6. Furthermore, the WP accessibility guidelines are simply their
version/implementation of the Web Content Accessibility Guidelines, and
WCAG does require that language changes inside a document be marked up.

I have beaten that poor horse before, and I know that some people on
this list are - to put it mildly - not convinced that this is needed,
i.e. not convinced that web content must be accessible.

To them, I'd recommend that they get hold of some screen reader
software, switch off their screen and/or blindfold themselves, and then
try to navigate the web just with the screen reader. Next, when they
manage to land on an English page (assuming their native language is
English), switch the screen reader software to French and try to
understand what it's saying. Trust me, it will be an eye-opening
experience - if I may use that wording. At least it was for me.

In the meantime, it should suffice to point out:

a) that some people actually _do_ apply fine-grained tagging, and we
ought to give them the tags to do so, even if we ourselves should think
it's ethical to discriminate against blind people, and,

b) even if we should subscribe to the (wrong) idea that only government
websites have a legal obligation to be accessible to screen reader
software, that still leaves a lot of web pages that have a real need to
tag any taxons crawling around on them.

The conclusion is the same as above: since taxons are actually Latin,
they must be separately tagged (or at least taggable) if embedded in any
page that is not in the Taxon language - that is, any page in any language.

7. That brings me to the topic of TTS engines and the "proper"
pronunciation of taxons.

My view is that we should leave that as an exercise to the developer of
the TTS software, or, more precisely, to the developer of the language
module(s) concerned.

It may or may not be the case that "Homo sapiens" will be pronounced
differently in different languages, and at the same time it is quite
possible that scientists will pronounce it in a different way from
laypersons, or yet again pronounce it differently when talking with a
foreign-language colleague than with a same-language one.

Whatever it is, it is the TTS developer who must find out, not us. Our
job is to manufacture tags, not TTS software.

To elaborate: a typical TTS developer of any given language module
already has to take care of male and female voices, regional accents
etc. So it can be considered part of his job to figure out how taxons
are pronounced in the language he's adding support for. He could, for
example, provide a "male Texan botanist" voice, a "female Cockney
fishmonger" voice, and what have you.

The user can then pick whatever is preferred.

But if there is no tag for taxons, the TTS developer cannot even start
to add support for them. That means the taxons would be spoken as if
they were English, or French, or ...

That may be just fine, but that doesn't mean his job is done. He still
has to expand "H. sapiens", just like he has to expand e.g. "e.g." so
that it is spoken as "for example" instead of "eee dot gee dot".

So he would _still_ need our tag.
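To make that concrete, here is a hypothetical preprocessing step of the kind a TTS module could run once taxa are tagged: it expands an abbreviated genus using the most recent full genus seen with the same initial. The function name and the "same initial, most recent wins" rule are assumptions for illustration, not how any existing engine works; without the tag, the engine would not even know which spans to feed it.

```python
import re


def expand_genus(taxon_spans):
    """Expand abbreviated genus names (e.g. "H. sapiens") in a sequence of
    spans already tagged la-taxon. Hypothetical rule: use the most recent
    full genus that starts with the same capital letter."""
    last_genus = {}  # initial letter -> most recent full genus seen
    expanded = []
    for span in taxon_spans:
        match = re.fullmatch(r"([A-Z])\.\s+(\S.*)", span)
        if match and match.group(1) in last_genus:
            # Abbreviated form with a known antecedent: spell the genus out.
            span = f"{last_genus[match.group(1)]} {match.group(2)}"
        else:
            words = span.split()
            if words and words[0].isalpha():
                # Full genus name: remember it for later abbreviations.
                last_genus[words[0][0]] = words[0]
        expanded.append(span)
    return expanded


print(expand_genus(["Homo sapiens", "H. sapiens", "Q. robur"]))
```

An abbreviation with no earlier full name (here "Q. robur") is simply left alone.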

8. The argument that "nobody would provide TTS support for taxons" is an
issue that has to be settled in the boardroom of commercial software
vendors, as it is typically a cost-benefit analysis. If botanists were
to have abundant funding and millions to spend on screen reader
software, the support would be there in no time.

Situations may change over time as well. Not all that long ago, a
certain Bill Gates is famously (if probably apocryphally) said to have
remarked that nobody would ever need more
(i.e. no benefit to be made from) TTS taxon support today, or if the
cost is too high today, that may change tomorrow.

Also, don't underestimate the open source community. They have no
cost-benefit concerns, and all it takes to build la-taxon support into a
TTS engine is a dedicated taxonomist with basic programming skills and
the motivation to do it (say because he has a blind colleague, or is
blind himself).

In fact, if it is indeed true that botanists all over the world
pronounce taxons in the same way, the taxon support for one language
could easily be rolled out to other language modules, at near zero cost.
That would also change the cost/benefit analysis for commercial software
vendors.

In the meantime, the argument that "nobody would support it" - again -
does not count as a "significant objection" as per BCP 47.

And again again, if there is no tag for taxons, nobody _can_ support it.

9. Tags like la-EN-taxon are not necessary, because the TTS engine can
get that information from the head tag of the containing document.

At least I see no reason why la-DE-taxon would appear anywhere but in a
German document. That is, why would anybody want to force the TTS engine
to pronounce the taxon in the "German way" when it's reading a French or
English publication to him?

Note that a screen reader cannot possibly work on a word-by-word basis
anyway. When reading out the text, it must deal with the same cues that
a sighted reader would use: question and exclamation marks, commas,
full stops and other punctuation, plus emphasis (ideally indicated with
the <em> tag and styled with CSS as italic or bold) and a number of
other things.

If it can do that - and it must - it should be a piece of cake to
"remember" the head tag when stumbling on an embedded la-taxon tag
further down.
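A minimal sketch of that "remembering", assuming well-formed XHTML-like markup and using Python's xml.etree: each text run inherits the lang of its nearest ancestor, so the language of the surrounding text is still known when a la-taxon span turns up. Real-world HTML parsing and xml:lang are deliberately ignored here, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET


def language_runs(markup, default="und"):
    """Return (text, language tag) pairs in document order. Each run takes
    the lang attribute of its nearest ancestor, as HTML inheritance does;
    "und" (undetermined) is used when no ancestor declares a language."""
    root = ET.fromstring(markup)
    runs = []

    def walk(elem, inherited):
        lang = elem.get("lang", inherited)
        if elem.text and elem.text.strip():
            runs.append((elem.text, lang))
        for child in elem:
            walk(child, lang)
            # Text after a child's closing tag belongs to this element.
            if child.tail and child.tail.strip():
                runs.append((child.tail, lang))

    walk(root, default)
    return runs


doc = '<html lang="en"><p>Humans (<i lang="la-taxon">Homo sapiens</i>) walk.</p></html>'
for text, lang in language_runs(doc):
    print(lang, repr(text))
```

The outer "en" is never lost: the engine sees it on the runs before and after the taxon.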

Anyway, as has been pointed out, if e.g. la-DE-taxon in a French
document is actually desired, it can be done, just with the "-taxon"
subtag and its "la" prefix being in the registry.

And technically, the TTS engine could support that if needed, simply by
temporarily switching from French (from the head tag) to German (from
the la-DE-taxon tag) before picking up the appropriate taxon phonemes.
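A sketch of that switch, under stated assumptions: the tag parser below handles only language, script, region and variant subtags (no extensions or private use), and the region-to-voice table is purely hypothetical.

```python
# Hypothetical mapping from region subtag to a TTS voice/language module.
REGION_TO_VOICE = {"DE": "de", "FR": "fr", "GB": "en"}


def parse_tag(tag):
    """Split a simple BCP 47 tag into language, script, region and variants.
    Extensions ("-u-...") and private use ("-x-...") are not handled."""
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None,
              "region": None, "variants": []}
    for sub in parts[1:]:
        if (len(sub) == 4 and sub.isalpha() and result["script"] is None
                and result["region"] is None and not result["variants"]):
            result["script"] = sub.title()
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            result["region"] = sub.upper()
        else:
            result["variants"].append(sub.lower())
    return result


def taxon_voice(taxon_tag, document_lang):
    """Pick the voice for a taxon: the region in its own tag if we know it,
    otherwise the language of the surrounding document (from the head tag)."""
    region = parse_tag(taxon_tag)["region"]
    return REGION_TO_VOICE.get(region, document_lang)


print(taxon_voice("la-DE-taxon", "fr"))
print(taxon_voice("la-taxon", "fr"))
```

The fallback to the document language is the "normal" case from the previous point; the region only overrides it when someone explicitly asks for it.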

10. As to the "slippery slope" issue, I do not share that concern.

It is true that there are many other specialized vocabularies, but I'd
say there are not that many that would warrant a _language_ subtag.

For example:

   * musical terms like "pianissimo" and "adagio" can be tagged as plain
Italian, no need to register a new tag.

   * medical terms are usually translatable (e.g. bronchitis), so they can
be considered part of the language they are embedded in, even when
derived from Ancient Greek. No need to register a new variant subtag for
grc.

   * many other cases that fall in between (such as proper names) are
a candidate for semantic tagging, not for language tagging.

And just to be clear, by "semantic tagging" I mean marking up things as
"person", "city", "money" etc., as outlined in my next message.

To summarize my main point there: from my (non-linguist's) point of
view, a proper name (i.e. a person's name) may not be in the dictionary,
but it is nonetheless part of that person's native language itself, if
only because it is pronounced using the same "sound system".

As a matter of fact, the WCAG guidelines (about fine-grained tagging of
language changes) state that proper names _need_ not be tagged separately.

So it would be perfectly ok for us to refuse registration of a subtag
for proper names, or other word sets that can be considered part of the
language they are embedded in.

11. I am also somewhat surprised by the debate "language" versus "not a
language", i.e. grammar, verbs, what have you.

That may be a requirement when it comes to languages proper, but that is
the field of ISO (639 etc.).

What we deal in, here on this list, are tags to facilitate the
automated processing of languages. The "introduction" section of BCP47
leaves no doubt about that.

So the question is not "is Taxon a language?", but rather: "do taxons
need special processing", i.e. different from the processing of the text
they are embedded in?

I don't think the answer can be anything but "yes".

It is true that many of the requirements that have been provided as
justification for the subtag can be addressed - separately - by
different means, but not in an optimal way:

 * styling can indeed be handled by CSS, but if it has to be applied via
a class attribute on a span, the class names will differ from one
document to the next, since there is no way to standardize them.
Therefore it would be better if the styling could be applied with the
:lang(la-taxon) selector - even if styling were the only requirement.

 * the need to prevent translation can - maybe - be addressed in other
ways, but that means that, without the la-taxon tag, we'd have to add a
second set of tags in addition to the styling.

 * to repeat (and elaborate on) what I said above: even if taxons were
indeed pronounced in the same way as native words (e.g. English when
embedded in an English-language document), a TTS engine must still be
able to tell a taxon apart from a hole in the ground, if only to expand
"H. sapiens", not to mention the examples that John provided, such as
"Rubus ursinus Cham. et Schldl.", meaning that the name was jointly
published by von Chamisso and von Schlechtendal.

  So yes, the tag being requested is definitely about "facilitating the
processing of languages" and it does fit perfectly within our charter.
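Whether a :lang(la-taxon) style rule, or any other language filter, also catches a tag like la-DE-taxon depends on how language ranges are matched. A sketch of the two schemes defined in RFC 4647: basic filtering (plain prefix matching) and extended filtering (section 3.3.2), which may skip intervening subtags such as a region but stops at a singleton. The function names are illustrative.

```python
def basic_filter(lang_range, tag):
    """RFC 4647 basic filtering: the range must equal the tag or be a
    prefix of it ending at a subtag boundary; "*" matches everything."""
    r, t = lang_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter(lang_range, tag):
    """RFC 4647 extended filtering (section 3.3.2)."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    if r[0] != "*" and r[0] != t[0]:
        return False
    i = j = 1
    while i < len(r):
        if r[i] == "*":
            i += 1            # wildcard in the range: skip it
        elif j >= len(t):
            return False      # tag ran out of subtags
        elif r[i] == t[j]:
            i += 1            # subtags match: advance both
            j += 1
        elif len(t[j]) == 1:
            return False      # never skip past a singleton (e.g. "x")
        else:
            j += 1            # skip a non-matching tag subtag, e.g. a region
    return True


print(basic_filter("la-taxon", "la-DE-taxon"))     # prefix match fails
print(extended_filter("la-taxon", "la-DE-taxon"))  # region is skipped
```

Either way, "la" alone matches la-taxon, so a taxon still counts as Latin to anything that only asks for Latin.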

12. As a matter of fact, I was not sure I'd support this request until I
saw Michael and others state that taxons are "Latin and nothing else".

If they had been similar to proper names or medical terms, i.e. part of
the surrounding language (as in: a specialized subset of English words),
I'd have preferred semantic tagging, e.g. <taxon>.

Of course, that would - as far as I know - require an extension of the
current semantic tagging, which (in HTML5) only concerns itself with
document structure (<article>, ...) or (in ARIA) with widgets for user
interaction. That may not be a trivial undertaking, but it would not be
up to us to jump in.

But now that we're sure it's a variant of Latin, it is our cup of tea
and we have to drink it.

13. I hope that Michael by now doesn't regret that he has invited people
to come down. As for me, I'll just send one more message and then I'll
be back on the fence.


More information about the Ietf-languages mailing list