Language for taxonomic names, redux

Andrew Cunningham
Wed Mar 1 02:24:11 CET 2017

Regarding accessibility considerations:

It's more of a question for WAI.

It would seem accessibility would benefit if:

1) the tag exclusively identified what taxonomic system was used, and
2) the tag would be sufficient for software to know what pronunciation
rules there are

la-taxon could be too vague to have any accessibility benefit.

If globally more than one pronunciation exists for such taxonomic terms, and
if other taxonomies need to be supported, it would seem more beneficial to
do the hard yakka and propose an extension mechanism that could identify
which taxonomy system is being used and what pronunciation system should
be used.

But I am not sure such a system is needed. You would need to consult
accessibility experts.

On Tuesday, 28 February 2017, Luc Pardon wrote:
> On 24-02-17 16:03, Michael Everson wrote:
>> I would also like other members of this list to be explicit about their
>> support, misgivings, or disapproval of the scheme. No plus-ones, and if
>> you’re fence-sitting, say that explicitly too. Thanks.
> I had been sitting on the fence in scanning mode, for lack of time to
> participate, and am coming down - temporarily - out of respect for
> Michael and his request.
> I'll do a brain dump and then I'll climb back onto my fence.
> Short version: I do support the registration of this tag, as a subtag of
> Latin.
> As to the form that it should take: as a layperson, I do like -linnaeus,
> because it makes clear (to me) what this is about, but I am a little bit
> concerned that specialists may interpret it as referring only to the
> original classification itself, and that would kind of invite them to
> request other subtags for more recent taxonomies ("-cladistic" comes to
> mind).
> See:
> The proposed -taxon would not have that problem. So either go with that,
> or pick -linnaeus and make clear in the description that it is used as a
> generic name.
> In what follows, I'll use -taxon, but that should be seen as "shorthand
> notation" only.
> Longer version:
> 1. The number of pages where this would be used ought not to be of
> concern _by itself_. We ought to encourage proper tagging, not
> discourage it. So if somebody comes here with a request, we must not
> turn him away empty-handed.
> There are really only two options: either we point him to an existing
> tag that fits the documents he wants to tag, or, if there is nothing
> suitable in the registry yet, we must add it.
> That is for head tags. In the case of embedded tags - which is what is
> on the table now - there is a third option, i.e. that we tell him that
> there is no need to tag separately, but that would apply only if the
> words he wants to tag are actually part of the main language. I'll come
> back to that, but it should be clear already that Latin words are not
> English words.
> I am not advocating that we should run this list as a kind of
> "self-service tag factory". We may (and must) be critical, we may (and
> must) ask questions, and we don't really want to register "vanity tags".
> Even so, if there actually are documents in the "vanity language" out
> there, we should not dismiss the request out of hand.
> 2. Every time a request comes in that is not obviously justified,
> somebody brings up the "private tag" sledgehammer to try and chase the
> requester away.
> Private is private, period. BCP47 is quite clear about the situations
> where they can be used. In my translation, private tags are really only
> justified on intranets, not on publicly accessible documents.
> In any case, as Michael pointed out, a private tag is certainly not
> appropriate for taxonomic names.
> 3. I can understand Michael's concern that we may be about to register
> something on behalf of taxonomists that will not be used by them anyway,
> but I do not share it.
> First, see nr. 1 above: even if there is only one of them who wants to
> apply proper tagging (and there is), we ought to encourage him.
> Second: this tagging business is a matter of recommendations, and the
> fact that not everybody takes the recommendations to heart should not
> prevent us from registering things that we think are valid.
> I mean, there are truckloads of English web pages out there that have no
> (or no proper) head tag, and there are boatloads of non-English web
> pages that have lang=en as their head tag because the author didn't
> bother to change the default setting of the tool he used to produce the
> html. Should we then declare "en" as obsolete because of that? I hope not.
> As soon as we register la-taxon, we are actually _recommending_ that
> taxons be tagged with it (remember what BCP stands for), and people
> ought to comply, but we can't force them to. Nor can the original
> requester, and we must not blame him for that - all the more since he
> cannot even start to try and convince people to use a tag that isn't
> there yet.
> 4. In the context of Wikipedia guidelines or the lack thereof: the IANA
> repository _is_ the guideline for proper tagging. There is no need for
> Wikipedia guidelines on top of it. Quite to the contrary, WP ought to
> follow suit (if they care about proper tagging, that is).
> This being said, there actually _is_ a WP guideline that _does_ require
> taxons to be tagged separately.
> The {{lang}} template has come up on the list, but not (unless I've
> missed it from my fence) the rationale behind it, and that is
> accessibility, something that WP cares a lot about.
> Their "recommendations to contributors" on
> this topic are extensive, but they have a summary page with
> "Accessibility dos and don'ts". On the English WP, it's at:
>   Note that one of the "do's" says "Do encase non-English words or
> phrases in {{lang}}."
>    So there you are: since we agree that "Homo Sapiens" is not an
> English word, but a Latin one, it follows that it ought to be encased
> whenever it is used in the English WP, as per WP's own guidelines.
>    The same applies, mutatis mutandis, to all WP's in other languages.
> 5. That might seem to fly in the face of another WP guideline, this time
> in the styling section.
> See the section about how to format scientific names:
> That section starts by saying that "Scientific names of organisms are
> formatted according to normal taxonomic nomenclature".
> The catch is that, at the bottom of the section, it says that "Although
> often derived from Latin or Ancient Greek, scientific names are never
> marked up with {{lang}} or related templates".
> In my opinion, this guideline is overly (and exclusively) focused on
> styling, and overlooks the advantages of the {{lang}} template - or was
> written before the introduction of it.
> In any case, there is definitely an internal conflict here (between the
> Accessibility and Formatting WP guidelines), but that is WP's business,
> not ours. The proper place to resolve this is the talk page associated
> with the Manual of Style guideline, not this list.
> If anybody wants to raise the issue over there, he'll probably be
> pointed to the phrasing "_derived from_ Latin", meaning that they don't
> think it _is_ Latin, so it is "not non-English", so it must not be
> encased, so there is no conflict. However, we just determined - or are
> about to officially determine - that these words are actually Latin and
> not English. That may help solve the issue over at WP.
> 6. Furthermore, the WP accessibility guidelines are simply their
> version/implementation of the Web Content Accessibility Guidelines, and
> WCAG does require that language changes inside a document be marked up.
> I have beaten that poor horse before, and I know that some people on
> this list are - to put it mildly - not convinced that this is needed,
> i.e. not convinced that web content must be accessible.
> To them, I'd recommend that they get hold of some screen reader
> software, switch off their screen and/or blindfold themselves, and then
> try to navigate the web just with the screen reader. Next, when they
> manage to land on an English page (assuming their native language is
> English), switch the screen reader software to French and try to
> understand what it's saying. Trust me, it will be an eye-opening
> experience - if I may use that wording. At least it was for me.
> In the meantime, it should suffice to point out :
> a) that some people actually _do_ apply fine-grained tagging, and we
> ought to give them the tags to do so, even if we ourselves should think
> it's ethical to discriminate against blind people, and,
> b) even if we should subscribe to the (wrong) idea that only government
> websites have a legal obligation to be accessible to screen reader
> software, that still makes a lot of web pages that have a real need to
> tag any taxons crawling around on them.
> The conclusion is the same as above: since taxons are actually Latin,
> they must be separately tagged (or at least taggable) if embedded in any
> page that is not in the Taxon language - that is, any page in any
> 7. That brings me to the topic of TTS engines and the "proper"
> pronunciation of taxons.
> My view is that we should leave that as an exercise to the developer of
> the TTS software, or, more precisely, to the developer of the language
> module(s) concerned.
> It may or may not be so that "Homo Sapiens" will be pronounced
> differently in different languages, and at the same time it is quite
> possible that scientists will pronounce it in a different way from
> laypersons, or yet again pronounce it differently when talking with a
> foreign-language colleague than with a same-language one.
> Whatever it is, it is the TTS developer who must find out, not us. Our
> job is to manufacture tags, not TTS software.
> To elaborate: a typical TTS developer of any given language module
> already has to take care of male and female voices, regional accents
> etc. So it can be considered part of his job to figure out how taxons
> are pronounced in the language he's adding support for. He could, for
> example, provide a "male Texan botanist" voice, a "female Cockney
> fishmonger" voice, and what have you.
> The user can then pick whatever is preferred.
> But if there is no tag for taxons, the TTS developer cannot even start
> to add support for them. That means the taxons would be spoken as if
> they were English, or French, or ...
> That may be just fine, but that doesn't mean his job is done. He still
> has to expand "H. Sapiens", just like he has to expand e.g. "e.g." so
> that it is spoken as "for example" instead of "eee dot gee dot".
> So he would _still_ need our tag.
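
The expansion step described above can be sketched roughly as follows. This is a minimal illustration only, assuming the TTS module receives the taxon-tagged spans of a document in reading order; the function name `expand_genus_abbreviations` is hypothetical and not taken from any real TTS engine.

```python
import re

def expand_genus_abbreviations(names):
    """Expand an abbreviated genus ("H. sapiens") using genera seen earlier.

    `names` is assumed to be the list of taxon-tagged spans, in reading order.
    """
    seen = {}       # initial letter -> most recently seen full genus name
    expanded = []
    for name in names:
        parts = name.split()
        if not parts:
            expanded.append(name)
            continue
        first = parts[0]
        if re.fullmatch(r"[A-Z]\.", first) and first[0] in seen:
            # Abbreviated genus whose full form appeared earlier: expand it.
            parts[0] = seen[first[0]]
        elif first[0].isupper() and not first.endswith("."):
            # Full genus name: remember it for later abbreviations.
            seen[first[0]] = first
        expanded.append(" ".join(parts))
    return expanded
```

Without a tag marking the span as a taxon, the software would have no reliable way to know that "H." stands for a genus rather than ending a sentence.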
> 8. The argument that "nobody would provide TTS support for taxons" is an
> issue that has to be settled in the boardroom of commercial software
> vendors, as it is typically a cost-benefit analysis. If botanists were
> to have abundant funding and millions to spend on screen reader
> software, the support would be there in no time.
> Situations may change over time as well. Not all that long ago, a
> certain Bill Gates famously remarked that nobody would ever need more
> than 640k of main memory in his computer. So if there is no need for
> (i.e. no benefit to be made from) TTS taxon support today, or if the
> cost is too high today, that may change tomorrow.
> Also, don't underestimate the open source community. They have no
> cost-benefit concerns, and all it takes to build la-taxon support into a
> TTS engine is a dedicated taxonomist with basic programming skills and
> the motivation to do it (say because he has a blind colleague, or is
> blind himself).
> In fact, if it is indeed true that botanists all over the world
> pronounce taxons in the same way, the taxon support for one language
> could easily be rolled out to other language modules, at near zero cost.
> That would also change the cost/benefit analysis for commercial software
> developers.
> In the meantime, the argument that "nobody would support it" - again -
> does not count as a "significant objection" as per BCP 47.
> And again again, if there is no tag for taxons, nobody _can_ support it.
> 9. Tags like la-EN-taxon are not necessary, because the TTS engine can
> get that information from the head tag of the containing document.
> At least I see no reason why la-DE-taxon would appear anywhere but in a
> German document. That is, why would anybody want to force the TTS engine
> to pronounce the taxon in the "German way" when it's reading a French or
> English publication to him?
> Note that a screen reader cannot possibly work on a word-by-word basis
> anyway. When reading out the text, it must deal with the same cues that
> a sighted reader would use: question and exclamation marks, commas,
> full stops and other punctuation, plus emphasis (ideally indicated with
> the <em> tag and styled with css as italic or bold) and a number of
> other things.
> If it can do that - and it must - it should be a piece of cake to
> "remember" the head tag when stumbling on an embedded la-taxon tag
> further down.
> Anyway, as has been pointed out, if e.g. la-DE-taxon in a French
> document is actually desired, it can be done, just with the "-taxon"
> subtag and its "la" prefix being in the registry.
> And technically, the TTS engine could support that if needed, simply by
> temporarily switching from French (from the head tag) to German (from
> the la-DE-taxon tag) before picking up the appropriate taxon phonemes.
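
The fallback described in point 9 might look something like the sketch below. It assumes "taxon" as the variant subtag under discussion (it is a proposal here, not a registered subtag), and the `REGION_TO_LANGUAGE` table and the function name are hypothetical placeholders; a real engine would need a far more careful mapping from region to language module.

```python
# Hypothetical mapping from a region subtag to the TTS language module to use.
REGION_TO_LANGUAGE = {"DE": "de", "FR": "fr", "GB": "en", "US": "en"}

def pronunciation_language(embedded_tag, document_tag):
    """Pick the language module whose sound system should read a tagged span.

    Simplified on purpose: script, extlang and extension subtags are ignored.
    """
    subtags = embedded_tag.split("-")
    if "taxon" not in (s.lower() for s in subtags[1:]):
        # Not a taxon span: read it in its own primary language.
        return subtags[0].lower()
    for sub in subtags[1:]:
        language = REGION_TO_LANGUAGE.get(sub.upper())
        if len(sub) == 2 and language:
            # Explicit region in the embedded tag overrides the document.
            return language
    # No region given: "remember" the head tag of the containing document.
    return document_tag.split("-")[0].lower()
```

With this sketch, `la-taxon` inside a French document is read with the French module, while `la-DE-taxon` in the same document temporarily switches to German.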
> 10. As to the "slippery slope" issue, I do not share that concern.
> It is true that there are many other specialized vocabularies, but I'd
> say there are not that many that would warrant a _language_ subtag.
> For example:
>    * musical terms like "pianissimo" and "adagio" can be tagged as plain
> Italian, no need to register a new tag.
>    * medical terms are usually translatable (e.g. bronchitis), so can be
> considered part of the language they are embedded in, even when derived
> from Ancient Greek. No need to register a new variant subtag for grc.
>    * many other cases, that fall in between (such as proper names), are
> a candidate for semantic tagging, not for language tagging.
> And just to be clear, by "semantic tagging" I mean marking up things as
> "person", "city", "money" etc., as outlined in my next message.
> To summarize my main point there: from my (non-linguist's) point of
> view, a proper name (i.e. a person's name) may not be in the dictionary,
> but it is nonetheless part of that person's native language itself, if
> only because it is pronounced using the same "sound system".
> As a matter of fact, the WCAG guidelines (about fine-grained tagging of
> language changes) state that proper names _need_ not be tagged separately.
> So it would be perfectly ok for us to refuse registration of a subtag
> for proper names, or other word sets that can be considered part of the
> language they are embedded in.
> 11. I am also somewhat surprised by the debate "language" versus "not a
> language", i.e. grammar, verbs, what have you.
> That may be a requirement when it comes to languages proper, but that is
> the field of ISO (639 etc.).
> What we're dealing in, here on this list, is tags to facilitate the
> automated processing of languages. The "introduction" section of BCP47
> leaves no doubt about that.
> So the question is not "is Taxon a language?", but rather: "do taxons
> need special processing", i.e. different from the processing of the text
> they are embedded in?
> I don't think the answer can be anything but "yes".
> It is true that many of the requirements that have been provided as
> justification for the subtag can be addressed - separately - by
> different means, but not in an optimal way:
>  * styling can indeed be handled by css, but if it has to be applied via
> a "span class" attribute, that will be on a document-by-document basis,
> there is no way to standardize the names of the classes. Therefore it
> would be better if the styling could be applied with the :lang(la-taxon)
> selector - even if styling were the only requirement.
>  * the need to prevent translation can - maybe - be addressed in other
> ways, but that means that, without the la-taxon tag, we'd have to add a
> second set of tags in addition to the styling.
>  * to repeat (and elaborate on) what I said above: even if taxons were
> indeed pronounced in the same way as native words (e.g. English when
> embedded in an English-language document), a TTS engine must still be
> able to tell a taxon apart from a hole in the ground, if only to expand "H.
> Sapiens", not to mention the examples that John provided, such as "Rubus
> ursinus Cham. et Schldl." meaning that the name was jointly published by
> von Chamisso and von Schlechtendal.
>   So yes, the tag being requested is definitely about "facilitating the
> processing of languages" and it does fit perfectly within our charter.
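
As an illustration of the "special processing" argument in point 11, the sketch below shows how software could pick out taxons from a page using nothing but the language attribute. It uses Python's standard `html.parser`; the class name `TaxonCollector` is hypothetical, and "la-taxon" is the proposed tag from this thread, not a registered one. The match is a simple prefix test (the tag itself, or the tag followed by "-"), so a tag such as la-DE-taxon would not be caught by this simplified version.

```python
from html.parser import HTMLParser

class TaxonCollector(HTMLParser):
    """Collect the text of every element whose language is la-taxon."""

    def __init__(self):
        super().__init__()
        self.lang_stack = []   # (element name, effective language) per open element
        self.taxons = []

    def handle_starttag(self, tag, attrs):
        # An element without its own lang attribute inherits its parent's.
        inherited = self.lang_stack[-1][1] if self.lang_stack else ""
        lang = dict(attrs).get("lang") or inherited
        self.lang_stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to (and including) the matching open element.
        for i in range(len(self.lang_stack) - 1, -1, -1):
            if self.lang_stack[i][0] == tag:
                del self.lang_stack[i:]
                break

    def handle_data(self, data):
        lang = self.lang_stack[-1][1].lower() if self.lang_stack else ""
        is_taxon = lang == "la-taxon" or lang.startswith("la-taxon-")
        if is_taxon and data.strip():
            self.taxons.append(data.strip())

collector = TaxonCollector()
collector.feed(
    '<p lang="en">The blackberry <span lang="la-taxon">Rubus ursinus</span> '
    'and <i lang="la-taxon">H. sapiens</i> are unrelated.</p>'
)
taxons = collector.taxons
```

The same attribute could then drive styling, exclusion from translation, and pronunciation, instead of a separate ad hoc marker for each.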
> 12. As a matter of fact, I was not sure I'd support this request until I
> saw Michael and others state that taxons are "Latin and nothing else".
> If they had been similar to proper names or medical terms, i.e. part of
> the surrounding language (as in: a specialized subset of English words),
> I'd have preferred semantic tagging, e.g. <taxon>.
> Of course, that would - as far as I know - require an extension of the
> current semantic tagging, which (in HTML5) only concerns itself with
> document structure (<article>, ...) or (in ARIA) widgets for user
> interaction, and that may not be a trivial undertaking, but that does
> not mean we'd have to jump in.
> But now that we're sure it's a variant of Latin, it is our cup of tea
> and we have to drink it.
> 13. I hope that Michael by now doesn't regret that he has invited people
> to come down. As for me, I'll just send one more message and then I'll
> be back on the fence.
> Luc
Andrew Cunningham
