Language for taxonomic names, redux

Arthur Reutenauer arthur.reutenauer at
Tue Feb 28 00:59:05 CET 2017

	Hi Andy,

>>   I think it would be useful if, instead of replying to Michael
>> tit-for-tat,
> He seems to insist that I should reply to his every question, or
> opinion, even when others have already done so (and even sometimes
> twice in the same email), and that he is the (only) person that I have
> to persuade. I've been finding the whole process increasingly
> confusing.

  The process is detailed in BCP 47,
specifically section 3.5 of RFC 5646 (BCP 47 consists of two RFC
documents).  As you can see, that section focuses on the format of the
record to be added to the language subtag registry rather than on the
path to getting the record approved.  In practice that path is
very informal and entirely based on discussion; Michael has the final
say as Language Subtag Reviewer and he thus gets involved in most if not
all discussions arising from a registration request, but he also listens
to other people’s opinions in order to arrive at a decision, even if the
recent back-and-forth may have given a different impression.

  In an effort to make progress and avoid getting bogged down in too
many details I won’t reply to your email point by point; but I have read
all your arguments.  To sum them up: taxonomic names
have a special status, characterised by the fact that they’re a
specialised lexicon with mostly Latin endings, that they are never
translated, and that when typeset they’re subject to very specific
typographic conventions, such as the use of italic, and the abbreviation
of the genus name to its initial after the first occurrence. Having a
language tag or subtag would help with speech synthesis, styling in Web
technologies, and would generally be a standard method of marking up the
language of content representing taxonomic names.

  I think that everyone recognises the special status, and the need for
specialised processes, but it’s still not clear how a language tag would
help, as opposed to other techniques.  Examples such as

>     <p>Jaques shouted <q lang="fr">"J'ai trouvé un nid de
>      <i lang="??-???">Turdus merula</i>, ici il est!"</q>,
>      so we all started to run.</p>

are helpful, but I feel that some of us here have difficulties
understanding how the language tag makes any difference, since you have
to provide a styling tag anyway.  And the case of the genus name being
reduced to its initial after the first use is so extremely specialised
that it seems impossible to ascribe that behaviour to a simple language
tag; any implementation that tries to take this into account would seem
distinctly over-engineered.  For an example of why that doesn’t work in
practice, see this page on my former employer’s website:

where a list of seven languages is displayed as “cz, Deutsch, English,
español, français, italiano, Nederlands”.  (And it’s not the first time
they have done it.)
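
  As a rough illustration of why the genus abbreviation is hard to
ascribe to a language tag alone, here is a minimal Python sketch.  The
function name `abbreviate_genus` and the sample names are assumptions
made for this example, not part of any proposal in this thread.  The
point it shows is that the behaviour depends on state carried across
the whole document (which genera have already appeared), not on the
markup of any single name.

```python
def abbreviate_genus(names):
    """Abbreviate the genus to its initial after its first occurrence.

    `names` is a list of taxonomic names in document order
    (hypothetical input format chosen for this sketch).
    """
    seen = set()   # genera already written out in full
    result = []
    for name in names:
        genus, _, rest = name.partition(" ")
        if genus in seen and rest:
            # Genus was spelled out earlier: reduce it to its initial.
            result.append(f"{genus[0]}. {rest}")
        else:
            # First occurrence (or a bare genus name): keep it in full.
            seen.add(genus)
            result.append(name)
    return result


sample = ["Turdus merula", "Turdus philomelos",
          "Erithacus rubecula", "Turdus merula"]
print(abbreviate_genus(sample))
```

  Even this toy version has to track every genus seen so far, which no
per-element language tag can express by itself.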

  The above doesn’t apply at all to the other examples you give (speech
synthesis etc.), which are in fact much more important, but it’s hard to
see how they are going to work in practice (I won’t repeat the
objections); in a way, your use cases are thus either much too generic
or much too specific (pun intended).

  So much for the summary.  I hope you won’t feel it’s biased.  To reply
to your question about use cases of existing tags such as [en] or
[en-GB], I’ll say for my part – since unfortunately no one else has
given any answer – that the resources they apply to are usually
substantially larger than what you envisage.  It can range from a few
sentences to a 1000-page book, but applying a language tag to strings of
two or three characters, while possible, would be quite rare.  Since
however it is the only possible use for a language tag for taxonomic
names, this may have been seen as an obstacle in the current discussion.

  I’ll add two things I’ve been thinking of.  First, Gregor Hagedorn, in
his email from 2008:

seems to say that tags such as “la” had already been used at that point
to tag taxonomic names.  It would be really helpful to see examples of
such uses.

  Second, spell checkers have been mentioned in connection with the fact
that they would need to ignore taxonomic names.  I wonder if it might
not be more constructive to consider spell checkers *for* taxonomic
names, i.e., instead of ignoring them they would actually check them
against a dictionary.  This seems to me a very compelling reason in
favour of a language tag, and I’m surprised that nobody has raised it so far
(except at some point Kent who mentioned something like that in

  I hope all of this is helpful.  To put the problem in another light,
consider the sentence “Around the castle there are a lot of Bulbasaurs”
that I heard over the weekend, and replace the last word to make “Around
the castle there are a lot of Erithacus”.  What difference in treatment
is expected from the fact that the latter name denotes a bird genus, and
the former a Pokémon?



More information about the Ietf-languages mailing list