Request for variant subtag fr 16th-c 17th-c Resubmitted!

Tue Dec 19 03:59:05 CET 2006

I don't wish to get too involved in this discussion, as I have no expertise
on the particular issues with French, but I find the principles interesting
and important.  It is right to proceed with caution.

Peter Constable wrote:
<<The IDs "fr" and "frm" have different semantics. To create an
implementation that assumes otherwise is to break operability, and in
that way defeating the purpose of having a standard for language
identifiers in the first place.>>

ISO 639 stays the same, including the meanings of "fr" and "frm".

As a linear list of languages, ISO 639 may divide up varieties which belong
together from another point of view, and which would be kept together in a
different linear classification.  But there is no proposal to solve this
problem by changing or demoting ISO 639, producing alternatives, or
otherwise defeating its purpose.  What has been proposed is to give some
support to alternative classifications by taking the list of ISO 639 and
using the subtag mechanism, not just to define sub-varieties, but to unify
sub-varieties across prefixes, in a most intuitive way.

<<Since the two IDs have different semantics, that means that
"fr-variantx" and "frm-variantx" have different semantics. If there is a
need to be able to create a query that will retrieve either, then that
is a problem for query / matching implementations. It is completely
inappropriate to shift this putative need into the LST registry by
suggesting that these two IDs have the same semantics.>>

<<So, at the very least, if these two prefix fields are part of the
registration for "1606Nict" then it is necessary to explain what is the
intended semantic distinction between "fr-1606Nict" and "frm-1606Nict".>>

As I understand it, the semantics of "fr-variantx" and "frm-variantx" would
be that the texts so tagged belong with "fr" and "frm" respectively
according to the ISO 639 definition of those prefixes. People who wish to
retrieve fr but not frm, or vice versa, can continue to do so.  Further, texts
tagged with "fr-variantx" or with "frm-variantx" are identified as belonging
to a particular area around the borderline between fr and frm, and can be
targeted by a matching expression such as *-variantx (provided that subtag
variantx is confined to the relevant prefixes, as proposed).  Both queries
are legitimate and the tagging scheme should enable them to be supported
(through this mechanism or some other).
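
To make the two kinds of query concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the helper names and the
sample corpus are hypothetical, "variantx" is the placeholder used above and
not a registered subtag, and the splitting is deliberately naive; it is not a
full implementation of language tag parsing or matching.

```python
def subtags(tag):
    """Split a language tag into lowercase subtags (no validation)."""
    return tag.lower().split("-")


def has_language(tag, language):
    """True if the primary language subtag equals `language`."""
    return subtags(tag)[0] == language.lower()


def has_variant(tag, variant):
    """True if `variant` appears among the subtags after the primary one."""
    return variant.lower() in subtags(tag)[1:]


# Hypothetical corpus: document name -> language tag.
corpus = {
    "doc_a": "fr",
    "doc_b": "fr-variantx",
    "doc_c": "frm-variantx",
    "doc_d": "frm",
}

# Query 1: everything that ISO 639 classifies as "fr" (and not "frm").
only_fr = sorted(d for d, t in corpus.items() if has_language(t, "fr"))

# Query 2: the borderline area, i.e. the effect of "*-variantx".
borderline = sorted(d for d, t in corpus.items() if has_variant(t, "variantx"))

print(only_fr)     # ['doc_a', 'doc_b']
print(borderline)  # ['doc_b', 'doc_c']
```

The first query never looks at the variant and the second never looks at the
prefix, so neither interferes with the other.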

In the absence of multiple prefixes for a subtag, the latter retrieval has
to be made by an expression such as "fr-variantx or frm-varianty", where
variantx is defined on fr only, and varianty is defined on frm only, and
variantx and varianty are two different names for what is linguistically the
same thing.  This makes retrieval of the borderline area less convenient,
but still perfectly possible, so it is hard to see what is dangerous about
allowing it to be described more conveniently.
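
For comparison, the disjunctive form might be sketched as follows. Again this
is hypothetical: "variantx" and "varianty" are the placeholder names from the
paragraph above, and the helper and sample corpus are invented purely for
illustration.

```python
def matches_any(tag, wanted):
    """True if `tag` equals one of the wanted tags, ignoring case."""
    return tag.lower() in {w.lower() for w in wanted}


# Hypothetical corpus in which the same variety carries two different names,
# one per prefix.
corpus = {
    "doc_a": "fr",
    "doc_b": "fr-variantx",
    "doc_c": "frm-varianty",
    "doc_d": "frm",
}

# The borderline area must be spelled out as "fr-variantx or frm-varianty".
borderline = sorted(
    d for d, t in corpus.items()
    if matches_any(t, ["fr-variantx", "frm-varianty"])
)

print(borderline)  # ['doc_b', 'doc_c']
```

The result is the same set of documents; the only difference is that the
person writing the query has to know both names.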

Isn't it likely that there will be more instances of this sort, and that,
when users have tired of retrieving areas of overlap through disjunctions
involving differently named subtags which mean the same thing, there will
be a demand to make the names variantx and varianty the same?

Ciarán Ó Duibhín