draft-phillips-langtags-08, process, specifications, "stability", and extensions

Thu Jan 6 23:44:53 CET 2005

> From: kristin.hubner at Sun.COM [mailto:kristin.hubner at Sun.COM]

> I notice two main types of arguments going on in this thread, where it
> seems to me that there is frustration and "talking past each other"
> occurring due to fundamentally different concerns and assumptions
> between different constituencies...

I have feet in both the "implementors" and "linguistic purists" camps,
and so think I understand both. But there are many points on which I
don't agree with your assessment.

> From [the implementors'] point of view, the most important aspect of
> language tags is being able to parse and match them -- exact linguistic
> purity and accuracy is a secondary issue.

I would say as an implementor that it's important to find appropriate
ways to match tags that meet legitimate needs in realistic scenarios in
the best way we can, and to be aware of behaviour that will be
experienced when using existing implementations, making sure that any
degradation of behaviour is known and accepted to be offset by benefits,
and that there are no really bad behaviours that may result. I would
consider exact linguistic purity secondary since this system is not
intended to document linguistic realities but to provide useful
behaviours related to differences in language usage in information
systems.

> From [the implementors'] point of view, the addition of new tags,
> regardless of whether the new tags improve language tagging "accuracy",
> may be actively harmful unless accompanied by improved matching rules.

Here, I disagree, unless this statement is to be understood in a
hypothetical way -- a priori, it would be possible to make changes that
are harmful, but I do not assume that addition of new tags is
necessarily harmful.

> To the extent that the adding of tags moves beyond simple registration
> of new tags, and instead into new forms of tags and new rules for
> interpreting tags, that is, that the new tags bring up fundamental
> matching algorithm questions, that becomes the main concern for this
> group.

There are no new forms of tags proposed! 

The draft would impose *restraints* on the forms that tags can take, and
define precisely what forms tags could take. 

This is a point where there may be some "talking past each other". Some
people are speaking from a position in which it is assumed that the part
of a tag that refers to country can be predicted to be in the
second-subtag position. Those supporting the draft are responding that
RFC 3066 does not assume this; it only implies that the only case in
which a country code can be reliably recognized as such is when it is
the second subtag. The former assume that we should continue to keep
country codes in second position because that's the place we've been
able up to now to recognize it. The latter respond that 

- existing implementations will still be able to recognize it when it's
in that place

- RFC 3066 permits it to come in other places, but existing
implementations will never be able to recognize it more than
heuristically

- that the new draft would allow new implementations to *always*
recognize it in any tag, and

- *as implementors* it is thought that requiring that country codes only
ever come in that place is *not* what will provide the best behaviours
for users (specifically in cases where script and country subtags are
both used).

> For [linguistic purists], the most important aspect of language tags is
> having them be accurate and precise...
> Any matching issue (and in particular issues of trying to fall back to
> a more "generic" match when an exact match is not available) are
> secondary.

For the linguist, what matters is the functional behaviour of the
system, including matching, but not the implementation. The linguist,
per se, has no opinion on what the internal structure of tags should
look like; they only specify what the functional requirements of the
overall system should be, and which tradeoffs in functionality are
better or worse.

But maybe I haven't got the same picture of the distinction between the
"implementor" and the "linguistic purist" that you intend. 

> A second type of argument... seems to me to be more linguistic/political
> in nature, which is what is the "correct" (linguistically correct?
> politically correct?) way to name the tags: what sort of naming scheme
> corresponds to linguistic reality,

The question of the relationship between the naming scheme and
ontologies is important inasmuch as knowing the ontology informs us of
what kinds of distinctions need to be made and what kinds of relationships
may exist between those kinds of distinctions, and that guides us in
determining functional requirements, which should be the basis of
implementations. (Once again, a pointer to a white paper on these issues
from a few years ago:
http://www.sil.org/silewp/abstract.asp?ref=2002-003.) 

> or what sort of naming scheme is politically acceptable, and is there
> a conflict there. This does get back to the algorithmic matching issue
> in a sense though, which is that if one wants some sort of hierarchical
> structure to the tags (to allow easier matching),

Insofar as tags are structured as linearly-sequenced elements and
there are matching algorithms in use that are based on
left-prefix truncation, there is no debate over *wanting* a
hierarchical structure: it's a reality we must live with.
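
As a rough illustration of what such truncation-based matching looks
like, here is a minimal Python sketch. The function names are
hypothetical and the algorithm is a generic fallback lookup, not the
code of any particular deployed implementation.

```python
def truncation_chain(tag):
    """Return the tag followed by each progressively shorter prefix,
    obtained by dropping subtags from the right."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def best_match(requested, available):
    """Return the first available tag that equals the requested tag or
    one of its truncated prefixes (compared case-insensitively)."""
    by_lower = {a.lower(): a for a in available}
    for candidate in truncation_chain(requested):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return None


print(truncation_chain("sr-Latn-CS"))
print(best_match("sr-Latn-CS", ["sr-Cyrl", "sr-Latn"]))
```

Because the only operation is dropping subtags from the right, whatever
sits leftmost in the tag is implicitly treated as the most significant
distinction.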

> or indeed [wants to] define any sort of matching rules (as an
> implementor wants), you're probably getting right into some political
> questions about how matching "should work". So for those who wanted to
> stick just to linguistic accuracy and try to avoid political issues,
> trying to avoid discussion of algorithmic matching may have seemed
> appealing (but then provides no help to what I've termed the
> "implementors").

This seems to assume that those promoting an ordering of script and
country subtags as found in the draft are supporting that order for
reasons of linguistic purity and have no interest in discussion of
algorithmic matching, which is completely wrong: the reason for
supporting that order of subtags has everything to do with matching
behaviour in certain widely-deployed algorithms.
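
A small Python sketch may help show why the order matters to such
algorithms. The `fallback` helper, the set of available tags, and the
tag "sr-CS-Latn" (a country-before-script ordering used purely for
contrast) are all hypothetical.

```python
def fallback(tag, available):
    """Drop subtags from the right until an exact match is found."""
    parts = tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in available:
            return candidate
        parts.pop()
    return None


# Hypothetical content: Serbian in Latin script, in Cyrillic script,
# and a generic Serbian resource.
available = {"sr-Latn", "sr-Cyrl", "sr"}

script_first = fallback("sr-Latn-CS", available)   # script before country
country_first = fallback("sr-CS-Latn", available)  # country before script

print(script_first, country_first)
```

With script before country, truncation keeps the script distinction and
lands on "sr-Latn"; with country before script, the script is the first
thing discarded and the request falls through to plain "sr".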

Peter Constable