draft-phillips-langtags-04 /2.4.2 Matching language tags

Thu Jul 1 15:41:40 CEST 2004

> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Tex Texin

> 1) If the match value is zh-TW, and if all of the documents are
> labeled with tags that describe the script used, such as zh-hant-TW
> I will not get a match until the match value is truncated to zh. Then
> it matches both zh-hans-CN and zh-hant-TW, and so perhaps returning a
> less than optimum document.

This general problem is one I pointed out when we were discussing the
de-1996 stuff. It is a more serious concern in the example you've just
raised. The Chinese case may not be typical, though: there have been
strong correlations between countries and script variants for Chinese,
but I'm not sure there are other such cases. On the other hand, your
suggestion (treat missing in-between subtags as "don't care") has some
sense to it.

> Using the boont example, a search for en-boont would match
> en-Latn-US-boont, en-Latn-boont, en-US-boont, and en-boont.

Are all four of those considered valid?
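
To make the "don't care" idea concrete, here is a minimal sketch in
Python. It assumes the language subtag must match in first position and
that every other subtag of the match value only has to appear somewhere
later in the tag, in the same order. The function name
matches_with_gaps is made up for this illustration and does not come
from the draft.

```python
def matches_with_gaps(match_value: str, tag: str) -> bool:
    """Return True if the subtags of match_value occur, in order, in tag.

    The first subtag (the language) must be identical in both. Subtags
    present in tag but absent from match_value are treated as "don't
    care". Comparison is case-insensitive.
    """
    wanted = match_value.lower().split("-")
    have = tag.lower().split("-")
    if wanted[0] != have[0]:
        return False
    pos = 1
    for subtag in wanted[1:]:
        # Skip any in-between subtags until the wanted one is found.
        try:
            pos = have.index(subtag, pos) + 1
        except ValueError:
            return False
    return True


tags = ["en-Latn-US-boont", "en-Latn-boont", "en-US-boont", "en-boont", "en-US"]
# Keeps the four boont tags and drops en-US.
print([t for t in tags if matches_with_gaps("en-boont", t)])
```

Under the same assumption, zh-TW matches zh-hant-TW but not zh-hans-CN.
The reverse case (match value zh-hant-TW, document tagged zh-TW) still
fails, because the script subtag is missing from the document tag.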

> 2) The other side of the script issue is when the script is specified
> in the match value but not specified in the documents.
> A search for zh-hant-TW will not match documents labeled zh-TW.
> As the match value is truncated to zh-hant there will also be no match.
> When it is truncated to zh it will consider zh-TW and zh-CN a match.
> Even though zh-TW might be a better choice, the zh-CN document might be
> returned.

Well, note that the way the language-range works is that you don't
truncate the match value; you only truncate the tags in the repository
metadata. Generic results don't conform to a specific request; you try
to return something as specific as the request, or possibly more
specific. Of course, after you fail to return something that conforms to
the request, you may start looking for least-offending options, in
which case truncation of the match value may certainly apply.
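
A rough sketch of that two-stage behaviour, assuming RFC 3066-style
prefix matching (a tag conforms when it equals the range or continues
it after a "-"). The names conforms and lookup are illustrative only,
and the fallback policy shown is just one possible reading of
"least-offending".

```python
from typing import List, Tuple


def conforms(language_range: str, tag: str) -> bool:
    """True if tag equals the range or extends it with further subtags."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def lookup(language_range: str, tags: List[str]) -> Tuple[str, List[str]]:
    """Return (range actually used, conforming tags).

    The first iteration uses the range intact, so only equally or more
    specific tags can match. Later iterations are the fallback: trailing
    subtags are dropped from the range until something conforms.
    """
    subtags = language_range.split("-")
    for n in range(len(subtags), 0, -1):
        candidate = "-".join(subtags[:n])
        hits = [t for t in tags if conforms(candidate, t)]
        if hits:
            return candidate, hits
    return "", []


# Case 2 above: nothing conforms until the range is cut back to "zh",
# at which point zh-TW and zh-CN are indistinguishable.
print(lookup("zh-hant-TW", ["zh-TW", "zh-CN"]))
```

With the range left intact, a request for zh returns zh-hant-TW as a
more specific result, while a request for zh-hant-TW never returns a
plain zh-TW document unless the fallback stage is reached.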

Peter Constable