-08 comments

Mon Dec 6 14:30:07 CET 2004

Hi,

1) It is not always clear to me which statements are normative and which are
informative. It might be good to indicate the normative text explicitly.
Capitalized keywords alone do not seem like an adequate guide.
BTW, in section 2.3 bullet 3, the text is an example: "SHOULD use 'he' for
Hebrew". Since it is only an example, it doesn't seem right to me that the
"should" is capitalized.

2) In section 2.2.2 possibilities for registration,
The spec refers to "very compelling evidence of need" to register primary
language tags rejected by ISO 639.
Do you have any examples or guidance as to what is considered legitimate need
or compelling evidence?

3) section 2.3 choice of language tag

Bullet one seems difficult to comply with. How does anyone know when more or
less specificity is justified?
First of all, it requires knowing all of the scripts and regions in which a
language might be used. For a few of the most widely known examples, the
answer might be known or easily guessed, but for many people and languages it
is hard to know. So the appropriate degree of specificity can only be guessed.
The example of German and e-mail seems to suggest that the application's
requirement also needs to be taken into account.
I have no idea why it should be ok to label mail as de, instead of the more
specific choices. The application doesn't know if a voice reader or other
device or application will ultimately process the email and benefit from more
specificity.

Frankly, I don't know why we wouldn't suggest tagging as specifically as
possible. It does no harm to label content correctly and accurately.

I have the same problem with the second bullet, since it would only be a guess
that en-US is nearly all Latin. If there were communities transliterating to
another script, only experts would know that this was so.

It is also an approach that is unstable. If another script comes on the scene,
then all the original labels become ambiguous. We have seen many languages
change scripts for political reasons or to improve literacy.
Where is the harm in being as specific as possible?

As an aside, the para in front of the bullets stipulates we MUST specify how
the procedures vary, but the bulleted items are either SHOULDs or subjective,
so it seems pointless to make this a MUST.

4) section 2.4.1 language range
How does one specify a range for content of undetermined language?

For that matter, when is it ok to return content with an undetermined language?
Usually, if a particular language is not requested, some default language is
used.
The fallbacks usually would not return undetermined content, and undetermined
content cannot be requested with a language range.

5) section 2.2 says that the language tag cannot be empty.
We should clarify that the null string is allowed, but the language subtag
cannot be empty if there are other subtags.
The BNF should show the empty string is valid. (I presume this suggestion is
correct since elsewhere we recommend the empty string as an alternative to
UND.)

6) section 2.4.2 matching language tags
I have no idea what the last sentence of bullet one means:
"Any implementation that uses this technique should ensure the appropriate data
is available on each level."

7) In the same section, the introduction to the bullets indicates what follows
is a description of common implementations, which to me suggests it is
informative, and bullets 1-3 seem informative.
But then the remaining bullets seem normative. I suggest clarifying the
normative part with a sentence of introduction preceding those bullets.

Further, without a clear matching algorithm, we lose interoperability. I would
like a more prescriptive recommendation.

8) I am puzzled by the recommendations allowing extension and private use tags
to be ignored. I guess in the context of unknown matching algorithms it is
necessary. However, if the "common fallback" approach is in use, it makes sense
to simply extend the approach to these tags. If there is no match, it falls
back to the same version as ignoring them. On the other hand it allows a match
to occur if it can.

Having software ignore these tags potentially creates situations where there
are two variations of content (with and without the extension) and it is
arbitrary as to which is returned, since the extension is ignored.
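
To illustrate the point, here is a minimal sketch in Python of a truncation
style fallback that does not ignore the trailing subtags. It is illustrative
only and is not text from the draft. The function name and the sample tags are
hypothetical, and it assumes that a private use or extension sequence is
introduced by a single-character subtag such as "x", which must not be left
dangling at the end of a truncated tag.

```python
def fallback_lookup(requested, available):
    """Return the closest available tag by truncating `requested` from the right.

    Hypothetical helper: comparison is case-insensitive, and the original
    spelling of the matching available tag is returned.
    """
    by_lowercase = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lowercase:
            return by_lowercase[candidate]
        subtags.pop()
        # Assumption: a trailing single-character subtag (e.g. "x") only
        # introduces what follows it, so drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


# Content exists with the private use subtags: the exact version is found.
print(fallback_lookup("en-US-x-legal", ["en", "en-US", "en-US-x-legal"]))
# No such content: the result is the same as if the subtags had been ignored.
print(fallback_lookup("en-US-x-medical", ["en", "en-US"]))
```

With this approach the choice between the two variations of content is
deterministic rather than arbitrary.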

9) I am curious about the suggestion to use UND as a subtag wildcard. Why not
allow "*" to be used in subtags for that purpose? It seems like a natural
extension of its use to match any tag.
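
As a rough sketch of what I mean (Python, illustrative only): the function
below treats "*" in any subtag position of a range as matching whatever subtag
the tag has in that position, and otherwise behaves like ordinary prefix
matching. The positional semantics and the function name are my assumptions;
the draft does not define this.

```python
def range_matches(language_range, tag):
    """Prefix-match `tag` against `language_range`, subtag by subtag.

    Hypothetical semantics: "*" in the range matches exactly one subtag in
    the same position of the tag. Comparison is case-insensitive.
    """
    range_parts = language_range.lower().split("-")
    tag_parts = tag.lower().split("-")
    if len(range_parts) > len(tag_parts):
        return False  # the range is more specific than the tag
    return all(r == "*" or r == t for r, t in zip(range_parts, tag_parts))


print(range_matches("en-*-US", "en-Latn-US"))  # wildcard stands in for the script
print(range_matches("*-Latn", "sr-Latn-CS"))   # wildcard stands in for the language
print(range_matches("en-*-US", "en-US"))       # no subtag in the wildcard position
```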

10) In the last bullet/para of section 3.2, with respect to the exception to
the stability provisions:
When a tag is registered, its meaning is documented. I would be concerned that
if all of the subtags become valid, the meaning of the generated tag might be
different from that of the grandfathered tag.
So marking the latter as redundant could be problematic. Is this possible?

11) Section 6 changes from 3066
In the section on compatibility, although the remark with respect to content is
true, it might be worth noting that existing language range queries may now
return tags with the new format to software that is expecting 3066 tags.

Also, software that used a 3066 language range and always returned the same
content may now find that the content is different and possibly varies on each
request (where there is now content for some language and region,
distinguished by the new tags).

I have some editorial suggestions which I can send you later and offline.
tex