draft-05: editorial comments (2)

Fri Aug 27 18:44:50 CEST 2004

Dear Peter,

Thanks for the comments. Interlinear responses below. I've removed large
blocks of quotes as appropriate.

Mark is on vacation, so a new draft-06 won't be published until he returns.
I have made the changes indicated below in my editor's copy. I will post it
later in the day to http://www.inter-locale.com/ (click on "Editor's Copy").

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Peter
> Constable
> Sent: Thursday, August 26, 2004 10:11 PM
> To: ietf-languages at alvestrand.no
> Subject: draft-05: editorial comments (2)
>
>
> Some additional comments (all editorial, I think):
>
> Throughout: The reference to ISO 639 is old; the current standard is ISO
> 639-1:2002.

Updated. Thanks.
>
>
> Section 2.1: In the discussion of casing conventions, it might be
> helpful to mention the convention for ISO 15924 as well; that way, the
> three that are most relevant are all mentioned, rather than just two of
> the three.

That's a good idea. I added it.
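
For illustration only (this is not draft text), the three casing conventions
could be sketched in Python as below. The function name is made up, and the
rule of thumb it uses (2 letters after the primary subtag means region, 4
letters means script, nothing is recased after a singleton) is an assumption
of the sketch, not a quotation from the draft.

```python
def apply_casing_conventions(tag: str) -> str:
    """Recase a tag using the ISO 639, ISO 15924 and ISO 3166 conventions.

    Hypothetical helper. Casing carries no meaning in a language tag;
    this only produces the conventional presentation.
    """
    formatted = []
    opaque = False  # True once a singleton (or a one-character primary) is seen
    for position, subtag in enumerate(tag.split("-")):
        if len(subtag) == 1:
            opaque = True
        if position == 0 or opaque:
            # ISO 639 convention for the primary subtag; anything at or
            # after a singleton is simply lowercased here.
            formatted.append(subtag.lower())
        elif len(subtag) == 4 and subtag.isalpha():
            # ISO 15924 convention: first letter capitalized, e.g. 'Latn'.
            formatted.append(subtag.capitalize())
        elif len(subtag) == 2 and subtag.isalpha():
            # ISO 3166 convention: uppercase, e.g. 'CH'.
            formatted.append(subtag.upper())
        else:
            formatted.append(subtag.lower())
    return "-".join(formatted)


print(apply_casing_conventions("DE-latn-ch"))  # de-Latn-CH
```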
>
>
> Section 2.2: "Note that registered subtags can only appear in specific
> positions in a tag." The term "registered subtag" isn't defined anywhere
> that I have noticed, and while I know what is in mind in this particular
> case, there is some ambiguity between a subtag that's in the registry
> because it was requested via the registration process versus any subtag
> listed in the registry, period. This ambiguity was introduced by the
> decision to list all possible subtags in the registry; when making that
> change, I wonder if you also looked through the doc, esp. sections 2 and
> 3, to check for potential ambiguity.

"Registered subtag" is now a historical concept, of course: all of the
subtags are registered under 3066bis. I'll reword this to make the origins
aspect clear.
>
>
> Section 2.2, "All 2-character subtags in the IANA registry were
> defined...": why "were"? They still *are* defined in the source
> standard, and the semantics *are* defined in the source standard. Same
> comment applies to the following bullet points. Even for the 4th bullet,
> regarding subtags of 4 - 8 chars, these *are* defined in the registry,
> not *were*. It's especially problematic to use the past tense in
> conjunction with the non-past of "subsequently made" later on. (That
> wording in RFC 3066 -- speaking as its contributor -- was specifically
> intended to be open-ended into the future, not just from time of
> publication of ISO 639 to time of publication of RFC 3066.)

"Were" is accurate, since it (perpetually) indicates origin of the subtag.
It doesn't make any real difference though and I'll change it.
>
>
>
> Section 2.2: In the same set of bullets,
>
> "The single character subtag "x" as the primary subtag indicates that
> the whole language tag consists of private-use subtags."
>
> I would have worded it as follows:
>
> "The single character subtag "x" as the primary subtag indicates that
> the whole language tag is to be treated as privately defined."
>
> For instance, a tag of 'x-afnoric-FR' happens to include a subtag (FR)
> that isn't privately defined; the point of the 'x-' is simply that
> parsers shouldn't assume *anything* about any of the content. As it is
> now, the wording can give the impression that 'x-qaa-QM-Qaaa' would be
> valid, but that 'x-afnoric', 'x-qaa-FR' etc would not be valid.

Yes, I see your point. However "privately defined" doesn't convey it
completely either, I think. How about:

 "The single character subtag "x" as the primary subtag indicates that
 the language tag consists solely of subtags whose meaning is defined by
 private agreement. See [#privateuse]."
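
A minimal Python sketch of that reading, for illustration only: a consumer
looks at the primary subtag and, if it is "x", assumes nothing about the
rest. The function name is hypothetical, and treating a lone "x" as not
matching is an assumption of the sketch.

```python
def is_wholly_private_use(tag: str) -> bool:
    """Return True if the primary subtag is the single character 'x'.

    Hypothetical helper. When it returns True, none of the remaining
    subtags should be interpreted against the registry, even ones that
    happen to look like registered values (such as 'FR').
    """
    subtags = tag.split("-")
    # Tags are case-insensitive, so compare in lower case. A bare 'x'
    # with nothing after it is treated as not matching (assumption).
    return len(subtags) > 1 and subtags[0].lower() == "x"


for tag in ("x-afnoric-FR", "x-qaa-QM-Qaaa", "fr-x-afnoric"):
    print(tag, is_wholly_private_use(tag))
```

Note that the third tag is not wholly private: only the part after 'x' is.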
>
>
>
> Section 2.2:
>
> "At present all languages that have both kinds of 3-character code also
> are assigned a 2-character code and hopefully future assignments of this
> nature will not arise."
>
> I think the "hopefully" clause can be worded more strongly:
>
> "At present all languages that have both kinds of 3-character code also
> are assigned a 2-character code; it is not expected that future
> assignments of this nature will arise."

This is text from RFC 3066 that we previously toned down at your request :-).
I'll make this change.
>
>
>
> Section 2.2:
>
> <quote>
> Note: In order to avoid versioning difficulties in applications such as
//... a whole bunch deleted
> or her data invalidated by eventual addition of a 2-character code for
> that language."
> </quote>
>
> The comment about versioning difficulties experienced in RFC 1766 isn't
> accurate: RFC 1766 did not support alpha-3 IDs from ISO 639-2, so the
> issue being referred to could not have come up. The issue only arose
> with the advent of RFC 3066.

This is the exact text from RFC 3066, except that I changed Harald's "as that
of RFC 1766" to "such as those experienced in". It doesn't matter, since it
is now merely a historical curiosity. I'll reword it.

>
> I've suggested before that a better way to deal with this is simply to
> limit the alpha-2 IDs to a fixed set, and since the reference to ISO
> 639-1 has now been made indirect (valid subtags are taken from the
> registry, and the content of the registry is controlled), this is now
> possible, and very simple. For instance, if the ISO 639/RA-JAC suddenly
> changed their mind and added 'ha' for Hawaiian (for example), then the
> maintainer of this registry could simply choose not to add 'ha' to the
> registry.

Indeed, except that there might (although it appears to be exceedingly
unlikely) be a language given an alpha-2 from the get-go. No reason to
forbid that. The ambiguity/stability rules in section 3 handle all of this.

>
> If a reference to JAC policy is going to be quoted, it should be quoted
> from a published source with the reference included. I would quote the
> following statement from this page
> http://www.loc.gov/standards/iso639-2/iso639jac_n3r.html.

Cool. Thanks.

>
> Section 2.2: 'One of the grandfathered IANA registrations is
> "i-enochian"...' Wouldn't the same apply to "i-mingo"? The statement
> regarding enochian on its own struck me as a bit odd as it simply left
> me wondering why this one item was being singled out.

It's an example. I added the example style to it.
>
>
>
> Section 2.2: 'Registration of extended language subtags and non-standard
> use MUST NOT be permitted.' Surely the appropriate wording here is
> '...is not permitted.' Using "MUST NOT" suggests that the choice
> regarding what kinds of registrations are permitted is potentially up to
> users of the spec.

IIRC, we wanted to use normative language here. How about:

  o Extended language subtags will not be registered except by revision of
    this document.
  o Extended language subtags MUST NOT be used to form language tags except
    by revision of this document.
>
>
>
> Section 2.2: "Example: In a future revision or update of this document,
> the tag 'zh-min-nan'..." Insofar as we have realized 'min' is assigned
> to an Indonesian language, this particular combination would be
> semantically anomalous if recast as a non-grandfathered tag. Perhaps
> 'zh-gan' would be a better example.

Done.
>
> BTW, I just noticed that example tags are enclosed sometimes in single
> quotation marks but sometimes in double quotation marks. A consistent
> convention should be used throughout.

Yep. Rather late in the game I adopted double quotes for tags ("zh-Hant-CN")
and single quotes for subtags ('zh'), but I haven't yet gone through and
fixed up the quoting. I'll do that.
>
>
>
> Section 2.2: 'ISO 15924[2]--"Codes for the representation of the names
> of scripts": alpha-4 script codes' -- the punctuation here is weird: the
> colon looks like it should be inside the quotation marks. Suggested
> revision:
>
> '"Codes for the representation of the names of scripts" (alpha-4 script
> codes)'

Alas we don't make these names up... I'll double check the names.
>
>
>
> Section 2.2: "All 4-character subtags were defined according to ISO
> 15924..." Same issue as above: use "are" rather than "were".

Done. See above.
>
>
> Section 2.2: 'All 2-character subtags following the primary subtag were
> defined...' Same were > are issue. And again in subsequent bullet
> points.

See above.
>
>
> Section 2.2: 'ISO 3166[4]--"Codes for the representation of names of
> countries and their subdivisions - Part 1: Country codes"--alpha-2
> country codes or assignments...' -- change to:
>
> 'ISO 3166[4], "Codes for the representation of names of countries and
> their subdivisions - Part 1: Country codes" (alpha-2 country codes), or
> assignments...'

I'll double check it.
>
>
>
> Section 2.2 "...for Statistical Use[5] or assignments made thereto by
> the governing standards body." Here the term used is "standards";
> elsewhere it is "standardization". Please harmonize.

Done.
>
>
> Section 2.2: "'de-Latn-CH' represents German written using Latin script
> for Switzerland" (and subsequent examples): The wording "...for [country
> X]" is both odd and not particularly meaningful. I think better would be
> "...as used in [country X]". In addition, the first example could
> further elaborate by adding, "(for example, using Switzerland-specific
> vocabulary or spelling.)"

Good point. I'll change 'for' to 'as used in'. The example text I'll add,
but not parenthetically. We shouldn't give the impression that the use of a
region subtag implies anything about the language (except its association
with a region).
>
>
>
> Section 2.2: "(Note: another way of saying this is that all subtags
> following the singleton and before another singleton are part of the
> extension. Thus in the tag "fr-a-Latn", the subtag 'Latn' does not
> represent the ISO 15924 script code for Latin script.)"
>
> First, I'd move the closing parenthesis to "...part of the extension.)
> Thus, in the tag..."

Hmm... the parenthetical note could just be promoted to a bullet item:

  o All subtags following the singleton and before another singleton
    are part of the extension. Example: In the tag "fr-a-Latn", the subtag
    'Latn' does not represent the ISO 15924 script code for Latin script.
>
> Secondly, 'Latn' *could* represent the 15924 code for Latin if the
> document specifying the "a" singleton so specified. So, I'd add the
> following wording at the end: "...the subtag "Latn" does not represent
> the ISO 15924 script code for Latin script unless that usage and its
> interpretation in relation to the overall tag is specified by the
> document that specifies the use of the singleton subtag "a".'

Okay, building on the above, how about:

     Example: In the tag "fr-a-Latn", the subtag
    'Latn' does not represent the script subtag 'Latn' defined
     in the IANA Language Subtag Registry. Its meaning is defined
     by the extension 'a'.
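
To make the scoping concrete, here is a rough Python sketch (illustration
only, not draft text) that groups subtags under the singleton that
introduces them. The function name is hypothetical. It assumes that any
one-character subtag after the primary subtag starts a new sequence and that
'x' claims everything after it; it does not validate the tag or handle a
one-character primary subtag.

```python
def split_extensions(tag: str):
    """Split a tag into (leading subtags, {singleton: [subtags]}).

    Hypothetical helper. No validation is done: a repeated singleton
    simply overwrites the earlier sequence.
    """
    leading = []
    sequences = {}
    current = None  # singleton whose sequence is being collected
    for position, subtag in enumerate(tag.split("-")):
        if current == "x":
            # After 'x' everything is private use, including
            # one-character subtags.
            sequences[current].append(subtag)
        elif position > 0 and len(subtag) == 1:
            # A singleton opens a new sequence.
            current = subtag.lower()
            sequences[current] = []
        elif current is None:
            # Still in the language/script/region part of the tag.
            leading.append(subtag)
        else:
            # Belongs to the extension introduced by `current`.
            sequences[current].append(subtag)
    return leading, sequences


print(split_extensions("fr-a-Latn"))  # (['fr'], {'a': ['Latn']})
```

Here 'Latn' ends up under 'a' rather than among the leading subtags, so it
is never looked up as a script subtag.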

>
> BTW, don't you need to limit possible singletons to exclude "y" or "z"?

Why? Singletons sort in alphabetical order, except that 'x' sorts after 'z'.
For clarity we should probably add a rule that says the private use sequence
introduced by 'x' is at the end (it's normatively defined in 2.2 #sources).
>
>
> Section 2.2: 'Use or standardization of the private use subtags is by
> private agreement...' There's an oxymoron here. Delete "or
> standardization".

Done.
>
>
>
> Section 2.4.3: 'None of the subtags in the language tag has a canonical
> mapping...' So far, "canonical mapping" has not been defined. It is
> mentioned in 3.1, but nowhere is it explained what the semantic
> relationship is, or that the canonical mapping can be used to derive a
> semantically-equivalent canonical tag from a tag containing a subtag
> with a canonical mapping. This needs to be explained in 2.4.3.

I added a forward reference in 2.4.3 and explained it thus:

   2. None of the subtags in the language tag has a canonical_value mapping
      in the IANA registry (see [#ianaformat]). Subtags with a canonical_value
      mapping must be replaced with their mapping in order to canonicalize
      the tag.

I'll also add a canonicalization example.
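
In the meantime, a rough Python sketch of the replacement step, for
illustration only. The table is a made-up excerpt, not registry data: the
two entries are well-known withdrawn ISO 639 codes, and whether the registry
records them this way is an assumption. The helper names and the crude
type guess (first subtag is the language, 4 letters a script, 2 letters a
region) are also assumptions of the sketch.

```python
# Hypothetical excerpt keyed by (subtag type, subtag) so that a language
# code is never confused with a region code that has the same letters.
CANONICAL_VALUE = {
    ("language", "iw"): "he",
    ("language", "ji"): "yi",
}


def subtag_type(position: int, subtag: str) -> str:
    """Crude positional guess at a subtag's type (assumption of the sketch)."""
    if position == 0:
        return "language"
    if len(subtag) == 4 and subtag.isalpha():
        return "script"
    if len(subtag) == 2 and subtag.isalpha():
        return "region"
    return "other"


def canonicalize(tag: str, mapping=CANONICAL_VALUE) -> str:
    """Replace each subtag that has a canonical_value with that value."""
    result = []
    in_extension = False
    for position, subtag in enumerate(tag.split("-")):
        if position > 0 and len(subtag) == 1:
            # Subtags at or after a singleton are left untouched.
            in_extension = True
        if not in_extension:
            key = (subtag_type(position, subtag), subtag.lower())
            subtag = mapping.get(key, subtag)
        result.append(subtag)
    return "-".join(result)


print(canonicalize("iw-IL"))  # he-IL
```

A tag is then canonical, in the sense of item 2, when this step leaves it
unchanged.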
>
>
>
> Section 2.5: I'm satisfied with this discussion of private use
> (sub)tags.

Cool.
>
>
>
> That's my comments for section 2.
>
>
> Peter Constable
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages