Document: draft-newman-i18n-comparator-06.txt
Reviewer: Spencer Dawkins [spencer@mcsr-labs.org]
Review Date: Monday 2/27/2006 10:46 PM CST
IETF LC Date: 03 March 2006

Summary: This document is almost ready for publication as a Proposed Standard. I have a small number of nittish comments (more than editorial), but if the authors agree, I believe any of these changes could be RFC Editor notes. The ones I'd really like to see Brian look closely at are in 3.2, 4.2.1, and 4.2.2.

Review Comments:
----------------

3.2. Wildcards

Spencer: two minor concerns with the following text:

(1) I'm not sure how the first two sentences work together. Does the first sentence say "there can only be one wildcard character in the string a client uses to select a collation", or does "a wildcard" mean something besides "one wildcard"? The second sentence is my greater confusion, because I'm reading the first sentence as saying that "aa*aa*" would NOT be OK, because it has more than one wildcard character, and reading the second sentence as saying that "aa**aa" would NOT be OK, because it has adjacent wildcard characters, but it's NOT OK anyway, because it has more than one wildcard character (whether adjacent or not). Please clue me in.

(2) I would love to see a sentence explaining why the third sentence is "SHOULD NOT use wildcards" and not "MUST NOT use wildcards". To be honest, I'm trying to understand why this restriction exists at all (at either SHOULD NOT or MUST NOT strength), but the absence of SHOULD NOT qualification doesn't help me with this, and I expect that it would help. And why is "the server SHOULD select the collation" a SHOULD, and not a MUST? Mumble.

   The string a client uses to select a collation MAY contain a wildcard ("*") character which matches zero or more collation-chars. Wildcard characters MUST NOT be adjacent.
   Clients which support disconnected operation SHOULD NOT use wildcards to select a collation, but clients which provide collation operations only when connected to the server MAY use wildcards. If the wildcard string matches multiple collations, the server SHOULD select the collation with the broadest scope (preferably international scope), the most recent table versions and the greatest number of supported operations.

3.3. Ordering Direction

Spencer: this is at the edge of a nit, but "collation-order" and "collation-sel" haven't been introduced previously, and I'm having to guess that "sel" is short for "selection", or something. Mumble.

   When used as a protocol element for ordering, the collation name MAY be prefixed by either "+" or "-" to explicitly specify an ordering direction. As mentioned previously, "+" has no effect on the ordering function, while "-" negates the result of the ordering function. In general, collation-order is used when a client requests a collation, and collation-sel is used when the server informs the client of the selected collation.

4.2.1. Equality

Spencer: I'm confused here (note the trend :-). Is the following text saying, "MAY return either "error" or "no-match" if the input strings are not valid character strings ..."? The current text doesn't seem to say what happens when the input strings aren't valid and the equality function doesn't return "error", which is only a MAY strength ("so don't be surprised when your server does this").

   The equality function always returns "match" or "no-match" when supplied valid input, and MAY return "error" if the input strings are not valid character strings or violate other collation constraints.

4.2.2. Substring

Spencer: the following text requiring the ending offset seems inconsistent with 5.2, which (as I understand it) allows either the ending offset OR the length to be returned.
If they ARE inconsistent, I'd much rather see 4.2.2 prevail, because I don't feel good about telling application developers that sometimes they may get (10, 15) that means "six characters/octets long" and other times they may get (10, 15) which means "15 characters/octets long".

   Application protocols MAY return position information for substring matches. If this is done, the position information SHOULD include both the starting offset and the ending offset in the string.

4.3. Internal Canonicalization Algorithm

Spencer: I don't believe that "The output of the canonicalization algorithm MAY have no meaning to a human" is an upper-case MAY - not a requirement.

   A collation specification MUST describe the internal canonicalization algorithm. This algorithm can be applied to individual strings and the result strings can be stored to potentially optimize future comparison operations. A collation MAY specify that the canonicalization algorithm is the identity function. The output of the canonicalization algorithm MAY have no meaning to a human.

7.1. Collation Registration Procedure

Spencer: I'm not trying to change existing practice, but the IESG is having enough fun reviewing appeals these days that if the appeal track started with the APPS area directors, I'm sure that the other ADs would be thrilled. :-(

   The IETF will create a mailing list, collation@ietf.org, which can be used for public discussion of collation proposals prior to registration. Use of the mailing list is encouraged but not required. The actual registration procedure will not begin until the completed registration template is sent to iana@iana.org. The IESG will appoint a designated expert who will monitor the collation@ietf.org mailing list and review registrations forwarded from IANA. The designated expert is expected to tell IANA and the submitter of the registration within two weeks whether the registration is approved, approved with minor changes, or rejected with cause.
   When a registration is rejected with cause, it can be re-submitted if the concerns listed in the cause are addressed. Decisions made by the designated expert can be appealed to the IESG and subsequently follow the normal appeals procedure for IESG decisions.

9.2.1. ASCII Casemap Collation Description

Spencer: the following text really clarified the text describing ACAP and Sieve previously - use this sentence in that section as well?

   For historical reasons, in the context of ACAP and Sieve, the name "i;ascii-casemap" is a synonym for this collation.

9.5.1. Octet Collation Description

Spencer: Ouch! is there a less ambiguous naming set than "first string" and "second string"? I'm almost sure I've also used programming languages that thought the first string was the search target, so it took me a second to grok that the second string was the search target. If I'm the only one who is confused, that's not a problem.

   The substring function returns "match" if the first string is the empty string, or if there exists a substring of the second string of length equal to the length of the first string which would result in a "match" result from the equality function. Otherwise the substring function returns "no-match".
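
To make the argument order in the 9.5.1 text concrete, here is a minimal Python sketch of the substring function as the quoted sentence words it. It assumes that equality for the octet collation is plain byte-for-byte comparison; the names octet_equality and octet_substring and the parameter names are illustrative only and do not come from the draft.

```python
def octet_equality(a: bytes, b: bytes) -> str:
    """Assumed equality for the octet collation: byte-for-byte identity."""
    return "match" if a == b else "no-match"


def octet_substring(first: bytes, second: bytes) -> str:
    """Substring test as worded in the quoted 9.5.1 text.

    `first` is the pattern being looked for; `second` is the string
    being searched (the search target). Names are hypothetical.
    """
    # An empty first string always matches.
    if len(first) == 0:
        return "match"
    n = len(first)
    # Try every window of `second` whose length equals len(first).
    # If `second` is shorter than `first`, the range is empty.
    for start in range(len(second) - n + 1):
        if octet_equality(first, second[start:start + n]) == "match":
            return "match"
    return "no-match"
```

Under these assumptions, octet_substring(b"bc", b"abcd") gives "match" while octet_substring(b"abcd", b"bc") gives "no-match", so swapping the two arguments silently changes the answer. That is the confusion the "first string" and "second string" naming invites.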