Document:   draft-newman-i18n-comparator-06.txt
Reviewer: Spencer Dawkins [spencer@mcsr-labs.org]
Review Date:  Monday 2/27/2006 10:46 PM CST
IETF LC Date:  03 March 2006

Summary: This document is almost ready for publication as a Proposed
Standard. I have a small number of nittish comments (more than editorial),
but if the authors agree, I believe any of these changes could be RFC Editor
notes. The ones I'd really like to see Brian look closely at are in 3.2,
4.2.1, and 4.2.2.


Review Comments:
----------------
3.2.  Wildcards

Spencer: two minor concerns with the following text:

(1) I'm not sure how the first two sentences work together. Does the first
sentence say "there can be only one wildcard character in the string a
client uses to select a collation", or does "a wildcard" mean something
other than "one wildcard"? The second sentence confuses me more. I read the
first sentence as saying that "aa*aa*" would NOT be OK, because it has more
than one wildcard character. I read the second sentence as saying that
"aa**aa" would NOT be OK, because it has adjacent wildcard characters, but
that string is already NOT OK under the first sentence, because it has more
than one wildcard character (whether adjacent or not). Please clue me in.

(2) I would love to see a sentence explaining why the third sentence is
"SHOULD NOT use wildcards" and not "MUST NOT use wildcards". To be honest,
I'm trying to understand why this restriction exists at all (at either
SHOULD NOT or MUST NOT strength). The text gives no rationale for the
SHOULD NOT, and I expect that a rationale would help.
And why is "the server SHOULD select the collation" a SHOULD, and not a
MUST? Mumble.

   The string a client uses to select a collation MAY contain a wildcard
   ("*") character which matches zero or more collation-chars.  Wildcard
   characters MUST NOT be adjacent.  Clients which support disconnected
   operation SHOULD NOT use wildcards to select a collation, but clients
   which provide collation operations only when connected to the server
   MAY use wildcards.  If the wildcard string matches multiple
   collations, the server SHOULD select the collation with the broadest
   scope (preferably international scope), the most recent table
   versions and the greatest number of supported operations.
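
To make question (1) concrete, here is a minimal Python sketch of the more
permissive reading, in which several wildcards are allowed as long as no two
are adjacent. The function names are hypothetical and do not come from the
draft, and for simplicity "*" is allowed to match any characters rather than
only collation-chars.

```python
import re


def has_adjacent_wildcards(pattern: str) -> bool:
    """Return True if two "*" characters appear next to each other."""
    return "**" in pattern


def wildcard_matches(pattern: str, collation_name: str) -> bool:
    """Match a collation name against a pattern where "*" matches zero or
    more characters. Assumption: multiple non-adjacent wildcards are legal."""
    if has_adjacent_wildcards(pattern):
        raise ValueError("wildcard characters MUST NOT be adjacent")
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, collation_name) is not None


if __name__ == "__main__":
    print(wildcard_matches("i;ascii-*", "i;ascii-casemap"))
    print(wildcard_matches("aa*aa*", "aaXaaY"))
```

Under the stricter reading (at most one wildcard per string), the check
would instead reject any pattern where pattern.count("*") > 1, and the
adjacency rule would add nothing.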

3.3.  Ordering Direction

Spencer: this is at the edge of a nit, but "collation-order" and
"collation-sel" haven't been introduced previously, and I'm having to guess
that "sel" is short for "selection", or something. Mumble.

   When used as a protocol element for ordering, the collation name MAY
   be prefixed by either "+" or "-" to explicitly specify an ordering
   direction.  As mentioned previously, "+" has no effect on the
   ordering function, while "-" negates the result of the ordering
   function.  In general, collation-order is used when a client requests
   a collation, and collation-sel is used when the server informs the
   client of the selected collation.
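
As an illustration of the "+" and "-" prefixes described above, a minimal
Python sketch follows. It assumes the ordering function yields -1, 0, or 1;
the helper names are hypothetical and not taken from the draft.

```python
def split_direction(collation_order: str) -> tuple[int, str]:
    """Split an optional "+" or "-" prefix from a collation name.

    Returns (sign, name), where sign is -1 for "-" and 1 otherwise."""
    if collation_order.startswith("-"):
        return -1, collation_order[1:]
    if collation_order.startswith("+"):
        return 1, collation_order[1:]
    return 1, collation_order


def apply_direction(collation_order: str, ordering_result: int) -> int:
    """Negate the ordering result when the name is prefixed with "-"."""
    sign, _name = split_direction(collation_order)
    return sign * ordering_result


if __name__ == "__main__":
    print(split_direction("-i;ascii-casemap"))
    print(apply_direction("-i;ascii-casemap", -1))
```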

4.2.1.  Equality

Spencer: I'm confused here (note the trend :-). Is the following text
saying, "MAY return either "error" or "no-match" if the input strings are
not valid character strings ..."? The current text doesn't say what happens
when the input strings aren't valid and the equality function doesn't return
"error". Returning "error" is only MAY strength ("so don't be surprised when
your server does this").

   The equality function always returns "match" or "no-match" when
   supplied valid input, and MAY return "error" if the input strings are
   not valid character strings or violate other collation constraints.
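
The two behaviours in question can be sketched in Python as follows. This is
only an illustration: it assumes "valid character string" means well-formed
UTF-8, which the quoted text does not say, and the report_errors flag is a
hypothetical knob standing in for the MAY.

```python
def equality(a: bytes, b: bytes, report_errors: bool = True) -> str:
    """Return "match", "no-match", or "error".

    Assumption: input is valid when it decodes as UTF-8. With
    report_errors=False, invalid input falls back to "no-match", which is
    one possible reading of the MAY."""
    try:
        left = a.decode("utf-8")
        right = b.decode("utf-8")
    except UnicodeDecodeError:
        return "error" if report_errors else "no-match"
    return "match" if left == right else "no-match"


if __name__ == "__main__":
    print(equality(b"abc", b"abc"))
    print(equality(b"\xff", b"abc"))
    print(equality(b"\xff", b"abc", report_errors=False))
```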

4.2.2.  Substring

Spencer: the following text requiring the ending offset seems inconsistent
with 5.2, which (as I understand it) allows either the ending offset OR the
length to be returned. If they ARE inconsistent, I'd much rather see 4.2.2
prevail, because I don't feel good about telling application developers that
sometimes they may get (10, 15) meaning "six characters/octets long" and
other times they may get (10, 15) meaning "15 characters/octets long".

   Application protocols MAY return position information for substring
   matches.  If this is done, the position information SHOULD include
   both the starting offset and the ending offset in the string.
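
The ambiguity is easy to reproduce. In the Python sketch below, both helpers
are hypothetical; the first reports (start, inclusive ending offset) and the
second reports (start, length). The inclusive ending offset is an assumption
chosen to match the "six characters/octets long" reading above.

```python
def find_with_end_offset(needle: str, haystack: str):
    """Return (start, end) with an inclusive ending offset, or None."""
    start = haystack.find(needle)
    if start == -1:
        return None
    return (start, start + len(needle) - 1)


def find_with_length(needle: str, haystack: str):
    """Return (start, length), or None if there is no match."""
    start = haystack.find(needle)
    if start == -1:
        return None
    return (start, len(needle))


if __name__ == "__main__":
    # A six-character match and a fifteen-character match, both at offset 10.
    print(find_with_end_offset("abcdef", "0123456789abcdefghij"))
    print(find_with_length("y" * 15, "x" * 10 + "y" * 15))
```

Both calls report the same pair of numbers even though the matches differ in
length, so a client cannot interpret the result without knowing which
convention the server uses.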

4.3.  Internal Canonicalization Algorithm

Spencer: I don't believe that the MAY in "The output of the canonicalization
algorithm MAY have no meaning to a human" should be upper-case. It is an
observation, not a requirement.

   A collation specification MUST describe the internal canonicalization
   algorithm.  This algorithm can be applied to individual strings and
   the result strings can be stored to potentially optimize future
   comparison operations.  A collation MAY specify that the
   canonicalization algorithm is the identity function.  The output of
   the canonicalization algorithm MAY have no meaning to a human.

7.1.  Collation Registration Procedure

Spencer: I'm not trying to change existing practice, but the IESG is having
enough fun reviewing appeals these days that if the appeal track started
with the APPS area directors, I'm sure that the other ADs would be thrilled.
:-(

   The IETF will create a mailing list, collation@ietf.org, which can be
   used for public discussion of collation proposals prior to
   registration.  Use of the mailing list is encouraged but not
   required.  The actual registration procedure will not begin until the
   completed registration template is sent to iana@iana.org.  The IESG
   will appoint a designated expert who will monitor the
   collation@ietf.org mailing list and review registrations forwarded
   from IANA.  The designated expert is expected to tell IANA and the
   submitter of the registration within two weeks whether the
   registration is approved, approved with minor changes, or rejected
   with cause.  When a registration is rejected with cause, it can be
   re-submitted if the concerns listed in the cause are addressed.
   Decisions made by the designated expert can be appealed to the IESG
   and subsequently follow the normal appeals procedure for IESG
   decisions.

9.2.1.  ASCII Casemap Collation Description

Spencer: the following text really clarified the earlier text describing ACAP
and Sieve. Could this sentence be used in that section as well?

   For historical reasons, in the context of ACAP and Sieve, the name
   "i;ascii-casemap" is a synonym for this collation.

9.5.1.  Octet Collation Description

Spencer: Ouch! is there a less ambiguous naming set than "first string" and
"second string"? I'm almost sure I've also used programming languages that
thought the first string was the search target, so it took me a second to
grok that the second string was the search target. If I'm the only one who
is confused, that's not a problem.

   The substring function returns "match" if the first string is the
   empty string, or if there exists a substring of the second string of
   length equal to the length of the first string which would result in
   a "match" result from the equality function.  Otherwise the substring
   function returns "no-match".
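
For readers who trip over the same naming, here is a minimal Python sketch
of the quoted rule, with the first string as the thing searched for and the
second string as the search target. The function names are hypothetical, and
octet-by-octet comparison is assumed for the equality function.

```python
def octet_equality(a: bytes, b: bytes) -> str:
    """Assumed equality function: exact octet-by-octet comparison."""
    return "match" if a == b else "no-match"


def octet_substring(first: bytes, second: bytes) -> str:
    """Return "match" if first is empty or occurs somewhere in second."""
    if len(first) == 0:
        return "match"
    width = len(first)
    # Try every substring of second whose length equals that of first.
    for start in range(len(second) - width + 1):
        if octet_equality(first, second[start:start + width]) == "match":
            return "match"
    return "no-match"


if __name__ == "__main__":
    print(octet_substring(b"bc", b"abc"))
    print(octet_substring(b"abc", b"bc"))
```

Swapping the two arguments changes the answer, which is why clearer names
for the two parameters would help.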