Document: draft-newman-i18n-comparator-13.txt
Reviewer: Spencer Dawkins [spencer@mcsr-labs.org]
Review Date:  Tuesday 8/15/2006 7:32 AM CST
IESG Telechat Date:  Thursday, 17 August 2006

This is a re-review; my previous review was of version 06, with Scott as
shepherding AD, before IETF 65. I'm reading only the deltas from 06 (in the
spirit of not finding new problems with previously reviewed text).

Summary: Again, nearly ready for publication as Proposed Standard, with some 
(new) items that do need to be addressed before publication.

Review Comments:

2.2.  Purpose

   Collations abstraction layer for comparison functions so that these
   comparison functions can be used in multiple protocols.

I am just barely able to parse this sentence so that it's not a sentence
fragment. I think the problem is that "functions" is being used as a verb
and as a noun in the same sentence. I saw later in the document that you had
changed "function"-the-noun to "operation", so this should be easy to fix.
But this isn't just an editorial comment, because I'm not sure what the
sentence is saying.

4.2.2.  Equality
    ...
    In this specification, the return values of the equality test are
    called "match", "no-match" and "undefined".  This is not a
    specification, merely a choice of phrasing.

What does the last sentence mean? (Brian Carpenter asked me, so he doesn't 
know, either).

5.2.  Operations

...

   Although the collation's substring function provides a list of
   matches, a protocol need not provide all that to the client.  It may
   provide only the first matching substring, or even just the
   information that the substring search matched.

Hmmm. I am trying to remember that you're not defining a protocol, only 
describing what protocols do and don't do, but I'm trying to read this from 
the application's perspective, and having a hard time understanding how (for 
example) an application that is trying to display what is matching responds 
when the protocol only provides an indication that something matched. You 
may say this is what the protocol developers are supposed to worry about 
("if you think applications will want to display what matches, you'd better 
define the protocol so that this information is returned"), and that's OK. 
I'm just struggling a bit here.
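
To make the concern concrete, here is a minimal Python sketch of the three
levels of detail a protocol might pass on. The names substring_matches,
first_match and any_match are hypothetical, and a plain octet comparison
stands in for a real collation; none of this is taken from the draft.

```python
def substring_matches(haystack: bytes, needle: bytes) -> list:
    """Hypothetical collation substring operation: all (start, end) matches."""
    width = len(needle)
    return [(i, i + width)
            for i in range(len(haystack) - width + 1)
            if haystack[i:i + width] == needle]


def first_match(haystack: bytes, needle: bytes):
    """A protocol that exposes only the first matching substring, or None."""
    matches = substring_matches(haystack, needle)
    return matches[0] if matches else None


def any_match(haystack: bytes, needle: bytes) -> bool:
    """A protocol that exposes only whether the search matched."""
    return bool(substring_matches(haystack, needle))
```

An application that wants to display every match can do so only with the
first form; with any_match it learns nothing about where the match occurred.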

6.  Use by Existing Protocols

...

   IMAP [16] also collates, although that is explicit only when the
   COMPARATOR [18] extension is used.  The built-in IMAP substring
   operation and the ordering provided by the SORT [17] extension may
   not meet the requirements made in this document.

   Other protocols may be in a similar position.

   In IMAP, the default collation is i;ascii-casemap, because its
   operations most closely resembles IMAP's built-in operations.

EDITORIAL: I'm guessing that the previous paragraph should be moved up one? 
At the very least, I'm confused because I'm not sure if the top paragraph in 
this extract describes the differences between i;ascii-casemap and IMAP's 
built-in operations or is talking about something else.

9.1.1.  ASCII Numeric Collation Description

   The "i;ascii-numeric" collation is a simple collation intended for
   use with arbitrary sized unsigned decimal integer numbers stored as
   octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
   the numbers.  Before converting from string to integer, the input
   string is truncated at the first non-digit character.  All input is
   valid; strings which do not start with a digit represent positive
   infinity.

Is it obvious to everyone except me that leading zeros are ignored? The
examples given a little further down say so - is making this point in
examples normative enough?
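
A minimal Python sketch of the quoted description may help show where the
leading-zero question comes from. It is an illustration under assumptions,
not the draft's algorithm: the function names and the "less"/"equal"/
"greater" result strings are chosen here, and treating two "positive
infinity" inputs as equal is an assumption the quoted text does not settle.
Comparing by value, as the text implies, is what makes "0042" and "42" equal.

```python
def ascii_numeric_key(s: bytes):
    """Map an octet string to a comparable key, or None for positive infinity."""
    n = 0
    # Truncate at the first non-digit octet (digits are 0x30 to 0x39).
    while n < len(s) and 0x30 <= s[n] <= 0x39:
        n += 1
    if n == 0:
        return None  # does not start with a digit: positive infinity
    # Dropping leading zeros leaves the value unchanged; after that, a longer
    # digit string is always the larger number, so no size limit applies.
    digits = s[:n].lstrip(b"0")
    return (len(digits), digits)


def ascii_numeric_order(a: bytes, b: bytes) -> str:
    """Order two octet strings by the unsigned integers they represent."""
    ka, kb = ascii_numeric_key(a), ascii_numeric_key(b)
    if ka == kb:
        return "equal"  # assumption: infinity compares equal to infinity
    if ka is None:
        return "greater"
    if kb is None:
        return "less"
    return "less" if ka < kb else "greater"
```

Under this reading, leading zeros are ignored only as a consequence of
integer conversion, which is why stating it solely in the examples may not
be enough.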

9.2.1.  ASCII Casemap Collation Description

...

   The i;ascii-casemap collation is well suited to to use with many
   internet protocols and computer languages.  Use with natural language
   is often inappropriate: even though the collation apparently supports
   languages such as Italian and English, in real-world use it tends to
   stumble over words such as "naive", names such as "Llwyd", people and
   place names containing non-ASCII, euro and pound sterling symbols,
   quotation marks, dashes/hyphens, etc.

OK, this may be inadvertently funny - are "naive" and "Llwyd" supposed to
include a non-ASCII character, or is that sentence saying something else?
(Welcome to the world of the RFC Editor)
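
If the intended example was the spelling with a diaeresis (an assumption;
the quoted text is pure ASCII), the problem can be sketched in a few lines
of Python. The function name is hypothetical, and the sketch assumes the
collation maps only the ASCII letters a-z to A-Z and that the input is
UTF-8; neither detail appears in the text quoted above.

```python
def ascii_casemap(s: bytes) -> bytes:
    """Uppercase only the octets 0x61-0x7A (ASCII a-z); leave all others alone."""
    return bytes(o - 0x20 if 0x61 <= o <= 0x7A else o for o in s)


# The pure-ASCII spellings fold to the same octets.
plain_equal = ascii_casemap(b"naive") == ascii_casemap(b"NAIVE")

# The accented spellings do not: the UTF-8 octets for the lowercase and
# uppercase accented letter fall outside a-z, so they stay different.
accented_equal = (ascii_casemap("na\u00efve".encode("utf-8"))
                  == ascii_casemap("NA\u00cfVE".encode("utf-8")))
```

Here plain_equal is True and accented_equal is False, which would be the
"stumble" the draft seems to describe, but only if the non-ASCII spelling
was what the authors meant.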

13.  Open Issues

    ... adding a
    note to the RFC editor to possibly replace the 3066 reference

From Brian: Surely this needs to be done?

From Spencer: I'm thinking that the "checking the SP SP "1" SP SP string for
correctness" also needs to be done pretty soon :-0