Document: draft-wilde-text-fragment-06
Reviewer: Spencer Dawkins
Review Date: 2007-02-19
IETF LC End Date: 2007-03-14

Summary:

This document is almost ready for publication as a Proposed Standard RFC. 
Most of my questions below involve MAY/SHOULD/MUST requirements.

Comments:

I also included some (Nit)s, which are not part of the Gen-ART review but 
may be helpful for editors later in the process.

Thanks,

Spencer

1.1.  What is text/plain?

   The biggest advantage of text/plain MIME entities is their ease of
   use and their portability among different platforms.  As long as they
   use popular character encodings (such as US-ASCII or UTF-8), they can
   be displayed and processed on virtually every computer system.  The
   only remaining interoperability issue is the representation of line
   endindings, which is discussed in Section 4.1.

Spencer (Nit): s/endind/end/

2.  Fragment Identification Methods

   The identification of fragments of text/plain MIME entities can be
   based on different foundations.  Since it is not possible to insert
   explicit, invisible identifiers into a text/plain MIME entity (as for
   example used in HTML documents, implemented through dedicated
   attributes), fragment identification has to rely on certain inherent
   properties of the MIME entity.  This memo specifies fragment
   identification using six different methods, which are character
   positions and ranges, line positions and ranges, regular expression
   matching, and a mechanism for improving the robustness of fragment

Spencer (Nit): I count five methods, plus the mechanism, which doesn't seem 
to actually identify a fragment.

   identifiers (entity hashes).

2.2.1.  Character Position

   To identify a character position (i.e., a fragment of length zero
   between two characters), the 'char' scheme followed by a single
   number is used.  Rather than identifying a fragment consisting of a

Spencer (Clarity): at least a couple of times, a description starts out 
"Rather than X, Y", and I found this confusing. I'd prefer to see "Y, rather 
than X", if this makes sense to the authors.

   number of characters, this method identifies a position between two
   characters (or before the first or after the last character).
   Character position counting starts with 0, so the character position
   before the first character of a text/plain MIME entity has the
   character position 0, and a MIME entity containing n distinct
   characters has n+1 distinct character positions, the last one having
   the character position n.
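
Illustration (not part of the draft text): a minimal Python sketch of the
position counting quoted above, assuming positions index the gaps between
characters. The helper names char_positions and split_at are hypothetical.

```python
def char_positions(text: str) -> range:
    """Return the valid character positions for text: 0 .. len(text)."""
    # Positions sit between characters (and before the first / after the
    # last), so n characters yield n + 1 positions.
    return range(len(text) + 1)


def split_at(text: str, position: int) -> tuple[str, str]:
    """Split text at a zero-length character position."""
    if position not in char_positions(text):
        raise ValueError(f"position {position} is outside 0..{len(text)}")
    return text[:position], text[position:]
```

Under this reading, position 0 leaves nothing to the left of the split and
position n leaves nothing to the right.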

2.5.  Fragment Identifier Robustness

   Hash sums may specify the character encoding that has been used when
   creating the hash sums, and if such a specification is present,
   clients MUST check whether the character encoding specified for the
   hash sum and the character encoding of the retrieved MIME entity are
   equal, and clients MUST NOT check the hash sum if these values
   differ.  However, clients MAY choose to transcode the retrieved MIME
   entity in the case of differing character encodings, and after doing
   so, check the hash sum.  Please note that this method is inhererently
   unreliable, because certain characters or character sequences may
   have been lost or normalized due to restrictions in one of the
   character encodings used.

Spencer: I have a concern about using MAY to allow clients to check 
reliability in an inherently unreliable way. I would prefer at least SHOULD 
NOT.
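
Illustration (not part of the draft text): a rough Python sketch of the
encoding check in the quoted paragraph, using the 'length' hash sum. It
assumes the length is counted in characters and that charset names compare
case-insensitively; the function name check_length is hypothetical, and the
optional transcoding step is deliberately left out.

```python
from typing import Optional


def check_length(text: str, entity_charset: str, expected_length: int,
                 hash_charset: Optional[str] = None) -> Optional[bool]:
    """Return True/False for a performed check, or None if it is skipped."""
    if (hash_charset is not None
            and hash_charset.lower() != entity_charset.lower()):
        # Encodings differ: per the quoted text the hash sum is not checked.
        # This sketch does not attempt the optional (unreliable) transcoding.
        return None
    # Assumption: 'length' counts characters of the decoded entity.
    return len(text) == expected_length
```

Returning None for the skipped case keeps "not checked" distinct from
"checked and failed", which is the distinction the MAY/SHOULD NOT question
turns on.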

3.  Fragment Identification Syntax

   The syntax for the fragment identifiers is straightforward.  The
   syntax defines four schemes, 'char', 'line', 'match', and hash (which
   can either be 'length' or 'md5').  The 'char' and 'line' schemes can
   be used in two different variants, either the position variant (with
   a single number), or the range variant (with two comma-separated
   numbers).  The 'match' scheme has a regular expression as its
   parameter, which must be specified as a string with escaped
   semicolons (because the semicolon is used to concatenate multiple
   fragment identification scheme parts).  The hash scheme can either
   use the 'length' or the 'md5' scheme to specify a hash value.

Spencer: The use of the word "hash" to describe the length of a resource in 
characters violates the Principle of Least Astonishment. Could "length" and 
"md5" be listed as separate schemes rather than grouped together under 
"hash", just for ease of understanding?

   The following syntax definition uses ABNF as defined in RFC 4234 [7],
   including the rules DIGIT and HEXDIG.
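
Illustration (not part of the draft text): a small Python sketch of the
structure described in the prose above, based only on that description and
not on the draft's ABNF. It assumes scheme parts have the form name=value,
are joined by semicolons, and that any semicolon inside a 'match' expression
has already been escaped so that no literal ';' remains. The function name
parse_fragment is hypothetical.

```python
KNOWN_SCHEMES = {"char", "line", "match", "length", "md5"}


def parse_fragment(fragment: str) -> list[tuple[str, list[str]]]:
    """Split a fragment identifier into (scheme, arguments) pairs."""
    parts = []
    for part in fragment.split(";"):
        scheme, sep, value = part.partition("=")
        if not sep or scheme not in KNOWN_SCHEMES:
            raise ValueError(f"unrecognized scheme part: {part!r}")
        if scheme == "match":
            # A regular expression may contain commas, so keep it whole.
            args = [value]
        else:
            args = value.split(",")
        parts.append((scheme, args))
    return parts
```

The sketch rejects anything it does not recognize instead of guessing, in
line with the handling of syntax errors quoted from Section 4.4.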

4.3.  Handling of Hash Sums

   Clients are not required to implement the handling of hash sums, so
   they MAY choose to ignore hash sum information altogether.  However,
   if they do implement hash sum handling, the following applies:

   If a fragment identifier contains a hash sum, and a client retrieves
   a MIME entity and detects that the hash sum has changed (observing
   the character encoding specification as described in Section 3.2, if
   present), then the client SHOULD NOT interpret any other text/plain

Spencer: why SHOULD NOT, and not MUST NOT?

   fragment identifier scheme part.  A client MAY signal this situation
   to the user.

4.4.  Syntax Errors in Fragment Identifiers

   If a fragment identifier contains a syntax error (i.e., does not
   conform to the syntax specified in Section 3), then it MUST be
   ignored by clients.  Clients SHOULD NOT make any attempt to correct

Spencer: again, why SHOULD NOT, and not MUST NOT?

   or guess fragment identifiers.  Syntax errors MAY be reported by
   clients.

5.  Examples

   The following examples show some usages for the fragment identifiers
   defined in this memo.

Spencer: this section is very helpful. Thank you for including it.

   ftp://example.com/text.txt#line=10,20;length=9876,UTF-8

   As in the second example, this URI identifies lines 11 to 20 of the
   text.txt MIME entity.  The additional length hash sum specifies that
   the MIME entity has a length of 9876 characters when encoded in
   UTF-8.  If the client supports the length hash sum scheme, it may
   test the retrieved MIME entity for its length, but only if the
   retrieved MIME entity uses the UTF-8 encoding or has been locally
   trancoded into this encoding.  If the length of the retrieved MIME
   entity does not match the length specified in the fragment
   identifier, the client SHOULD NOT interpret the line part and MAY
   signal this to the user.

Spencer: this is the only example description that also includes normative 
text, which I believe is redundant anyway. I'd remove the last sentence from 
the description.
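
Illustration (not part of the draft text): a short Python sketch of how
line=10,20 comes to mean lines 11 to 20, assuming line positions count the
gaps between lines starting at 0 and that CRLF, CR, and LF all end a line.
The function name select_lines is hypothetical.

```python
import re


def select_lines(text: str, start: int, end: int) -> list[str]:
    """Return the lines lying between line positions start and end."""
    # Assumption: CRLF, CR, and LF are the only line endings considered.
    lines = re.split(r"\r\n|\r|\n", text)
    # Position k sits before the (k + 1)-th line, so this is a plain slice.
    return lines[start:end]
```

For a 30-line entity, select_lines(text, 10, 20) returns ten lines, starting
with the 11th and ending with the 20th.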