Document: draft-wilde-text-fragment-08
Reviewer: Spencer Dawkins
Review Date: 28 Sept 2007
IESG Telechat date: 04 Oct 2007

Summary: This document is ready for publication as a Proposed
Standard, with some whining.

Comments: The authors have been very responsive in resolving most of
my Last Call review comments (covering -06). Two comments still seem
relevant in -08. Both involve additional explanation of this
mechanism. Neither would change the way this mechanism works, so
neither should be blocking. I'm sending these along for Russ's
background - whatever he thinks will be fine with me.

These comments follow (from a private e-mail exchange with Martin, a
long time ago, before Chris joined the IESG, so he would not have seen
it)...

>>2.5. Fragment Identifier Robustness
>>
>> Hash sums may specify the character encoding that has been used when
>> creating the hash sums, and if such a specification is present,
>> clients MUST check whether the character encoding specified for the
>> hash sum and the character encoding of the retrieved MIME entity are
>> equal, and clients MUST NOT check the hash sum if these values
>> differ. However, clients MAY choose to transcode the retrieved MIME
>> entity in the case of differing character encodings, and after doing
>> so, check the hash sum. Please note that this method is inherently
>> unreliable, because certain characters or character sequences may
>> have been lost or normalized due to restrictions in one of the
>> character encodings used.
>>
>>Spencer: I have a concern about using MAY to allow clients to check
>>reliability in an inherently unreliable way. I would prefer at least
>>SHOULD NOT.
>
> I agree that at first, this looks a bit scary, and in general, is a
> bad idea. But I don't think this is a big concern in this case in practice.
> The failure cases of this method are highly skewed towards false
> negatives (transcoding back to what the charset information in the
> fragment ID says doesn't match) as opposed to false positives (a match
> despite the fact that the document has actually changed). This should
> be obvious for MD5 hashes, and also applies to length 'hashes'. In the
> length case, there is a basic risk of false positives independent of
> character encoding anyway (the document gets changed, but with the
> same exact resulting length).
>
> Do you agree that this can stay as is? Or do you think some wording
> change would make it easier to understand that this as such isn't a
> big risk?

Let me suggest text, but please read it critically.

"Please note that this method is inherently unreliable, because
certain characters or character sequences may have been lost or
normalized due to restrictions in one of the character encodings used.
Most hash value mismatches may be "false negatives" - the hash fails
because of the transcoding operation, not because of a problem with
the fragment identifier."

>>4.3. Handling of Hash Sums
>>
>> Clients are not required to implement the handling of hash sums, so
>> they MAY choose to ignore hash sum information altogether. However,
>> if they do implement hash sum handling, the following applies:
>>
>> If a fragment identifier contains a hash sum, and a client retrieves
>> a MIME entity and detects that the hash sum has changed (observing
>> the character encoding specification as described in Section 3.2, if
>> present), then the client SHOULD NOT interpret any other text/plain
>>
>>Spencer: why SHOULD NOT, and not MUST NOT?
>
> In many cases (e.g. additions to the end of a file), the fragment id
> may still be valid. In other cases (e.g. small edits shifting things
> by a character or two), the user still may find the right place.
> So going ahead is not always completely useless, and therefore we
> wanted to give implementations some leeway to do what seems to work
> best in their context (e.g. an interactive application vs.
> something like an automatic extractor).
>
>> fragment identifier scheme part. A client MAY signal this situation
>> to the user.

SHOULD NOT would be fine with me if you add a sentence or two
explaining the risks for human-in-the-loop clients versus automatic
extractor clients. Gen-ART reviewers usually aren't questioning the
SHOULDs and SHOULD NOTs; we're usually asking for help in
understanding the tradeoffs in the document.
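
To make the two tradeoffs above concrete, here is a minimal Python
sketch of the behaviour under discussion. It is an illustration only,
not text from the draft: the function names, the "match" / "mismatch" /
"skipped" result strings, and the policy in may_interpret_fragment()
are all assumptions made for this example. It shows the MUST NOT check
when the charsets differ, the optional (MAY) transcode-then-check path,
and one possible way an interactive client and an automatic extractor
could diverge after a mismatch.

```python
import codecs
import hashlib


def check_hash_sum(entity_bytes, entity_charset, expected_md5,
                   hash_charset=None, transcode=False):
    """Hypothetical helper: return "match", "mismatch", or "skipped"."""
    data = entity_bytes
    if hash_charset is not None:
        # Normalize charset names so "UTF-8" and "utf8" compare equal.
        same = (codecs.lookup(hash_charset).name
                == codecs.lookup(entity_charset).name)
        if not same:
            if not transcode:
                # Charsets differ and the client does not transcode:
                # the hash sum is not checked at all.
                return "skipped"
            # Optional path: transcode, then check. This can be lossy
            # if an earlier conversion already dropped characters.
            text = entity_bytes.decode(entity_charset)
            data = text.encode(hash_charset, errors="replace")
    actual = hashlib.md5(data).hexdigest()
    return "match" if actual == expected_md5 else "mismatch"


def may_interpret_fragment(result, interactive):
    """One possible policy (an assumption, not the draft's rule).

    After a mismatch, an interactive client may still go ahead (and
    warn the user, who can judge the result), while an automatic
    extractor gives up because nobody is there to notice a wrong range.
    """
    return result != "mismatch" or interactive


# The fragment author hashed the UTF-8 form of the text.
original = "café €5\n"
expected = hashlib.md5(original.encode("utf-8")).hexdigest()

# The server now delivers ISO-8859-1, which has no euro sign, so the
# server-side conversion already replaced it with "?".
served = original.encode("iso-8859-1", errors="replace")

without_transcoding = check_hash_sum(served, "iso-8859-1", expected, "utf-8")
with_transcoding = check_hash_sum(served, "iso-8859-1", expected, "utf-8",
                                  transcode=True)
```

In this example the text was never edited, yet the transcoded check
reports a mismatch: a false negative of the kind Martin describes. The
reverse error, a match on a document that really changed, would require
an MD5 collision.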