Document: draft-ietf-eai-utf8headers-09.txt
Reviewer: Spencer Dawkins
Review Date:  2008-03-22
IETF LC End Date: 2008-03-24
IESG Telechat date: (if known)
Summary: This draft is on the right track for publication as an Experimental RFC.

There are issues in the following review identified as "technical:" that need to be looked at.

Comments identified as "clarity:" are probably nits, but affected the meaning enough that I wanted to include them in the review.

Comments identified as "nit:" are not part of the review but are provided for editor convenience.

From idnits 2.08.04:

  Checking boilerplate required by RFC 3978 and 3979, updated by RFC 4748:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to http://www.ietf.org/ietf/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to http://www.ietf.org/ID-Checklist.html:
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'CFWS' is mentioned on line 338, but not defined

  -- Possible downref: Undefined Non-RFC (?) reference : ref. 'CFWS'

  == Unused Reference: 'ASCII' is defined on line 606, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ASCII'

  == Outdated reference: A later version (-07) exists of
     draft-ietf-eai-downgrade-05

  == Outdated reference: A later version (-09) exists of
     draft-klensin-net-utf8-07


Comments:

1.  Introduction

1.1.  Role of this specification

   Full internationalization of electronic mail requires several
   capabilities:

   o  The capability to transmit non-ASCII content, provided for as part
      of the basic MIME specification [RFC2045], [RFC2046].
   o  The capability to express those addresses, and information related

Clarity: which addresses are "those addresses"? This is the first use of "addresses" in the document. My guess is that the bullets were reordered without taking this into account; the third bullet says "envelope addresses".

      to and based on them, in mail header fields, defined in this

Nit: "related to and based on" is correct but doesn't parse easily. Suggest s/related to/related to them/ or something similar.

      document.  And, finally,
   o  The capability to use international characters in envelope
      addresses, discussed in [RFC4952] and specified in
      [EAI-SMTP-extension].

   This document specifies an experimental variant of Internet mail that
   permits the use of Unicode encoded in UTF-8 [RFC3629], rather than
   ASCII, as the base form for Internet email header fields.  This form
   is permitted in transmission, if authorized by the SMTP extension
   specified in [EAI-SMTP-extension] or by other transport mechanisms

Technical: isn't this s/transport/transfer/?

   capable of processing it.
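A minimal sketch of the gating rule in the paragraph above, assuming the client already has the set of EHLO keywords the server advertised (Python's smtplib exposes this through SMTP.has_extn()). The keyword "UTF8SMTP" follows the naming used in this draft; the helper names are hypothetical.

```python
def headers_need_utf8(header_block: bytes) -> bool:
    """True if any octet in the header block is outside 7-bit ASCII."""
    return any(b > 0x7F for b in header_block)


def may_transmit(header_block: bytes, ehlo_keywords: set[str]) -> bool:
    """Hypothetical check: UTF-8 header fields may only be sent when the
    server advertised the UTF8SMTP extension (keyword as named in this draft)."""
    if not headers_need_utf8(header_block):
        return True  # plain ASCII headers need no extension
    return "UTF8SMTP" in {k.upper() for k in ehlo_keywords}
```

A client that gets False here has to downgrade or bounce rather than send the UTF-8 headers anyway.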

1.2.  Relation to other standards

   This document also updates [RFC2822] and MIME, and the fact that an
   experimental specification updates a standards-track spec means that
   people who participate in the experiment have to consider those
   standards updated.

Process: The ID Tracker shows this draft in Last Call status, but I can't find a Last Call announcement, either in the archive or in my personal folders. I was looking for it in order to check how Chris explained the downref at Last Call time (I'm expecting that it will be quite entertaining). Has anyone else seen such an announcement on IETF Announce?

2.  Background and History

   The traditional format of email messages [RFC2822] allows only ASCII
   characters in the header fields of messages.  This prevents users
   from having email addresses that contain non-ASCII characters.  It
   further forces non-ASCII text in common names, comments, and in free
   text (such as in the Subject: field) to be encoded (as required by
   MIME format [RFC2047]).  This specification describes a change to the
   email message format that is related to the SMTP message transport

technical: there seem to be multiple occurrences of "transport" that should be "transfer". I won't flag each one, but please check the rest of the document for them.

   change described in the associated document [RFC4952] and
   [EAI-SMTP-extension], and that allows non-ASCII characters in most
   email header fields.  These changes affect SMTP clients, SMTP
   servers, mail user agents (MUAs), list expanders, gateways to other
   media, and all other processes that parse or handle email messages.

   Use of this SMTP extension helps prevents the introduction of such
   messages into message stores that might misinterpret, improperly
   display, or mangle such messages.  It should be noted that using an
   ESMTP extension does not prevent transfering email messages with
   UTF-8 header fields to other systems that use the email format for
   messages and that may not be upgraded, such as unextended POP and
   IMAP servers.  Changes to these protocols to handle UTF-8 header
   fields are addressed in related documents.

technical: I would expect to see references to the "related documents"
here... if they haven't been written yet, just saying "will be addressed"
would make sense.

3.  Terminology

   Unless otherwise noted, all terms used here are defined in [RFC2821]
   ,[RFC2822] , [RFC4952], or [EAI-SMTP-extension].

nit: should not have spaces before commas here.

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in
   [RFC2119].

4.  Changes on Message Header Fields

   This protocol does NOT change the definition of header field names.

technical: I'm confused here. Is this text saying "does not change header field names"? I would have thought this specification is exactly changing the definition of header field names...

   That is, only the bodies of header fields are allowed to have UTF-8
   characters; the rules in [RFC2822] for header field names are not
   changed.

   To permit UTF-8 characters in field values, the header definition in
   [RFC2822] must be extended to support new format.  The following ABNF
   is defined to substitute those definition in [RFC2822].

clarity: is this "replaces the corresponding definitions in [RFC2822]"?

   Those syntax rules not referred to this section remain as the

clarity: s/to this/to in this/

   original definition in [RFC2822].

4.1.  UTF8 Syntax and Normalization

   These are taken from [RFC3629], but kept in this document for reasons
   of convenience.

clarity: this might be "These are normatively defined in [RFC3629], but..."

   This specification does not require a specific normalization of the
   Unicode strings, but recommends that good practices for normalization
   be followed.  See [Net-UTF8] for a discussion on recommended

clarity: s/discussion/guidance/

   practices for normalizing text before sending.

technical: this sounds like [Net-UTF8] is on the border between Normative and Informative. You have it under Informative, which may be fine, just please consider whether you expect someone using this specification to also use [Net-UTF8] in order to interoperate.
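To illustrate why the normalization guidance matters even though it is not required: the same visible string can be carried as different octet sequences, so byte-wise comparison of un-normalized text fails. A sketch using Python's standard unicodedata module, assuming NFC as the target form (the form discussed in [Net-UTF8]).

```python
import unicodedata

composed = "caf\u00e9"       # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"    # 'e' followed by COMBINING ACUTE ACCENT U+0301

# Same rendering, different UTF-8 octets:
print(composed.encode("utf-8"))    # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'cafe\xcc\x81'
print(composed == decomposed)      # False


def nfc(s: str) -> str:
    """Normalize to NFC before sending or comparing."""
    return unicodedata.normalize("NFC", s)


print(nfc(composed) == nfc(decomposed))  # True
```

If senders skip this step, receivers that compare addresses or display names octet by octet will treat the two forms as different.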

4.2.  Changes on MIME headers

   Background: Normally, transfer of message/global will be done in
   8-bit-clean channels, and body parts will have "identity" encodings,
   that is, no decoding is necessary.  In the case where a message
   containing a message/global is downgraded from 8-bit to 7-bit as
   described in [RFC1652]., an encoding may be applied to the message;
   if the message travels multiple times between a 7-bit environment and

clarity: "if the message crosses a boundary between a 7-bit environment and an environment implementing UTF8SMTP multiple times..." (if I understand correctly, you're not talking about "traveling between" the same two environments multiple times, right?)

   an environment implementing UTF8SMTP, multiple levels of encoding may
   occur.  This is expected to be rarely seen in practice, and the
   potential complexity of other ways of dealing with the issue are
   thought to be larger than the complexity of allowing nested encodings
   where necessary.
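A deliberately simplified model of how the nested encodings described above arise. It is not real MIME processing: there are no multipart boundaries, and the helper downgrade_part() is hypothetical. Each 7-bit hop base64-encodes a message/global body part that contains non-ASCII octets, and a part that is already encoded is carried unchanged inside the next layer.

```python
import base64


def downgrade_part(body: bytes) -> tuple[str, bytes]:
    """Simplified model of one 8-bit -> 7-bit downgrade of a message/global
    body part: returns (Content-Transfer-Encoding value, encoded body)."""
    if all(b < 0x80 for b in body):
        return "7bit", body
    return "base64", base64.encodebytes(body)


# An embedded message with UTF-8 header fields.
inner = "Subject: 你好\r\n\r\nHello\r\n".encode("utf-8")

# First 7-bit hop: the message/global part is base64-encoded.
cte1, layer1 = downgrade_part(inner)

# Back in a UTF8SMTP environment, the encoded part is forwarded unchanged
# inside a new message that itself has UTF-8 header fields ...
outer = ("Subject: Fwd: 你好\r\n"
         "Content-Type: message/global\r\n"
         f"Content-Transfer-Encoding: {cte1}\r\n\r\n").encode("utf-8") + layer1

# ... and if that message is carried as message/global across a second
# 7-bit hop, it is encoded again: two nested levels of base64.
cte2, layer2 = downgrade_part(outer)

# A reader has to peel the layers in reverse order.
peeled_outer = base64.decodebytes(layer2)
peeled_inner = base64.decodebytes(peeled_outer.split(b"\r\n\r\n", 1)[1])
```

Nothing here decodes on re-entry into the 8-bit environment, which is why the layers accumulate.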

4.3.  Syntax extensions to RFC 2822

   The following rules are intended to extend the corresponding rules in

clarity: "The following rules extend ..."

   [RFC2822] to allow UTF8 characters.

   This means that all the [RFC2822] constructs that build upon these
   will permit UTF-8 characters, including comments and quoted strings.

clarity: this paragraph should probably be split here, because it's making two different points.

   Besides, in order to allow UTF8 characters in <addr-spec> we have to
   change the syntax of <atext>.  However, it would also lead
   <message-id> to allow UTF8 characters, which is not allowed due to
   the limitation described in Section 4.5.  So <utf8-atext> is added to
   meet this requirement.

clarity: this text seems to be written from the protocol writer's perspective. It might be clearer to protocol readers if it said "We do not change the syntax of <atext> in order to allow UTF8 characters in <addr-spec>, because this would also allow UTF8 characters in <message-id>, which is not allowed due to the limitation described in Section 4.5. Instead, <utf8-atext> is added to meet this requirement."

   Note, however, this does not remove any constraint on the character
   set of protocol elements; for instance, all the allowed values for
   timezone in the Date: headers are still expressed in ASCII.  And
   also, none of this revised syntax affects what is allowed in a

clarity: perhaps s/affects/changes/

   <message-id>, which will still remain in pure ASCII.
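A sketch of the split described in this section: <addr-spec> local parts accept UTF-8 while <message-id> stays ASCII, because only the former switches to <utf8-atext>. The <atext> set is the one from RFC 2822 section 3.2.4; the shape of <utf8-atext> (atext plus any non-ASCII character) is an assumption for illustration, and all function names are hypothetical.

```python
import string

# <atext> as defined in RFC 2822, section 3.2.4 (ASCII only).
ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atext(ch: str) -> bool:
    return ch in ATEXT


def is_utf8_atext(ch: str) -> bool:
    # Assumed shape of <utf8-atext>: <atext> plus any non-ASCII character.
    return is_atext(ch) or ord(ch) > 0x7F


def is_dot_atom_text(s: str, char_ok) -> bool:
    # dot-atom-text = 1*char *("." 1*char), parameterized by character class.
    return bool(s) and all(
        atom and all(char_ok(c) for c in atom) for atom in s.split(".")
    )


def local_part_ok(s: str) -> bool:
    return is_dot_atom_text(s, is_utf8_atext)  # addr-spec side: UTF-8 allowed


def msg_id_left_ok(s: str) -> bool:
    return is_dot_atom_text(s, is_atext)       # message-id side: still ASCII
```

Keeping the two character classes separate is the whole point: changing <atext> itself would have widened both checks at once.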

4.4.  Change on addr-spec syntax

   Internationalized email addresses are represented in UTF-8.  Thus,
   all header fields containing <mailbox>es are updated to permit UTF-8
   as well as an additional, optional all-ascii alternate address.  Note
   that MSAs and MTAs may downgrade internationalized messages as
   needed.  The procedure for doing so in described in

nit: s/so in/so is/

   [EAI-downgrading].

   Below list a few possible <mailbox> representation as example.

nit: "The following list shows a few possible <mailbox> representations as examples."

      "DISPLAY_NAME" <ASCII@ASCII>
         ; traditional mailbox format

      "DISPLAY_NAME" <non-ASCII@non-ASCII>
         ; UTF8SMTP but no ALT-ADDRESS parameter provided,
         ; message will bounce if UTF8SMTP extension is not supported

      <non-ASCII@non-ASCII>
         ; without DISPLAY_NAME and quoted string
         ; UTF8SMTP but no ALT-ADDRESS parameter provided,
         ; message will bounce if UTF8SMTP extension is not supported

      "DISPLAY_NAME" <non-ASCII@non-ASCII <ASCII@ASCII>>
         ; UTF8SMTP with ALT-ADDRESS parameter provided,
         ; ALT-ADDRESS can be used if downgrade is necessary

clarity: since you show a "no DISPLAY_NAME" example for non-ASCII@non-ASCII, it might be helpful to show two more "no DISPLAY_NAME" examples for the traditional format, and for UTF8SMTP with ALT-ADDRESS. I'm an extremist on symmetry, of course, so do what seems appropriate to you.
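A sketch of how a sender might use the forms listed above when choosing an address for the next hop. The regular expression is hypothetical and only covers these example shapes, not the full ABNF (no comments, quoted-pairs, or groups); the actual downgrade procedure is the one in [EAI-downgrading].

```python
import re

# Hypothetical, simplified pattern for the example forms only.
MAILBOX = re.compile(
    r'^\s*(?:"(?P<name>[^"]*)"\s*)?'
    r'<(?P<addr>[^<>\s]+)(?:\s*<(?P<alt>[^<>\s]+)>)?>\s*$'
)


def parse_mailbox(text: str):
    m = MAILBOX.match(text)
    if m is None:
        raise ValueError(f"unsupported mailbox form: {text!r}")
    return m.group("name"), m.group("addr"), m.group("alt")


def address_for_hop(text: str, peer_supports_utf8smtp: bool) -> str:
    name, addr, alt = parse_mailbox(text)
    if addr.isascii() or peer_supports_utf8smtp:
        return addr
    if alt is not None:
        return alt  # ALT-ADDRESS used because a downgrade is necessary
    raise ValueError("no ASCII alternative: message would bounce")
```

For example, address_for_hop('"Li" <用户@例子.中国 <li@example.com>>', False) returns "li@example.com", matching the last example in the list.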

4.5.  Trace field syntax

   The "Return-Path" header provides the email return address in the
   mail delivery.  Thus, it MUST able to carry UTF8 addresses (see the

technical: this isn't a 2119 MUST, as written (any requirement would be on an implementation, not on a header). I'd suggest changing to "is augmented to carry", which matches the phrasing you use for other headers below.

   revised syntax of <angle-addr> in Section 4.4 of this document).
   This will not break the rule of trace fied integrity, because it is

technical: I'm not a mail expert, but is "trace fied integrity" correct? 
This draft has the only use of this term that Google finds on the IETF website ;-)

   added at the last MTA.

   <item-value> on "Received:" syntax is augmented to allow UTF-8 email
   address on "For" clause. <angle-addr> is augmented to include UTF-8
   email address on previous chapter.  To allow UTF-8 email address also
   on syntax corresponding of <addr-spec> on original syntax, <utf8-
   addr-spec> is added to <item-value>.

clarity: you lost me here. My guess is that you mean "We add <utf8-addr-spec> to <item-value>, corresponding to <addr-spec> in the current syntax, to allow UTF-8 email addresses."

   item-value      =/      utf8-addr-spec


4.6.  message/global

technical: "message/global" doesn't seem particularly obvious. If this is really Experimental, I'd suggest message/rfcXXXX, or something that gives people more of a clue about what the subtype means.

   The type message/global is similar to message/rfc822, except that it
   contains a message that can contain UTF-8 characters in the headers
   of the message or body parts. headers.  If this type is sent to a

clarity (I think): s/body parts. headers./body parts./

   7-bit-only system, it has to be encoded in [RFC2045].  Note that a
   system compliant with MIME that doesn't recognize message/global
   would treat it as "application/octet-stream" as described in Section
   5.2.4 of [RFC2046].
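A sketch of the fallback the paragraph above relies on: a MIME agent that only knows the "message" subtypes defined in RFC 2046 (rfc822, partial, external-body) treats any other message subtype as application/octet-stream, per Section 5.2.4 of [RFC2046]. The function name and the legacy-agent subtype set are illustrative assumptions.

```python
# Subtypes a legacy agent knows: the ones defined in RFC 2046.
RECOGNIZED_MESSAGE_SUBTYPES = {"rfc822", "partial", "external-body"}


def effective_type(content_type: str) -> str:
    """How a legacy MIME agent would treat a Content-Type value."""
    main, _, rest = content_type.strip().lower().partition("/")
    sub = rest.split(";", 1)[0].strip()  # drop any parameters
    if main == "message" and sub not in RECOGNIZED_MESSAGE_SUBTYPES:
        return "application/octet-stream"
    return f"{main}/{sub}"
```

This is why an older client shows a message/global part as an opaque attachment rather than failing.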

   Interoperability considerations:  The media type provides
      functionality similar to the message/rfc822 content type for email
      messages with international email headers.  When there is a need
      to embed or return such content in another message, there is
      generally an option to use this media type and leave the content
      unchanged or downconvert the content to message/rfc822.  Both of
      these choices will interoperate with the installed base, but with
      different properties.  Systems unaware of international headers
      will typically treat a message/global body part as an unknown
      attachment, while they will understand the structure of a message/
      rfc822.  However, systems which understand message/global will
      provide functionality superior to the result of a down-conversion
      to message/rfc822.  The most interoperable choice depends on the
      deployed software.

technical: not sure what the last sentence actually means. "We don't know what the most interoperable choice will be"? Text in the same paragraph says both choices are interoperable. If that text is correct, I don't understand what you're saying here.

   Published specification:  RFC XXXX

nit: it's actually safer to put a note saying "Note to RFC Editor: please replace XXXX with the RFC number when assigned" in the draft when you use this mechanism.

   Macintosh file type code(s):  A uniform type identifier (UTI) of
      "public.utf8-email-message" is suggested.  This conforms to
      "public.message" and "public.composite-content" but does not
      necessarily conform to "public.utf8-plain-text".

technical: out of my league here, but "does not necessarily conform to"
doesn't seem helpful. Could you provide any details that would help the reader understand why not?

5.  Security Considerations

   Because UTF-8 often requires several octets to encode a single
   character, internationalized local parts may cause mail addresses to
   become longer.  As specified in [RFC2822], each line of characters
   MUST be no more 998 octets, excluding the CRLF.

clarity: s/CRLF/CRLF, even when UTF-8 characters are being used/

   Because internationalized local parts may cause email addresses to be
   longer, processes which parse, store, or handle email addresses or
   local parts must take extra care not to overflow buffers, truncate
   addresses, exceed storage allotments, or, when comparing, fail to use
   the entire length.

technical: this is great advice, but I don't understand how UTF-8 changes the situation. If you aren't changing the 998-octet requirement, software that breaks for UTF-8 would also break for ASCII headers with the same octet length.
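To make the point above concrete: the RFC 2822 limit is in octets, so code that counts octets behaves identically for ASCII and UTF-8. The only UTF-8-specific failure is code that counts characters (or sizes buffers in characters) where it should count octets. A small sketch; the helper name is hypothetical.

```python
MAX_LINE_OCTETS = 998  # RFC 2822 limit, excluding the CRLF


def line_fits(line: str) -> bool:
    """Check the limit in octets of the UTF-8 encoding, not in characters."""
    return len(line.encode("utf-8")) <= MAX_LINE_OCTETS


local = "\u00fc" * 500              # 'ü', two octets each in UTF-8
print(len(local))                   # 500
print(len(local.encode("utf-8")))   # 1000
print(line_fits(local))             # False
print(line_fits("u" * 998))         # True
```

A check written as len(line) <= 998 would wrongly accept the first string, which may be the scenario the draft has in mind; if so, saying that explicitly would answer the question.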

   In this specification, a user could provide an ASCII alternative
   address for a non-ASCII address.  However, it is possible these two
   address go to different mailboxes, or even different persons.  This
   might not be a protocol problem, but instead be the user's personal
   choice or administration policy or even be a deliberate attempt to
   deceive or cause confusion.

technical: I don't see what the security consideration is here. How could a sender detect whether a receiver IS deliberately attempting to deceive or cause confusion, and what is a sender supposed to do if this condition is detected?

clarity: I'm guessing, but it might be clearer if the last sentence were replaced with "This configuration may be based on a user's personal choice, or on administration policy. We recognize that if ASCII and non-ASCII email is delivered to two different destinations, based on MTA capability, this may violate the principle of least astonishment, but this is not a 'protocol problem'."

6.  IANA considerations

   IANA is requested to register the MIME type message/global, using the
   registration data in section Section 4.6.

technical: OK, but it would be clearer if the registration data made sense when it moves to an IANA registry. There are places where the registration refers to "Section 5" for security considerations, for example. These should probably appear as "Section 5 of RFC XXXX", with a note to replace XXXX with the RFC number, when it's assigned. The author and contact information also reference sections of this draft.

7.  Acknowledgements

   Most of the content of this document is provided by John C Klensin.
   Also some significant comments and suggestions were received from
   Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris
   Newman, Yangwoo KO, Yoshiro YONEYA, and other members of the JET team
   and were incorporated into the document.  The editor is much great
   thanks to their contribution sincerely.

Nit: "The editor sincerely thanks them for their contributions."

9.2.  Informative References


   [Hoffman-utf8-headers]
              Hoffman, P., "SMTP Service Extensions or Transmission of
              Headers in UTF-8 Encoding",
              draft-hoffman-utf8headers-00.txt (work in progress),
              December 2003.

Technical: I know this is how we refer to Internet Drafts, but "2003" isn't "work in progress". You might s/work in progress/expired Internet Draft/, or (probably better) simply move the rest of the full citation to the Acknowledgements section - it didn't seem like you really expected anyone to actually refer to this reference, anyway :-)

   [RFC1652]  Klensin, J., Freed, N., Rose, M., Stefferud, E., and D.
              Crocker, "SMTP Service Extension for 8bit-MIMEtransport",
              RFC 1652, July 1994.

Technical: I'd think RFC 1652 would be normative - do you have to use the service extension to transmit utf8headers?