Network Working Group                                         J. Klensin
Internet-Draft
Expires: April 2, 2006                                             Y. Ko
                                                            MOCOCO, Inc.
                                                      September 29, 2005


           Overview and Framework for Internationalized Email
                   draft-klensin-ima-framework-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 2, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   Full use of electronic mail throughout the world requires that people be able to use their own names, written correctly in their own languages and scripts, as mailbox names in email addresses. This document introduces a series of specifications and operational suggestions that define the mechanisms and protocol extensions needed to fully support internationalized email addresses. These changes include an SMTP extension and an extension of email header syntax to accommodate UTF-8 data. The document set will also include a discussion of key assumptions and issues in deploying fully internationalized email.


Klensin & Ko                Expires April 2, 2006                 [Page 1]

Internet-Draft               IMA Framework                  September 2005


Table of Contents

   1.  Introduction
     1.1.  Role of This Specification
     1.2.  Problem statement
     1.3.  Terminology
   2.  Overview of the Approach
   3.  Document Roadmap
   4.  Overview of Protocol Extensions and Changes
     4.1.  SMTP Extension for Internationalized eMail Address
     4.2.  Transmission of Email Header in UTF-8 Encoding
     4.3.  Downgrading Mechanism for Backward Compatibility
   5.  Advice to Designers and Operators of Mail-receiving Systems
   6.  Internationalization Considerations
   7.  Additional Issues
     7.1.  Impact to IRI
     7.2.  POP and IMAP
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. Change History
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   In order to use internationalized email addresses, we need to internationalize both the domain part and the local part of email addresses. The domain part of email addresses is already internationalized [RFC3490], while the local part is not.
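   As a concrete illustration, the following Python sketch (not part of any IMA specification) applies the "all-ASCII" test defined in Section 1.3 and converts only the domain part of an address to its ASCII-compatible form using Python's built-in "idna" codec, which implements IDNA as specified in [RFC3490]. The local part is left untouched because no comparable mapping exists for it. The function names and sample addresses are hypothetical.

```python
def is_all_ascii(address: str) -> bool:
    """True if every character is in the ASCII repertoire (code points 0-127)."""
    return all(ord(ch) < 128 for ch in address)


def idna_domain_only(address: str) -> str:
    """Apply IDNA ToASCII to the domain part only; the local part is unchanged.

    Hypothetical helper for illustration: it splits at the last "@" and
    performs no other syntax checking.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


if __name__ == "__main__":
    addr = "josé@bücher.example"          # hypothetical sample address
    converted = idna_domain_only(addr)
    print(converted)                      # josé@xn--bcher-kva.example
    print(is_all_ascii(converted))        # False: the local part is still non-ASCII
    print(is_all_ascii(idna_domain_only("jose@bücher.example")))  # True
```

   The second address becomes all-ASCII once IDNA is applied to its domain, so existing mail infrastructure can carry it; the first remains non-ASCII, which is the case the extensions in this document set are meant to address.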
   Without extensions such as those specified in this document set, the mailbox name is restricted by [RFC2821] to a subset of 7-bit ASCII. Although MIME enables the transport of non-ASCII data, it does not provide a mechanism for internationalized email addresses. [RFC2047] defines an encoding mechanism that allows some specific message header fields to carry non-ASCII data. However, it does not address email addresses that include non-ASCII characters.

1.1.  Role of This Specification

   This document presents the overview and framework for an approach to the next stage of email internationalization. This new stage requires not only the internationalization of addresses and headers, but also associated transport and delivery models. The history of developments and design ideas leading to this specification is described in [IMA-history].

   This document describes how the various elements of email internationalization fit together and provides a roadmap for navigating the various documents involved.

1.2.  Problem statement

   [[anchor1: Note in draft: this section needs very significant reworking for both content and presentation. Changed with -01c, but may still not be good enough]]

   Although domain names can already be internationalized, the internationalized forms are far from general adoption by ordinary users. One reason for this is that we do not yet have fully internationalized naming schemes. Domain names are just one of the various names and identifiers that need to be internationalized. Email addresses are a particularly important example of where internationalization of domain names alone is not sufficient. Unless email addresses are presented to users in familiar characters and formats, users will not perceive email as internationalized or as behaving in a culturally friendly way. One thing most of us have almost certainly learned from experience with email is that users strongly prefer email addresses that closely resemble their names or initials.
   Allowing those names or initials to be expressed in users' native languages will be very good news to people whose native language is not written in a subset of a Roman-derived script.

   Internationalization of email addresses is not merely a matter of changing the SMTP envelope, of modifying the From, To, and Cc headers, or of permitting upgraded mail user agents (MUAs) to decode a special encoding and display local characters. To be perceived as usable by end users, the addresses must be internationalized, and handled consistently, in all of the contexts in which they occur. That requirement has far-reaching implications: collections of patches and workarounds are not adequate. Instead, we need to build a fully internationalized email environment, focusing on permitting efficient communication among those who share a language or other community. That, in turn, implies changes to the mail header environment to permit the full range of Unicode characters where that makes sense, an SMTP extension to permit UTF-8 mail addressing and delivery of those extended headers, and (finally) a requirement for support of the 8BITMIME option, so that all of this can be transported through the mail system without having to work around the fact that headers cannot have content-transfer-encodings.

1.3.  Terminology

   This document assumes a reasonable understanding of the protocols and terminology of the core email standards as documented in [RFC2821] and [RFC2822].

   Much of the description in this document depends on the abstractions of "Mail Transfer Agent" ("MTA") and "Mail User Agent" ("MUA"). However, it is important to understand that those terms and the underlying concepts postdate the design of the Internet's email architecture and the "protocols on the wire" principle.
   That email architecture, as it has evolved, and the "wire" principle have prevented any strong and standardized distinctions about how MTAs and MUAs interact on a given origin or destination host (or even whether they are separate).

   In this document, an address is "all-ASCII" if every character in the address is in the ASCII character repertoire [ASCII]; an address is "non-ASCII" if any character is not in the ASCII character repertoire. The term "all-ASCII" is also applied to other protocol elements when the distinction is important, with "non-ASCII" or "internationalized" as its opposite.

   The term "internationalized email address", or "IMA", refers to an address permitted by this specification.

   [[anchor3: Note in Draft/Placeholder: it appears that the term "IMA" is not used in a precise and consistent way across the document set. It is sometimes used to refer simply to a "non-ASCII" address; sometimes to an address that contains non-ASCII characters, even if that address is encoded into ASCII characters (i.e., as an ACE); and sometimes to an address that may contain non-ASCII characters but may also be a traditional address. The definition needs to be clarified in an upcoming draft and all uses of the term brought into line with the definition.]]

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Overview of the Approach

   This set of specifications changes both SMTP and the format of email headers to permit non-ASCII characters to be represented directly. Each important component of the work is described in a separate document. The document set, whose members are described in the next section, also contains informational documents whose purpose is to provide operational and implementation suggestions and guidance for the protocols.

3.  Document Roadmap

   In addition to this document, the following documents make up this specification and provide advice and context for it.

   o  SMTP extensions. This document provides an SMTP extension for internationalized addresses, as provided for in RFC 2821 [IMA-SMTPext].

   o  Email headers in UTF-8. This document essentially updates RFC 2822 to permit some information in email headers to be expressed directly by Unicode characters encoded in UTF-8 when the SMTP extension is used [IMA-UTF8].

   o  Downgrading from internationalized addressing with the SMTP extension and UTF-8 headers to traditional email formats and characters [IMA-downgrade].

   o  Operational guidelines and suggestions for the deployment of internationalized email [IMA-ops].

   o  Special considerations for mailing lists and similar distributions during the transition to internationalized email [IMA-Exploder].

   o  Design decisions, history, and alternative models for internationalized Internet email [IMA-history].

4.  Overview of Protocol Extensions and Changes

4.1.  SMTP Extension for Internationalized eMail Address

   An SMTP extension, "IMA", is specified that:

   o  Permits the use of UTF-8 strings in email addresses, both local parts and domain names.

   o  Permits the selective use of UTF-8 strings in email headers (see the next subsection).

   o  Requires support for the 8BITMIME extension so that header information can be transmitted without using a special content-transfer-encoding.

   Some general principles apply to this work.

   1.  Whatever encoding is used should apply to the whole address and be directly compatible with software used at the user interface.

   2.  An SMTP relay must do one of the following:

       *  recognize the format explicitly, agreeing to do so via an ESMTP option;

       *  select and use an ASCII-only address; or

       *  bounce the message so that the sender can make another plan.

   3.  In the interest of interoperability, charsets other than UTF-8 are prohibited. There is no practical way to identify them properly with an extension like this without introducing great complexity.

4.2.  Transmission of Email Header in UTF-8 Encoding

   [[anchor8: Note in Draft: Much better than earlier version and good enough for now. It could still benefit from a further rework in -01.]]

   There are many places in MUAs or in user presentation in which email addresses or domain names appear. Examples include the conventional From, To, or Cc header fields; Message-IDs; In-Reply-To fields that may contain addresses or domain names; message bodies; and elsewhere. We must examine all of them from an internationalization perspective.

   The user will expect to see mailbox and domain names in local characters, and to see them consistently. If the same name appears in local characters in some places and in an encoded form in others, users will be confused, and some variation of that problem will exist with any internationalization method, whether it is structured as a transport extension or as an MUA-only scheme. Living with the problem for a short time as part of a transition may be acceptable. But the only practical way to avoid it, in both the medium and the longer term, is to make the encodings used in transport as nearly as possible the same as the encodings used in message headers and message bodies.

   It seems clear that, once email local parts are internationalized, email headers should simply be shifted to a fully internationalized form, presumably using UTF-8 rather than ASCII as the base character set for everything other than protocol elements such as the header field names themselves. The transition to that model includes support for address, and address-related, fields within the headers of legacy systems. This is done by extending the encoding models of [RFC2045] and [RFC2231]. However, our target should be fully internationalized headers, as discussed in [IMA-UTF8].

4.3.  Downgrading Mechanism for Backward Compatibility

   As with any use of the SMTP extension mechanism, there is always a possibility that a client that requires the feature will encounter a server that does not support it. In the case of IMA, the risk should be minimized by the fact that the selection of submission servers is presumably under the control of the client, and the selection of potential intermediate relays is under the control of the administration of the final delivery server.

   For those situations, there are basically two possibilities:

   o  Reject or bounce the message, requiring the sender to resubmit it with traditional-format addresses and headers.

   o  Find a way to downgrade the envelope or message body in transit.

   Especially when internationalized addresses are involved, downgrading will require that an all-ASCII address either be obtained from some source or be computed. An optional extension parameter is provided as a way of transmitting an alternate address. Computing an ASCII form of an IMA address requires that the sender have some knowledge that is normally restricted to final delivery servers, but safe extensions may be feasible there too. Downgrade issues and a specification are discussed in [IMA-downgrade].

   The first of these two options, rejecting or returning the message to the sender, MAY always be chosen.

   There is also a third case, one in which the client is IMA-capable and the server is not, but the message does not require the extended capabilities. In other words, both the addresses in the envelope and the entire set of headers of the message are entirely in ASCII (perhaps including encoded-words in the headers). In that case, the client SHOULD send the message whether or not the server announces the IMA capability.

5.  Advice to Designers and Operators of Mail-receiving Systems

   [[anchor10: Note in draft: The material that follows contains some forward-looking, predictive statements.
   Be sure they are true before Last Call.]]

   In addition to the protocol specification materials in this set of documents, the working group has had extensive discussions about operational considerations in the use of internationalized addresses. Those topics include how such addresses should be chosen, how they should relate to ASCII alternatives if such alternatives exist, the management of mailing lists that might support and contain a mixture of all-ASCII and non-ASCII addresses, and so on. Those issues are discussed in [IMA-ops] and [IMA-Exploder].

6.  Internationalization Considerations

   This entire specification addresses issues in internationalization, especially the boundaries between internationalization and localization and between network protocols and client/user interface actions.

7.  Additional Issues

   This section identifies issues that are not covered as part of this set of specifications, but that will need to be considered as part of IMA deployment.

7.1.  Impact to IRI

   The mailto: scheme, as used in IRIs [RFC3987], may need to be modified when IMA is standardized.

7.2.  POP and IMAP

   While SMTP takes care of the transport of messages, IMAP [RFC3501] and POP3 [RFC1939] are among the mechanisms a client uses to retrieve mail objects from a mail store. The use of internationalized mail addresses or UTF-8 headers will require extensions to POP and IMAP and/or modifications to the design and implementation of mail stores and of the mechanisms that final delivery SMTP servers use to put mail into them. However, those mechanisms are separate from those associated with transport across the network and are not discussed in this series of documents. The general issues are covered in [IMA-imap-pop].

8.  IANA Considerations

   This specification does not contemplate any IANA registrations or other actions.

9.  Security Considerations

   Any expansion of permitted characters and encoding forms in email addresses raises some risks. There have been discussions on so-called "IDN spoofing". IDN homograph attacks allow an attacker or phisher to spoof the domain names or URLs of businesses. The same kind of attack is also possible on the local part of internationalized email addresses.

   It should be noted that one of the proposed fixes for, e.g., URLs does not work for email local parts, since they are case-sensitive. That fix involves forcing all displayed elements to be lower-case and normalized.

   Since email addresses are often transcribed from business cards and notes on paper, they are subject to problems arising from confusable characters. These problems are somewhat reduced if the domain associated with the mailbox is unambiguous and supports a relatively small number of mailboxes whose names follow local system conventions; they are increased with very large mail systems in which users can freely select their own addresses.

   The internationalization of email addresses and headers must not leave the Internet less secure than it is without the required extensions. The requirements and mechanisms documented in this set of IMA specifications do not, in general, raise any new security issues other than those associated with confusable characters, a topic that is being explored thoroughly elsewhere.

   [[anchor16: Note in Draft: If the IAB-IDN report is completed and published, a reference to it should go here.]]

   Specific issues are discussed in more detail in the other documents in this set.
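   The case-sensitivity and confusable-character points above can be illustrated with a short Python sketch. It assumes a hypothetical display_fold() function that implements the "lower-case and normalized" fix as Unicode NFKC normalization followed by str.lower(); this is one plausible reading of that fix, not a mechanism defined by this document set. The sketch shows that such folding merges local parts that differ only in case, which [RFC2821] requires to be treated as distinct, and that it does not unify a Latin "a" with the visually similar CYRILLIC SMALL LETTER A (U+0430).

```python
import unicodedata


def display_fold(text: str) -> str:
    """Hypothetical display fix: NFKC-normalize, then lower-case."""
    return unicodedata.normalize("NFKC", text).lower()


# Two local parts that differ only in case. RFC 2821 requires local
# parts to be treated as case-sensitive, so these may name two mailboxes,
# yet folding makes them indistinguishable.
print(display_fold("Alice") == display_fold("alice"))    # True

# A homograph: the first letter is U+0430 CYRILLIC SMALL LETTER A,
# which many fonts render like Latin "a".
latin = "alice"
spoofed = "\u0430lice"
print(latin == spoofed)                                  # False
print(display_fold(latin) == display_fold(spoofed))      # False: NFKC keeps the Cyrillic letter

# Naming the non-ASCII characters is one way to make the difference visible.
for ch in spoofed:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")  # U+0430 CYRILLIC SMALL LETTER A
```

   Under these assumptions, display-side folding both merges mailboxes that may be distinct and fails to expose the homograph, which is why it is not a sufficient fix for email local parts.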
   Caution should also be taken that any "downgrading" mechanism, or use of downgraded addresses, does not inappropriately assume authenticated bindings between the IMA and ASCII addresses.

   In addition, email addresses are used in many contexts other than sending mail, such as for identifiers under various circumstances. Each of those contexts will need to be evaluated, in turn, to determine whether the use of non-ASCII forms is appropriate and what particular issues they raise.

10.  Acknowledgements

   This document, and the related ones, were originally derived from drafts by John Klensin and the JET group [Klensin-emailaddr], [JET-IMA]. The work drew inspiration from discussions on the "IMAA" mailing list, sponsored by the Internet Mail Consortium, and especially from an early draft by Paul Hoffman and Adam Costello [Hoffman-IMAA] that attempted to define an MUA-only solution to the IMA problem.

   [[anchor18: Note in draft: may want to move some of this to "history" or reference it]]

11.  Change History

   [[anchor20: Note to RFC Editor: this section to be removed prior to publication]]

   Version 00

      This version supersedes draft-lee-jet-ima-00 and draft-klensin-emailaddr-i18n-03. It represents a major rewrite and change of architecture from the former and incorporates many ideas and some text from the latter.

12.  References

12.1.  Normative References

   [ASCII]  American National Standards Institute (formerly United States of America Standards Institute), "USA Code for Information Interchange", ANSI X3.4-1968, 1968. ANSI X3.4-1968 has been replaced by newer versions with slight modifications, but the 1968 version remains definitive for the Internet.

   [IMA-Exploder]  "Placeholder: whatever we call the mailing list document", 2005.

   [IMA-SMTPext]  Yao, J., Ed., "SMTP Extension for Internationalized Email Address", draft-yao-smtpext-00 (work in progress), September 2005.
   [IMA-UTF8]  Yeh, J., "Transmission of Email Headers in UTF-8 Encoding", draft-yeh-ima-utf8headers-00 (work in progress), October 2005.

   [IMA-downgrade]  Yoneya, Y., Ed., "Placeholder: whatever we call the downgrading document", October 2005.

   [IMA-ops]  "Placeholder: whatever we call the operations document", 2005.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April 2001.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.

12.2.  Informative References

   [Hoffman-IMAA]  Hoffman, P. and A. Costello, "Internationalizing Mail Addresses in Applications (IMAA)", draft-hoffman-imaa-03 (work in progress), October 2003.

   [IMA-history]  Klensin, J., "Decisions and Alternatives for Internationalization of Email Addresses", Internet-Draft forthcoming, September 2005.

   [IMA-imap-pop]  Klensin, J., "Considerations for IMAP and POP in Conjunction with Email Address Internationalization", draft-klensin-ima-imappop-00a (work in progress), October 2005.

   [JET-IMA]  Yao, J. and J. Yeh, "Internationalized eMail Address (IMA)", draft-lee-jet-ima-00 (work in progress), June 2005.

   [Klensin-emailaddr]  Klensin, J., "Internationalization of Email Addresses", draft-klensin-emailaddr-i18n-03 (work in progress), July 2005.

   [RFC1939]  Myers, J. and M. Rose, "Post Office Protocol - Version 3", STD 53, RFC 1939, May 1996.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.
   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.

   [RFC2449]  Gellens, R., Newman, C., and L. Lundblade, "POP3 Extension Mechanism", RFC 2449, November 1998.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1", RFC 3501, March 2003.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.

Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA  02140
   USA

   Phone: +1 617 491 5735
   Email: john-ietf@jck.com

   YangWoo Ko
   MOCOCO, Inc.
   996-1, 11F, Mirae Asset Venture Tower, Daechi-dong
   Gangnam-gu, Seoul  135-280
   Korea

   Email: yw@mrko.pe.kr

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the Internet Society.