Revised I-D: draft-alvestrand-content-language-03

Harald Tveit Alvestrand
Fri, 15 Feb 2002 08:36:25 -0800


Please publish this.

ietf-languages and Bruce Lilly: Please review.
The main change is in the ABNF of the headers; all are now specified in RFC
2234 ABNF, and use the "obs-" formalism from RFC 2822 to specify accept and
generate grammars.




               Internet-Draft                                       H. Alvestrand
               draft-alvestrand-content-language-03.txt             Cisco Systems
               Target Category: Standards Track                     February 2002
               Updates: RFC 1766                             Expires: August 2002

               Content Language Headers

               Status of this Memo
                    The file name of this memo is draft-alvestrand-content-language-03.txt.
                    This document is an Internet-Draft and is in full conformance with
                    all provisions of Section 10 of RFC 2026.
                    Internet-Drafts are working documents of the Internet Engineering
                    Task Force (IETF), its areas, and its working groups.  Note that
                    other groups may also distribute working documents as
                    Internet-Drafts.
                    Internet-Drafts are draft documents valid for a maximum of six
                    months and may be updated, replaced, or obsoleted by other
                    documents at any time.  It is inappropriate to use Internet-Drafts
                    as reference material or to cite them other than as "work
                    in progress."
                    The list of current Internet-Drafts can be accessed at
                    The list of Internet-Draft Shadow Directories can be accessed at
               Comments on this draft should be sent to the mailing list

               This document defines a "Content-language:" header, for use in the case
               where one desires to indicate the language of something that has
               RFC-822-like headers, like MIME body parts or Web documents, and an
               "Accept-Language:" header for use in the case where one wishes to
               indicate one's preferences with regard to languages.

               1. Introduction
               There are a number of languages presently or previously used by human
               beings in this world.
               A great number of these people would prefer to have information
               presented in a language which they understand.
               In some contexts, it is possible to have information available in more
               than one language, or it might be possible to provide tools (such as
               dictionaries) to assist in the understanding of a language.
               In other cases, it may be desirable to use a computer program to
               convert information from one format (such as plaintext) into another
               (such as computer-synthesized speech, or Braille, or high-quality print
               renderings).
               A prerequisite for any such function is a means of labelling the
               information content with an identifier for the language that is used in
               this information content, such as is defined by [TAGS].
               This document specifies a protocol element for use with protocols that
               use RFC-822-like headers for carrying language tags as defined in
               [TAGS].
               The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
               "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
               document are to be interpreted as described in [RFC 2119].

               2. The Content-language header
               The "Content-Language" header is intended for use in the case where one
               desires to indicate the language(s) of something that has RFC-822-like
               headers, such as MIME body parts or Web documents.
               The RFC-822 EBNF of the Content-Language header is:
                Content-Language = "Content-Language" ":" 1#Language-tag
               In the more strict RFC 2234 ABNF:
                Content-Language = "Content-Language" ":" [CFWS] Language-List
                Language-List = Language-Tag [CFWS] *("," [CFWS] Language-Tag [CFWS])
               The Content-Language header may list several languages in a
               comma-separated list.
               The CFWS construct is intended to function like the whitespace
               convention in RFC 822, which means also that one can place
               parenthesized comments anywhere in the language sequence, or use
               continuation lines. A formal definition is given in RFC 2822.
               In keeping with the tradition of RFC 2822, a more liberal
               grammar is also given:
               obs-content-language = "Content-Language" *WSP ":" [CFWS] Language-List
               Like RFC 2822, this specification says that conforming implementations
               MUST accept the obs-content-language syntax, but MUST NOT generate it;
               all generated headers MUST conform to the Content-Language syntax.
               2.1 Examples of Content-language values
               Voice recording from Liverpool downtown
                  Content-type: audio/basic
                  Content-Language: en-scouse
               Document in Mingo, an American Indian language which does not have an
               ISO 639 code:
                  Content-type: text/plain
                  Content-Language: i-mingo
               An English-French dictionary
                  Content-type: application/dictionary
                  Content-Language: en, fr (This is a dictionary)
               An official European Commission document (in a few of its official
               languages):
                  Content-type: multipart/alternative
                  Content-Language: da, de, el, en, fr, it
               An excerpt from Star Trek
                  Content-type: video/mpeg
                  Content-Language: i-klingon
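               As an illustration only, the following Python sketch shows one way a
               receiver might extract the language tags from a Content-Language
               value such as those in the examples. It is not part of the draft. The
               function names strip_comments and parse_content_language are
               hypothetical, and the sketch assumes that removing parenthesized
               comments and unfolding continuation lines is enough; it does not
               validate the tags against [TAGS].

```python
import re


def strip_comments(value: str) -> str:
    """Remove RFC 822-style parenthesized comments, which may nest."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and depth > 0 and i + 1 < len(value):
            i += 2  # skip a backslash-quoted character inside a comment
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)  # keep only text outside comments
        i += 1
    return "".join(out)


def parse_content_language(value: str) -> list[str]:
    """Return the language tags listed in a Content-Language header value."""
    # Join continuation lines (a line break followed by whitespace).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value)
    cleaned = strip_comments(unfolded)
    # Tags are comma-separated; surrounding whitespace is not significant.
    return [tag.strip() for tag in cleaned.split(",") if tag.strip()]
```

               With this sketch, the dictionary example value
               "en, fr (This is a dictionary)" yields the two tags "en" and "fr".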

               3. The Accept-Language header
               The "Accept-Language" header is intended for use in the case where a
               user or a process desires to identify the preferred language(s) when
               RFC-822-like headers, such as MIME body parts or Web documents, are
               used.
               The RFC-822 EBNF of the Accept-Language header is:
                Accept-Language = "Accept-Language" ":"
                                         1#( language-range [ ";" "q" "=" qvalue ] )
               A slightly more restrictive RFC-2234 ABNF definition is:
               Accept-Language = "Accept-Language:" [CFWS] language-q *( "," [CFWS]
                                 language-q )
               language-q = language-range [";" [CFWS] "q=" qvalue ] [CFWS]
               qvalue         = ( "0" [ "." 0*3DIGIT ] )
                              / ( "1" [ "." 0*3("0") ] )
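               The qvalue rule above can be checked mechanically. The sketch below
               is an illustration, not part of the draft; QVALUE_RE and is_qvalue
               are hypothetical names. It transcribes the two alternatives of the
               rule into one regular expression.

```python
import re

# qvalue = ( "0" [ "." 0*3DIGIT ] ) / ( "1" [ "." 0*3("0") ] )
QVALUE_RE = re.compile(r"(?:0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?)")


def is_qvalue(text: str) -> bool:
    """Return True if text matches the qvalue rule in its entirety."""
    return QVALUE_RE.fullmatch(text) is not None
```

               Under this reading, "0.8" and "1.000" are valid, while "1.5" and
               "0.1234" are not. Note that the grammar as written also admits a
               trailing dot with no digits, as in "0.".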
               A more liberal RFC-2234 ABNF definition is:
               Obs-accept-language = "Accept-Language" *WSP ":" [CFWS] obs-language-q
                    *( "," [CFWS] obs-language-q ) [CFWS]
               obs-language-q = language-range [ [CFWS] ";" [CFWS] "q" [CFWS] "="
                    qvalue ]
               Like RFC 2822, this specification says that conforming implementations
               MUST accept the obs-accept-language syntax, but MUST NOT generate it;
               all generated messages MUST conform to the Accept-Language syntax.
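               The accept-liberally, generate-strictly rule can be pictured as a
               normalizer that reads the looser spelling and re-emits the strict
               one. The sketch below is illustrative only and is not part of the
               draft. The name normalize_accept_language is hypothetical, and the
               sketch assumes the header carries only whitespace, not parenthesized
               comments, between its tokens.

```python
def normalize_accept_language(line: str) -> str:
    """Accept the liberal (obs-) spelling and re-emit the strict one."""
    name, sep, value = line.partition(":")
    # The liberal grammar allows whitespace between the name and the colon.
    if not sep or name.strip().lower() != "accept-language":
        raise ValueError("not an Accept-Language header")
    items = []
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        lang_range, _, param = element.partition(";")
        item = lang_range.strip()
        if param:
            # The liberal grammar allows whitespace around "q" and "=".
            key, eq, q = (part.strip() for part in param.partition("="))
            if key.lower() != "q" or eq != "=":
                raise ValueError("unexpected parameter: " + param.strip())
            item += ";q=" + q  # the strict grammar writes "q=" with no gaps
        items.append(item)
    return "Accept-Language: " + ", ".join(items)
```

               For example, the input "Accept-Language : da , en ; q = 0.8" is
               re-emitted as "Accept-Language: da, en;q=0.8", and input that is
               already in that strict form passes through unchanged.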
               The syntax and semantics of language-range are defined in [TAGS].
               (Note that RFC-822 EBNF rather than ABNF is used here, in order to
               ensure that the syntax is identical with that specified in [RFC 2616]).
               The Accept-Language header may list several language-ranges in a
               comma-separated list, and each may include a quality value Q.
               If no Q values are given, the language-ranges are given in priority
               order, with the leftmost language-range being the most preferred
               language; this is an extension to the HTTP/1.1 rules, but matches
               current practice.
               If Q values are given, refer to HTTP/1.1 [RFC 2616] for the details on
               how to evaluate them.
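               The ordering rules can be sketched as follows. This is an
               illustration under stated assumptions, not part of the draft: the
               name rank_language_ranges is hypothetical, a missing Q value is
               treated as 1 and a Q value of 0 as "not acceptable" (both taken from
               HTTP/1.1), and the input is assumed to contain no comments. Because
               the sort is stable, language-ranges with equal Q values keep their
               left-to-right order, which gives the leftmost-first behaviour when
               no Q values are present.

```python
def rank_language_ranges(value: str) -> list[str]:
    """Order the language-ranges of an Accept-Language value by preference."""
    entries = []
    for element in value.split(","):
        lang_range, _, param = element.partition(";")
        lang_range = lang_range.strip()
        if not lang_range:
            continue
        q = 1.0  # assumed default when no q parameter is present
        if param:
            # Take the text after "=" as the quality value.
            q = float(param.partition("=")[2])
        entries.append((lang_range, q))
    # sorted() is stable, so equal-q entries stay in their original order.
    ranked = sorted(entries, key=lambda entry: -entry[1])
    # A quality value of 0 marks a range as not acceptable.
    return [lang_range for lang_range, q in ranked if q > 0]
```

               For the value "da, en-gb;q=0.8, en;q=0.7" this returns "da",
               "en-gb", "en" in that order.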

               4. Security Considerations
               The only security issue that has been raised with language tags since
               the publication of RFC 1766, which stated that "Security issues are
               believed to be irrelevant to this memo", is a concern with language
               ranges used in content negotiation - that they may be used to infer the
               nationality of the sender, and thus identify potential targets for
               surveillance.
               This is a special case of the general problem that anything you send is
               visible to the receiving party; it is useful to be aware that such
               concerns can exist in some cases.
               The exact magnitude of the threat, and any possible countermeasures, are
               left to each application protocol.

               5. Character set considerations
               This document adds no new considerations beyond what is mentioned in
               [TAGS].
               6. Acknowledgements
               This document has benefited from many rounds of review and comments in
               various fora of the IETF and the Internet working groups.
               Any list of contributors is bound to be incomplete; please regard the
               following as only a selection from the group of people who have
               contributed to make this document what it is today.
               In alphabetical order:
               Tim Berners-Lee, Nathaniel Borenstein, Sean M. Burke, John Clews, Jim
               Conklin, John Cowan, Dave Crocker, Martin Duerst, Michael Everson, Ned
               Freed, Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Paul Hoffman,
               Olle Jarnefors, John Klensin, Bruce Lilly, Keith Moore, Chris Newman,
               Masataka Ohta, Keld Jorn Simonsen, Rhys Weatherley, Misha Wolf,
               Francois Yergeau and many, many others.
               Special thanks must go to Michael Everson, who has served as language
               tag reviewer for almost the complete period since the publication of
               RFC 1766, and has provided a great deal of input to this revision.
               Bruce Lilly did a special job of reading and commenting on my ABNF
               definitions.

               7. Author's Address
               Harald Tveit Alvestrand
               Cisco Systems
               Weidemanns vei 27
               7043 Trondheim
               Phone: +47 73 50 33 52

               8. References
               [TAGS]
                    Alvestrand, H., "Tags for the identification of languages",
                    RFC 3066
               [ISO 639]
                    ISO 639:1988 (E/F) - Code for the representation of names of
                    languages - The International Organization for Standardization,
                    1st edition, 1988-04-01 Prepared by ISO/TC 37 - Terminology
                    (principles and coordination).
                    Note that a new version (ISO 639-1:2000) is in preparation at the
                    time of this writing.
               [ISO 639-2]
                    ISO 639-2:1998 - Codes for the representation of names of
                    languages -- Part 2: Alpha-3 code - edition 1, 1998-11-01, 66
                    pages, prepared by ISO/TC 37/SC 2
               [ISO 3166]
                    ISO 3166:1988 (E/F) - Codes for the representation of names of
                    countries - The International Organization for Standardization,
                    3rd edition, 1988-08-15.
               [ISO 15924]
                    ISO/DIS 15924 - Codes for the representation of names of scripts
                    (under development by ISO TC46/SC2)
               [RFC 1521]
                    Borenstein, N., and N. Freed, "MIME Part One: Mechanisms for
                    Specifying and Describing the Format of Internet Message Bodies",
                    RFC 1521, Bellcore, Innosoft, September 1993.
               [RFC 2119]
                    Key words for use in RFCs to Indicate Requirement Levels. S.
                    Bradner. March 1997.
               [RFC 2234]
                    Augmented BNF for Syntax Specifications: ABNF. D. Crocker, Ed., P.
                    Overell, November 1997.
               [RFC 2616]
                    Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding, J. Gettys,
                    J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. June
                    1999.
               [RFC 2822]
                    Internet Message Format. P. Resnick, Editor. April 2001.

               Appendix A: Changes from RFC 1766

               The definition of the language tags has been split, and is now RFC 3066.
               The differences parameter to multipart/alternative is no longer part of
               this standard, because no implementations of the function were ever
               found. Consult RFC 1766 if you need the information.
               The ABNF for content-language has been updated to use the RFC 2234
               ABNF.
