Control characters in Description or Comments fields?

Doug Ewell doug at ewellic.org
Fri Feb 27 00:51:53 CET 2015


The Description and Comments fields in the Registry are special, in that
they contain relatively free text, as opposed to dates or subtag values
or labels from small, constrained sets. As such, there are special
guidelines in RFC 5646 concerning their content.

Section 3.1.1 ("File Format") says:

"[...] fields are restricted to the printable characters from the
US-ASCII [ISO646] repertoire unless otherwise indicated in the
description of a specific field (Section 3.1.2)."

Section 3.1.5 ("Description Field") says:

"The 'Description' field MAY include the full range of Unicode
characters."

Section 3.1.12 ("Comments Field") says:

"The field-body MAY include the full range of Unicode characters and is
not restricted to any particular script."

Technically, this would include C0 control characters (U+0000 through
U+001F), but such content seems unlikely and would be hard to work
with. Can anyone think of a legitimate reason why Description or
Comments fields would ever contain such characters, including CRLF?

I ask because I'm teaching my BCP 47 library in C# to write out the
Registry in JSON, for whatever reason, and escaping these code points
seems like it would have a poor pain/gain balance.
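
For reference, the escaping in question would look roughly like the
sketch below. It is written in Python rather than C# purely for brevity,
and the function name is hypothetical, not part of any library. It
assumes the rules from the JSON specification (RFC 8259): inside a
string, the quotation mark, the reverse solidus, and U+0000 through
U+001F must be escaped.

```python
import json

# Short escapes that JSON defines for a handful of characters.
SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}


def escape_json_string(text: str) -> str:
    """Return text as a quoted JSON string literal (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining C0 controls have no short form; use \u00XX.
            out.append(f"\\u{ord(ch):04x}")
        else:
            # Everything else, including non-ASCII, may appear literally.
            out.append(ch)
    return '"' + "".join(out) + '"'


# Round-trip check against the standard library parser.
sample = "Line1\r\nLine2\x01"
assert json.loads(escape_json_string(sample)) == sample
```

If no Description or Comments field ever contains a C0 control, only
the first two table entries would ever be exercised.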

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸


