Control characters in Description or Comments fields?

Phillips, Addison addison at lab126.com
Fri Feb 27 01:18:34 CET 2015


No, that's not correct: the ABNF grammar for the registry does not permit the C0 controls except where CRLF is *explicitly* required. See section 3.1.1. Although the text is permissive, the grammar is not--on purpose.

I can't think of a reason we would ever allow a registration to occur with those (or their C1 control friends) anyway. Controls would only appear if someone (well, specifically *YOU*) were to submit a record containing same. I would never support such a registration.

I suppose it is possible for other non-graphic Unicode control characters to occur. For example, a mixed-direction description might use one of the isolation or direction control characters (with PDF), or one might see a variation selector, ZWJ, ZWNJ, etc. in a non-Latin sequence in certain languages.
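
To make the distinction concrete, here is a minimal Python sketch (not taken from the registry tooling or from Doug's C# library) that separates the C0/C1 controls (general category Cc) from the format characters such as ZWJ, ZWNJ, and the directional controls (general category Cf). The function name `classify` and the sample strings are hypothetical, chosen only for illustration. It also shows how Python's standard `json` module treats each kind when `ensure_ascii=False` is set.

```python
import json
import unicodedata


def classify(text):
    """Return (controls, formats) found in text.

    controls: characters with general category Cc (C0, DEL, and C1 controls).
    formats:  characters with general category Cf (ZWJ, ZWNJ, bidi controls, ...).
    """
    controls = [c for c in text if unicodedata.category(c) == "Cc"]
    formats = [c for c in text if unicodedata.category(c) == "Cf"]
    return controls, formats


# Hypothetical field values, for illustration only.
with_zwnj = "x\u200cy"       # contains ZWNJ (U+200C), a format character
with_c0 = "a\u0001b"         # contains a C0 control (U+0001)

print(classify(with_zwnj))   # no controls, one format character
print(classify(with_c0))     # one control, no format characters

# With ensure_ascii=False, json.dumps escapes only what JSON requires
# (quote, backslash, U+0000 through U+001F); ZWNJ passes through unchanged.
print(json.dumps(with_c0, ensure_ascii=False))
print(json.dumps(with_zwnj, ensure_ascii=False))
```

Under these assumptions a writer only has to escape the C0 range, which JSON requires anyway. The format characters need no special handling in JSON output, though they are invisible in most editors. Variation selectors fall in category Mn rather than Cf, so this sketch does not flag them.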

Addison

> -----Original Message-----
> From: Ietf-languages [mailto:ietf-languages-bounces at alvestrand.no] On
> Behalf Of Doug Ewell
> Sent: Thursday, February 26, 2015 3:52 PM
> To: ietf-languages
> Subject: Control characters in Description or Comments fields?
> 
> The Description and Comments fields in the Registry are special, in that they
> contain relatively free text, as opposed to dates or subtag values or labels
> from small, constrained sets. As such, there are special guidelines in RFC 5646
> concerning their content.
> 
> Section 3.1.1 ("File Format") says:
> 
> "[...] fields are restricted to the printable characters from the US-ASCII
> [ISO646] repertoire unless otherwise indicated in the description of a specific
> field (Section 3.1.2)."
> 
> Section 3.1.5 ("Description Field") says:
> 
> "The 'Description' field MAY include the full range of Unicode characters."
> 
> Section 3.1.12 ("Comments Field") says:
> 
> "The field-body MAY include the full range of Unicode characters and is not
> restricted to any particular script."
> 
> Technically, this would include C0 control characters (U+0000 through
> U+001F), but this seems unlikely and hard to work with. Can anyone think
> of a legitimate reason why Description or Comments fields would ever
> contain such characters, including CRLF?
> 
> I ask because I'm teaching my BCP 47 library in C# to write out the Registry in
> JSON, for whatever reason, and escaping these code points seems like it
> would have a poor pain/gain balance.
> 
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

