Clerical issue (line length) in Registry
doug at ewellic.org
Sat Dec 28 22:18:19 CET 2013
I discovered this issue while writing and testing a BCP 47 class library.
The record for variant subtag 'bohoric', added in June 2012, includes
the following Comments field, split into multiple lines as required by
RFC 5646, Section 3.1.1:
Comments: The subtag represents the alphabet codified by Adam Bohorič in
1584 and used from the first printed Slovene book and up to the mid-
The first line is 72 characters long, but Section 3.1.1 says each line
must be limited to 72 *UTF-8 bytes*, not *characters*. Because of the c
with caron at the end of "Bohorič", this line is actually 73 UTF-8 bytes
long, and so violates the requirement. (This was my fault when I helped
prepare the record; Tomaž Erjavec had it correct in his original
I don't remember why we imposed a byte-based limit during development of
RFC 5646 (changed from 72 characters in RFC 4646, when the Registry
contained NCRs instead of literal UTF-8 characters), but we did.
Possibly it was for compatibility with existing processors, but they
would have had to be updated anyway to handle UTF-8.
I'll propose an update to this record to fix this violation. This is the
only line in the Registry that is affected.
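For anyone who wants to repeat the check, here is a minimal sketch in Python. It is not the class library mentioned above; the names MAX_LINE_BYTES, utf8_len and overlong_lines are made up for illustration, and the sample text is just the two lines quoted earlier in this message, without whatever indentation the Registry file uses for continuation lines. It counts UTF-8 bytes rather than characters and reports any line that exceeds the limit.

```python
MAX_LINE_BYTES = 72  # limit stated in RFC 5646, Section 3.1.1


def utf8_len(line: str) -> int:
    """Return the length of a line in UTF-8 bytes, not characters."""
    return len(line.encode("utf-8"))


def overlong_lines(text: str, limit: int = MAX_LINE_BYTES):
    """Return (line_number, char_count, byte_count) for each line over the limit."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        byte_count = utf8_len(line)
        if byte_count > limit:
            found.append((number, len(line), byte_count))
    return found


# The two lines quoted in this message (hypothetical stand-in for the Registry).
sample = (
    "Comments: The subtag represents the alphabet codified by Adam Bohorič in\n"
    "1584 and used from the first printed Slovene book and up to the mid-\n"
)

for number, chars, octets in overlong_lines(sample):
    print(f"line {number}: {chars} characters, {octets} UTF-8 bytes")
```

Run against the full Registry text instead of the sample, the same function should list every offending line; a character-based check (len(line) alone) would miss this one because the line is exactly 72 characters.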
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell