Clerical issue (line length) in Registry

Doug Ewell doug at ewellic.org
Sat Dec 28 22:18:19 CET 2013


I discovered this issue while writing and testing a BCP 47 class library 
in C#.

The record for variant subtag 'bohoric', added in June 2012, includes 
the following Comments field, split into multiple lines as required by 
RFC 5646, Section 3.1.1:

Comments: The subtag represents the alphabet codified by Adam Bohorič in
  1584 and used from the first printed Slovene book and up to the mid-
  19th century.

The first line is 72 characters long, but Section 3.1.1 says each line 
must be limited to 72 *UTF-8 bytes*, not *characters*. Because of the c 
with caron at the end of "Bohorič", this line is actually 73 UTF-8 bytes 
long, and so violates the requirement. (This was my fault when I helped 
prepare the record; Tomaž Erjavec had it correct in his original 
request.)
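
The difference between the two counts can be checked with a few lines of
code. The following is only a minimal Python sketch, not the C# library
mentioned above; the names MAX_BYTES and utf8_length are hypothetical and
chosen for illustration, and the string is the first line of the Comments
field as quoted above, with the c with caron written as U+010D.

```python
# First line of the 'bohoric' Comments field, as quoted in this message.
# The c with caron is written as an escape (U+010D) so the literal is
# unambiguous regardless of how this file itself is stored.
line = "Comments: The subtag represents the alphabet codified by Adam Bohori\u010d in"

# Per-line limit from RFC 5646, Section 3.1.1 (hypothetical constant name).
MAX_BYTES = 72


def utf8_length(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


char_count = len(line)          # counts Unicode code points
byte_count = utf8_length(line)  # counts UTF-8 bytes

print(char_count, byte_count, byte_count <= MAX_BYTES)
# prints: 72 73 False
```

Every other character on the line is ASCII and takes one byte, while
U+010D takes two bytes in UTF-8, which accounts for the extra byte.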

I don't remember why we imposed a byte-based limit during development of
RFC 5646 (changed from 72 characters in RFC 4646, when the Registry
contained numeric character references (NCRs) instead of literal UTF-8
characters), but we did. Possibly it was for compatibility with existing
processors, but they would have had to be updated anyway to handle
UTF-8.

I'll propose an update to this record to fix this violation. This is the 
only line in the Registry that is affected.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell



