[Errata Held for Document Update] RFC5890 (4823)

Sat Oct 8 22:08:57 CEST 2016

I find that I'm with Martin on this one in preferring a narrow scope for 
errata, and in keeping the errata process separate from pending updates.

An erratum should present a more-or-less minimal fix to correct a 
misstatement in the original specification, not revise what it attempts 
to state.

The misstatement in the original text is a dual one. It conflates 
Unicode characters with octets and gives an incorrect number.

The word "longer" clearly applies only to the storage requirement, not 
to the number of code points. My beef with the proposed "corrected text" 
would be that it gets that backwards. In addition, it suffers from the 
fact that 59 code points do not necessarily correspond to 236 octets.
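
To make the code-point versus octet distinction concrete, here is a minimal Python sketch. The helper name `octet_lengths` and the three sample labels are my own illustrations, and the labels are not checked for IDNA2008 validity. It only shows that the same count of 59 code points can occupy quite different numbers of octets depending on the characters and the encoding form.

```python
def octet_lengths(label: str) -> dict:
    """Return the code point count and the encoded size in octets.

    The big-endian codec variants are used so that no byte order
    mark is added to the encoded output.
    """
    return {
        "code_points": len(label),               # Python str is indexed by code point
        "utf-8": len(label.encode("utf-8")),
        "utf-16": len(label.encode("utf-16-be")),
        "utf-32": len(label.encode("utf-32-be")),
    }

# Hypothetical sample labels, each exactly 59 code points long.
samples = {
    "ASCII 'a'": "a" * 59,
    "U+00FC": "\u00fc" * 59,        # 2 octets per code point in UTF-8
    "U+10428": "\U00010428" * 59,   # supplementary plane, 4 octets in every form
}

for name, label in samples.items():
    print(name, octet_lengths(label))
```

Only the UTF-32 column is 236 for all three samples; in UTF-8 and UTF-16 the figure of 236 is reached only when every code point lies outside the BMP.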

Tweaked Corrected Text
--------------
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed than UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 236 octets,
corresponding to 59 characters (Unicode code points).

Tweaking only the last sentence would make the correction a narrower fix.

a) it corrects the number
b) it retains the focus on storage size
c) it retains the mention of Unicode characters

By turning around the numbers, the text fixes the issue that 59 code 
points do not necessarily correspond to 236 octets. However, a U-label 
of 236 octets ALWAYS corresponds to 59 code points, no matter which 
encoding form you choose.
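
For readers wondering how the 63-octet DNS limit relates to the figure of 59, the following rough sketch may help. It assumes the standard "xn--" ACE prefix and uses Python's built-in `punycode` codec; the helper name `a_label_octets` is hypothetical, and no IDNA2008 validity checks are applied. It only measures how long the A-label form of a non-ASCII label would be.

```python
ACE_PREFIX = "xn--"       # prefix that marks an A-label
MAX_LABEL_OCTETS = 63     # DNS label limit from RFC 1034

def a_label_octets(u_label: str) -> int:
    """Length in octets of the A-label form of a non-ASCII U-label.

    This is a length measurement only; the label is not validated
    against any IDNA rules.
    """
    return len(ACE_PREFIX) + len(u_label.encode("punycode"))

# Octets left for the Punycode output once the prefix is accounted for.
print(MAX_LABEL_OCTETS - len(ACE_PREFIX))   # 59

# Try a few label lengths built from a single repeated character.
for n in (10, 57, 59):
    label = "\u00fc" * n
    size = a_label_octets(label)
    print(n, size, size <= MAX_LABEL_OCTETS)
```

Since the prefix takes 4 of the 63 octets, at most 59 octets remain for the Punycode output, which is why the code point count cannot exceed 59. Whether a particular label of that length actually fits depends on its characters.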

A./

On 10/8/2016 3:23 AM, Martin J. Dürst wrote:
> Hello Markus, John,
>
> I would be fine either way, but I'd at least keep the current wording 
> for the errata for the following (partially overlapping) reasons:
>
> - The main point of an erratum (in my view at least) is to fix a clear
>   problem, not to engage in detailed wordsmithing.
> - The errata review process isn't at the same level as a WG process,
>   so trying to find a final wording on the erratum sounds slightly
>   premature.
> - I don't think "held for document update" does in any way imply
>   that the update has to use the exact wording.
> - The current RFCs have counts like 236. The explanations that were
>   just approved as errata help see where this number came from.
>   This is (somewhat) more important in an erratum (which is approved
>   just on the level of AD) than in a new document (which is approved
>   by IETF consensus).
> - The error was not just a calculation error; I think there was indeed
>   an intent in the WG to warn implementers about the expansion problem.
>
> Also, Markus said: "Anyone dealing with Unicode strings has an idea 
> how they store them.". I'd say: "We'd better hope so!". But 
> implementers in the DNS area are not necessarily familiar with Unicode 
> strings. So a hint can help, and shouldn't hurt.
>
> Regards,    Martin.
>
> On 2016/10/08 06:18, John C Klensin wrote:
>> For whatever it is worth, I agree with Markus.  This whole
>> discussion indicates to me that I/we said rather too much in
>> 5890 and that the correct solution is to talk about code points
>> and then stop.   In that context, the proposed "corrected text"
>> below could be further improved by saying what we mean, which
>> is, more or less,
>>
>>     "DNS labels are limited to a maximum length of 63 octets
>>     [RFC 1034] which, if only traditional ASCII characters
>>     are involved, becomes a 63 character limit.  The
>>     symmetric relationship between U-labels and A-labels and
>>     properties of the Punycode encoding used in the latter
>>     effectively impose a smaller limit of no more than 59
>>     Unicode code points".
>>
>> That should be all we need to say and, incidentally, we should
>> only need to say it once.
>>
>> This is useful to resolve now because there is a document being
>> developed that is likely to be posted as an I-D fairly soon.
>> That document, for other reasons, will update 5890.  So, unlike
>> the usual situation in which "Hold for Document Update" implies
>> a really long time, drafts for that update could be only weeks
>> in the future.  Resolving what the community wants now could
>> save some time and avoid repeating the pain this issue and
>> erratum have caused already.
>>
>>     john
>>
>>
>>
>>
>> --On Friday, October 07, 2016 13:16 -0700 Markus Scherer
>> <markus.icu at gmail.com> wrote:
>>
>>> On Fri, Oct 7, 2016 at 12:58 PM, RFC Errata System <
>>> rfc-editor at rfc-editor.org> wrote:
>>>
>>>> Corrected Text
>>>> --------------
>>>> expansion of the A-label form to a U-label may produce
>>>> strings that are much longer than the normal 63 octet DNS
>>>> limit (potentially up to 59 Unicode code points or 236 octets)
>>>>
>>>
>>> The "or 236 octets" is rather confusing, and irrelevant:
>>> Anyone dealing with Unicode strings has an idea how they store
>>> them. It would be better to drop this part.
>>>
>>> If there is a desire to keep something like it, I suggest
>>> "potentially up to 59 Unicode code points which in turn could
>>> be represented by up to 236 octets in any of the standard UTF
>>> encoding forms"
>>>
>>> That would be pretty clear, but my preference is simply
>>> "potentially up to 59 Unicode code points" and be done with it.
>>>
>>> Best regards,
>>> markus