Charset name length(s)

Mon Dec 13 04:52:50 CET 2004

Hi -

Could someone explain the connection Bruce sees between the limit on
the length of descriptors used in writing MIB modules and
the tags used for identifying character sets?  I thought I
understood MIB compiler issues fairly well, but I seem
to be missing something here, as I just can't see how the
MIB compiler constraints are relevant.

Randy

> From: "Bruce Lilly" <blilly at erols.com>
> To: <ietf-charsets at iana.org>; <ietf-822 at imc.org>
> Cc: <ietf at ietf.org>; <ietf-languages at alvestrand.no>
> Sent: Sunday, December 12, 2004 2:10 PM
> Subject: Charset name length(s)
>

> On Sun December 5 2004 13:36, McDonald Ira wrote:
> > Hi,
> >
> > Relative to Bruce's suggestion that the 40-character restriction
> > in names applies only to MIBs:
> >
> > (1) MIBs in both SMIv1 and SMIv2 have always supported the ASN.1
> >     standard maximum of 63 characters for identifiers
> >
> > (2) But, due to underlying linker restrictions, _many_ MIB compilers
> >     truncate identifiers at 31 characters (or arbitrarily rewrite
> >     them after about 25 characters)
> >
> > (3) So 40 characters isn't a helpful restriction for MIB names.
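As a rough illustration of point (2), here is a minimal Python sketch. It assumes a hypothetical compiler that simply cuts descriptors at 31 characters; the descriptor names are invented placeholders, not registered cs* aliases. It shows how two distinct long descriptors can collapse into the same identifier.

```python
# Hypothetical model of a MIB compiler that truncates descriptors at a
# fixed length, as point (2) describes. The 31-character limit is assumed.
TRUNCATE_AT = 31


def truncate(descriptor: str, limit: int = TRUNCATE_AT) -> str:
    """Return the descriptor as a truncating compiler would emit it."""
    return descriptor[:limit]


def find_collisions(descriptors, limit=TRUNCATE_AT):
    """Map each truncated form shared by two or more descriptors to the originals."""
    groups = {}
    for name in descriptors:
        groups.setdefault(truncate(name, limit), []).append(name)
    # Keep only the truncated forms that are no longer unique.
    return {short: names for short, names in groups.items() if len(names) > 1}


# Invented descriptor names, used only to show the effect.
names = [
    "csHypotheticalVeryLongCharsetNameOne",
    "csHypotheticalVeryLongCharsetNameTwo",
    "csShortName",
]
for short, originals in find_collisions(names).items():
    print(short, "<-", originals)
```

Raising `limit` to 40 makes the collision disappear for these particular names, which is the kind of margin the 40-character figure is meant to give.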
>
> [I'm copying the ietf-822 list as the issue(s) discussed
> affect MIME and the Internet Message Format; responses
> to the charset-specific part should remain on ietf-charsets.
> I'm also copying the ietf and ietf-languages lists where
> a related discussion about language tags is taking place.]
>
> To date, I have merely pointed out that the registration
> for MIME names imposes no upper bound, but that the MIB
> requirements do indicate a limit for the cs* aliases.  I
> have not stated whether I thought that there should be an
> explicit limit in general. It is now time to speak up on
> that matter.
>
> I am prompted to do so by considerations arising from a
> proposal to replace RFC 3066, which defines language tags
> and their registration procedure.  Charset names and
> language tags are connected by way of RFC 2231, which
> amended RFC 2047's definition of "encoded-word" to include
> provision for a language tag.  An encoded-word has the
> form (my representation, not the official one; for the
> latter consult RFC 2231 and errata):
>
>   =?<charset>*<language-tag>?<encoding>?<text>?=
>
> The text part must be at least 4 octets in order to accommodate
> B encoding restrictions. Encodings are currently represented
> by a single octet, and as encodings are intended to be limited
> in number, let's assume that that will suffice indefinitely.
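> An encoded-word may be at most 75 octets long (RFC 2047), and
> the delimiters, encoding, and minimum text take 12 of those,
> leaving a maximum of 63 octets for the total length of the
> charset name and the language-tag.  RFC 2978 (charset name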
> registration) provides a procedure for review, so while the
> charset name could theoretically be infinite in length, the
> review process is expected to catch cases which would prove
> problematical for encoded-words -- in fact, so far as I can
> determine, the longest charset name suitable for use in an
> encoded-word (i.e. charsets suitable for text/plain, considering
> the preferred MIME name where specified, otherwise the primary
> name) has a length of 45 octets.
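The arithmetic above can be checked with a small Python sketch. It assumes only what the message states plus the 75-octet encoded-word limit from RFC 2047: a one-octet encoding tag and a minimum of 4 octets of encoded text. The function name is illustrative, not from any library.

```python
# Octet budget for an RFC 2231 encoded-word of the form
#   =?<charset>*<language-tag>?<encoding>?<text>?=
MAX_ENCODED_WORD = 75  # RFC 2047: an encoded-word is at most 75 characters
DELIMITERS = len("=?") + len("*") + len("?") + len("?") + len("?=")  # 7


def name_budget(encoding_len: int = 1, min_text_len: int = 4) -> int:
    """Octets left for charset name plus language tag.

    encoding_len: length of the encoding tag ("B" or "Q" today).
    min_text_len: shortest encoded text (one base64 quantum for B).
    """
    return MAX_ENCODED_WORD - DELIMITERS - encoding_len - min_text_len


print(name_budget())       # 63
print(name_budget() - 45)  # 18 octets left for a language tag
```

If a future encoding needed a longer minimum text, passing a larger `min_text_len` shows how quickly the budget shrinks.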
>
> RFC 2231 also provides for charset specification in extended
> parameters used with Content-Type and Content-Disposition
> fields; these are not required to be charsets suitable for
> text/plain, and the permissible combined length of charset
> name and language tag is much greater than that in an
> encoded-word (but still finite).
>
> Under RFC 3066, there is a similar registration and review
> procedure, and while again there is the theoretical
> possibility of a very long language tag, the longest such
> registered tag has a length of 11 octets.
>
> Combined, the longest charset and longest language tag
> total 56 octets, which is less than the 63 octet limit
> imposed by encoded-word syntax.
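One way to confirm the 56-octet figure is to assemble a minimal encoded-word and measure it. This Python sketch uses placeholder strings of the cited lengths (45 and 11 octets) rather than real registered names. The helper functions are hypothetical and do not validate charset or language-tag syntax.

```python
MAX_ENCODED_WORD = 75  # RFC 2047 limit on an encoded-word


def encoded_word(charset: str, language: str, encoding: str, text: str) -> str:
    """Assemble =?charset*language?encoding?text?= (parts are not validated)."""
    return f"=?{charset}*{language}?{encoding}?{text}?="


def fits(charset: str, language: str, encoding: str = "B", text: str = "AAAA") -> bool:
    """True if the minimal encoded-word for this pair is within 75 octets."""
    word = encoded_word(charset, language, encoding, text)
    return len(word.encode("ascii")) <= MAX_ENCODED_WORD


# Placeholders of the lengths cited above (45-octet charset, 11-octet tag);
# they are not real registered names.
charset = "c" * 45
tag = "l" * 11
word = encoded_word(charset, tag, "B", "AAAA")
print(len(word), fits(charset, tag))  # 68 True
```

With a 45-octet charset name, any language tag of up to 18 octets still fits, and a 19-octet tag does not.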
>
> Unregistered, private-use charset and/or language-tags
> could of course be longer; that does not concern me.
> Private-use requires coordination between communicating
> parties, and it is a matter for those parties to agree
> on private-use tags that fit within the relevant limits.
>
> There is a draft proposal for a replacement of RFC 3066
> which would decouple non-private-use language tag use
> from the review/registration procedure and which would
> provide for non-private-use language tags of unbounded
> length.  That represents a problem not only for
> encoded-word use, but also for Internet Message
> Format header (message- and MIME-part) fields which use
> language tags, such as RFC 3282's Content-Language and
> Accept-Language.  A "New Last Call" has been issued
> for the draft proposal on the ietf-announce list:
> http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00755.html
>
> RFC 2047 gives rationale for the encoded-word limit,
> and the Message Format line-length limits can be found in
> RFCs 2821 and 2822.  Given the large deployed base of software
> implementing those core Internet protocols, I do not
> foresee an opportunity to increase the encoded-word
> length limit at this time. Consequently, the maximum
> total for registered charset and language tags remains
> at no more than 63 octets (and it is conceivable that
> future encodings might require a longer text portion).
> I suggest that charset names and aliases be limited to
> the current maximum of 45 octets, and that language-tags
> for use in encoded-words and extended parameters be
> limited to 16 octets (an increase of 45% over the
> longest registered language tag).  That leaves but 2
> octets of expansion room for encoding tags and/or
> encoding-driven restrictions on the encoded text.
>
> Ideally, a lower limit for MIME charset names would
> be used; aside from a couple of pathological cases, most
> registered MIME-compatible charset names are 17 octets
> or less in length; many have shorter aliases.  However,
> establishing a limit lower than the longest currently-
> registered name would require extraordinary action. It
> might be possible to assign MIME-preferred-name aliases
> to the excessively-long registered charset names, for
> example.  However, the overall maximum (regardless of
> whether the charset is compatible with MIME text/plain)
> should probably be held at 45 octets.  As for the MIB-
> specific aliases, I'll leave specific recommendations up
> to others, but 45 octets is certainly capable of
> accommodating the current MIB-specific limit of 40 octets.
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages