Media type for output of POSIX "diff" utility

Simon Josefsson simon at josefsson.org
Mon Jun 4 11:39:15 CEST 2007


Julian Reschke <julian.reschke at gmx.de> writes:

> Simon Josefsson wrote:
>> Bjoern Hoehrmann <derhoermi at gmx.net> writes:
>>
>>>> What exactly does that mean? That different parts of it have a
>>>> different character encoding?
>>> Yes, that's not uncommon with this kind of format. Think "I converted
>>> the README from ISO-8859-1 to UTF-8, see the attached patch".
>>
>> Even though I haven't seen that example in practice, POSIX says the
>> 'input files may be of any type' and thus I can't see any reason why the
>> above shouldn't work.
>>
>> This argues for application/patch, which would be unfortunate since most
>> patches are readable as text.
>>
>> Would it be possible to register text/patch AND application/patch, and
>> specify that if a particular patch contains text whose charset is
>> non-ASCII or not known, application/patch MUST be used, but otherwise
>> text/plain SHOULD be used?  That would not destroy data and also lead to
>> a readable output.
>
> We certainly could register both; I'm just not entirely sure what we
> would want to specify.
>
> I guess the underlying question is whether a patch is applied to a
> sequence of characters, or to a sequence of bytes?

According to
<http://www.opengroup.org/onlinepubs/000095399/utilities/diff.html> the
diff format appears to support both binary files and text files.
Someone more familiar with POSIX may be able to provide better answers
to that question.
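
To make the README example above concrete, here is a rough sketch of a
diff computed purely on bytes.  It uses Python's difflib rather than a
POSIX 'diff' implementation, so it only illustrates the resulting data,
not what POSIX requires.  The helper names byte_diff and decodes_as and
the sample text are made up for illustration.

```python
import difflib


def byte_diff(old: bytes, new: bytes, name: bytes = b"README") -> bytes:
    """Return a unified diff computed on raw bytes, without decoding them."""
    lines = difflib.diff_bytes(
        difflib.unified_diff,
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=name,
        tofile=name,
    )
    return b"".join(lines)


def decodes_as(data: bytes, charset: str) -> bool:
    """Report whether data is a valid byte sequence in the given charset."""
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: the same one-line README in two encodings.
text = "caf\u00e9\n"
patch = byte_diff(text.encode("iso-8859-1"), text.encode("utf-8"))
```

The removed line carries the ISO-8859-1 byte and the added line carries
the UTF-8 bytes, so the patch as a whole is not valid UTF-8.  It does
decode as ISO-8859-1, but only because every byte sequence does; the
added line then reads as garbage.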

Perhaps we could forward some questions drawn from this discussion to
the Austin Group, to get better answers.

> In the former case, I could apply a patch encoded in ISO-8859-1 to a
> text file that uses UTF-8, and the result would still be in UTF-8. But
> that's not what "patch" does in practice, right?

Whoa, hold on, I think that whatever 'patch' does is irrelevant.  I
believe the media type here should be based on the definition of the
'diff' tool and format.  From what I can tell from the POSIX definition,
the 'diff' tool would classify this situation as a comparison between
"unspecified formats":

  LC_CTYPE
    Determine the locale for the interpretation of sequences of bytes of
    text data as characters (for example, single-byte as opposed to
    multi-byte characters in arguments and input files).
  ...
  In the POSIX locale, if one or both of the files being compared are not
  text files, an unspecified format shall be used that contains the
  pathnames of two files being compared and the string "differ".

So perhaps text/patch would still work out fine.  As far as I can tell,
the POSIX definition of the 'diff' tool does not produce regular diff
output for a comparison between two text files with different
encodings.  This, too, could be raised on the Austin Group mailing
list, I guess.
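
For what it's worth, the two-type rule quoted earlier could be sketched
roughly as below.  This is only an illustration: the function name
choose_media_type is made up, neither media type is registered, and
"charset not known" is approximated as "contains any non-ASCII byte".
The quoted rule names text/plain as the text alternative; defaulting to
text/patch here is my assumption, so it is left as a parameter.

```python
def choose_media_type(patch: bytes, text_type: str = "text/patch") -> str:
    """Pick a label for a patch: text if pure ASCII, otherwise binary."""
    # Any byte >= 0x80 means the charset cannot be assumed to be ASCII,
    # so fall back to the type that treats the patch as opaque bytes.
    if all(byte < 0x80 for byte in patch):
        return text_type
    return "application/patch"
```

A sender would call this on the raw patch bytes before choosing a
Content-Type; a patch mixing ISO-8859-1 and UTF-8 lines would then be
labelled application/patch and pass through unmodified.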

/Simon

