Registration of MIME media type application/vnd.php.serialized
Mario Salzer
mario at erphesfurt.de
Tue Aug 24 06:32:27 CEST 2004
This vendor-tree MIME type registration application covers the data
format emitted by PHP's "serialize()". The format is no longer used only
within PHP; some independent implementations exist, so it may be
becoming an exchange format.
That doesn't require a MIME type per se, but people have also started
pushing data in this format over the wire ("me too!"). See also
"PHP-RPC" in [http://geri.cc.fer.hr/~ivoras/web2/papers/phprpc.pdf]
-- not the best example, but it at least highlights the necessity of
a MIME type; otherwise people fall back to text/html.
* I guess by RFC2048 the PHP group [//php.net] should have filed this
instead, but if they had any interest in it they would have done so
long ago. The vendor name would more accurately be "zend" or "phpnet",
but simply "application/vnd.php.serialized" looks much better.
* Not entirely sure whether this MIME subtype should have the trailing
'd' or simply be called "/vnd.php.serialize". The "/...d" version,
however, seems to match better with "/x-www-form-urlencoded".
* I've included an as-short-as-possible description, because Google
didn't find anything worth referencing, and I'm really not aware of
an official format spec.
* Btw, "Person & email address to contact for further information" could
be a mailing list instead, couldn't it?
hope the wording is 'good enough' for a vnd. subtype :)
mario
---------------------------------------------------------------------
MIME media type name: application
MIME subtype name: vnd.php.serialized
Required parameters: NONE
Optional parameters:
version=x.y.z
SHOULD be added in responses (not in Accept: headers [RFC2295])
to indicate which PHP interpreter version serialized the data.
This is useful for identifying, and possibly working around, bugs
(format violations) in earlier implementations.
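As a rough illustration of the optional parameter, the following Python
sketch builds and reads back a Content-Type header using the standard
library's email.message module. The version value "4.3.8" is only an
example, not something this registration prescribes.

```python
from email.message import Message

msg = Message()
# "4.3.8" is an illustrative value for the optional version parameter.
msg.add_header("Content-Type", "application/vnd.php.serialized",
               version="4.3.8")

# A receiver can read the media type and the parameter separately.
media_type = msg.get_content_type()
version = msg.get_param("version")
print(media_type, version)
```

A receiver that knows of format violations in particular versions could
branch on the version value before decoding the body.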
Encoding considerations: BINARY
As per current use, content assigned this MIME type should be
considered to be in the ISO-8859-1 (Latin-1) character set, and it
may also contain arbitrary non-printable characters and binary
octets.
Transport channels that cannot carry raw 8-bit data without
corruption SHOULD therefore apply a Transfer-Encoding of "base64"
or something similar.
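A minimal Python sketch of that base64 wrapping follows. The payload is
a made-up example containing one Latin-1 octet (0xE9) and a NUL byte,
neither of which is safe on a 7-bit channel.

```python
import base64

# Example payload: a 5-octet string with a raw Latin-1 octet and a NUL.
payload = b's:5:"caf\xe9\x00";'

# Sender side: make the octets safe for a channel that is not 8-bit clean.
encoded = base64.b64encode(payload).decode("ascii")

# Receiver side: reverse the encoding to recover the identical octets.
decoded = base64.b64decode(encoded)
print(encoded)
```

The decoded octets are byte-for-byte identical to the original, so the
length fields inside the payload remain valid.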
Security considerations:
Implementations SHOULD NOT attempt to reproduce more complex data
types like objects, file descriptors, function bytecode and
pointers, because most of these are typically of little use when the
received data came from a remote machine or even from a different
programming language.
For optimal security, only the basic data types are to be extracted,
and a message is to be rejected entirely as soon as an unknown data
type identifier is detected.
Strongly typed languages with fixed-length variable storage
should additionally take care not to expand entities of the format
described herein beyond memory boundaries, even though none of the
mentioned types is particularly difficult to decode.
Interoperability considerations:
The specified format is meant to transport data in a
machine-independent representation, and in particular is free of
byte-ordering issues, because all integer types are passed in string
representation. However, there is no guarantee that the described
data representation can be used by every possible system,
especially since it was originally made for use by interpreted
languages only.
Published specification:
To date there is no openly published specification of this data
representation format. For reference, the original implementation
in [http://cvs.php.net/php-src/ext/standard/var.c#click-display]
or, even better, the Perl implementation by Scott Hurring available
under [http://hurring.com/code/perl/serialize/] can be reviewed.
A terse description of how the basic data types are to be
encoded is, however, included here.
Format description:
Data is compacted into a string representation by emitting
chunks containing a type identifier (one letter), an optional
length/count field (for strings and arrays) and an optional
value (optional because the NULL/undef type does not require
one, for obvious reasons), all separated by colons and ended
with a semicolon in most cases.
<app/vnd.php.serialized#TEXT> = (<data>) *1
<data> = ( <undef> | <boolean> | <int> | <float>
| <string> | <hash> | <strangethings> )
<undef> = 'N;'
<boolean> = ( 'b:0;' | 'b:1;' )
<int> = 'i:' ['-'] <digit>*1 ';'
<float> = 'd:' ['-'] <digit>*1 '.' <digit>*1 ';'
<string> = 's:' ( <LENGTH> ) <:"> <text> <";>
<text> is the string value that this entity carries. It is
exactly <LENGTH> characters long and is enclosed here by two
quote marks. No escaping is applied, so the receiver does not
have to decode it.
<hash> = 'a:' ( <COUNT> ) ':{' (<data> <data>)*COUNT '}'
Where <COUNT> is a non-negative integer (as string) matching
the number of key<data>+value<data> pairs between the
curly braces.
As an exception the string representation of an array or
hash does not carry a final semicolon (unlike all basic
data types).
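The grammar above can be sketched as a small Python encoder. This is an
illustration only, not the PHP implementation: the function name
serialize and the mapping of Python lists onto integer-keyed hashes are
assumptions made for the example, and floats that Python would print in
exponent notation are refused because the <float> rule above only
covers plain decimal notation.

```python
def serialize(value):
    """Encode None, bool, int, float, str, list and dict values."""
    if value is None:
        return "N;"
    if isinstance(value, bool):  # must be tested before int
        return "b:1;" if value else "b:0;"
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, float):
        text = repr(value)
        # Accept only ['-'] digits '.' digits, as in the <float> rule.
        if not text.replace("-", "", 1).replace(".", "", 1).isdigit():
            raise ValueError("float not expressible in plain notation")
        return "d:%s;" % text
    if isinstance(value, str):
        # Latin-1 maps each character to one octet; other characters
        # raise UnicodeEncodeError here.
        length = len(value.encode("latin-1"))
        return 's:%d:"%s";' % (length, value)  # no escaping applied
    if isinstance(value, (list, tuple)):
        value = dict(enumerate(value))  # assumption: 0-based int keys
    if isinstance(value, dict):
        body = "".join(serialize(k) + serialize(v)
                       for k, v in value.items())
        return "a:%d:{%s}" % (len(value), body)  # no final semicolon
    raise TypeError("unsupported type: %r" % type(value))


print(serialize({"a": 1, "b": [True, None]}))
```

The call at the end prints
a:2:{s:1:"a";i:1;s:1:"b";a:2:{i:0;b:1;i:1;N;}} with the nested hash
closed by a brace rather than a semicolon.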
Other known data type identifiers are "O" for objects, "R" for
references and "r" for object references. They are not described
further here, because they are of little use for exchanging
data across servers, as outlined in the Security considerations
section.
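A receiver following the Security considerations might look like the
Python sketch below. It is not the PHP implementation; all names are
made up for illustration. It assumes the received octets were decoded
as Latin-1 first, so that one character equals one octet, and it
follows the grammar above literally, so any other type identifier
("O", "R", "r", ...) or number notation makes it reject the whole
message.

```python
import re

_COUNT = re.compile(r"[0-9]+")
_INT = re.compile(r"-?[0-9]+")
_FLOAT = re.compile(r"-?[0-9]+\.[0-9]+")


def _number(pattern, text, convert):
    if not pattern.fullmatch(text):
        raise ValueError("malformed number %r" % text)
    return convert(text)


def _expect(data, pos, token):
    if not data.startswith(token, pos):
        raise ValueError("expected %r at offset %d" % (token, pos))
    return pos + len(token)


def _read_until(data, pos, stop):
    end = data.find(stop, pos)
    if end < 0:
        raise ValueError("missing %r after offset %d" % (stop, pos))
    return data[pos:end], end + len(stop)


def _parse(data, pos):
    kind = data[pos:pos + 2]
    if kind == "N;":
        return None, pos + 2
    if kind == "b:":
        text, pos = _read_until(data, pos + 2, ";")
        if text not in ("0", "1"):
            raise ValueError("malformed boolean %r" % text)
        return text == "1", pos
    if kind == "i:":
        text, pos = _read_until(data, pos + 2, ";")
        return _number(_INT, text, int), pos
    if kind == "d:":
        text, pos = _read_until(data, pos + 2, ";")
        return _number(_FLOAT, text, float), pos
    if kind == "s:":
        text, pos = _read_until(data, pos + 2, ":")
        pos = _expect(data, pos, '"')
        end = pos + _number(_COUNT, text, int)
        if end > len(data):  # never read past the received data
            raise ValueError("string length exceeds input")
        return data[pos:end], _expect(data, end, '";')
    if kind == "a:":
        text, pos = _read_until(data, pos + 2, ":")
        count = _number(_COUNT, text, int)
        pos = _expect(data, pos, "{")
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            if isinstance(key, bool) or not isinstance(key, (int, str)):
                raise ValueError("hash key must be int or string")
            item, pos = _parse(data, pos)
            result[key] = item
        return result, _expect(data, pos, "}")
    raise ValueError("unknown type identifier at offset %d" % pos)


def unserialize(data):
    """Decode one <data> entity; raise ValueError on anything else."""
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError("trailing data at offset %d" % pos)
    return value


print(unserialize('a:1:{s:1:"k";s:3:"a";";}'))
```

The string length is taken from the length field rather than by
searching for the closing quote, which is why the unescaped quote and
semicolon inside the example value cause no trouble. Nesting depth is
limited only by Python's recursion limit in this sketch.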
Applications which use this media type:
[http://www.php.net/]
PHP interpreter with original implementation
[http://hurring.com/code/perl/serialize/]
Perl 'serialize.pm' by Scott Hurring and others
[http://php.net/serialize#41948]
serialize methods for Ecma/JavaScript types by Iván Montes
Additional information:
Magic numbers: NONE
File extensions: NONE
Macintosh File Type Codes: NONE
Object Identifiers or OIDs: NONE
Person & email address to contact for further information:
Mario Salzer
<mario at erphesfurt.de>
Intended usage: COMMON
This data format is in heavy use for temporarily storing data on
disk to keep it beyond a single application run.
Recently it has also started to be used for transporting data
via HTTP, as a lightweight alternative to SOAP and XML-RPC.
Because it preserves the original data types of the transcoded data,
it has advantages over "multipart/form-data" and of course
the simpler "application/x-www-form-urlencoded". It is not only
much simpler and faster due to fewer escaping requirements, but
also indirectly allows for validation (data types).
Author/Change controller:
Mario Salzer
<mario at erphesfurt.de>
Since this registration was made on behalf of the vendor,
it is only sensible to transfer 'change control' to
<.*@php.net> on request.