Media Type "text/csv": new draft (-02) and Last Call
Graham Klyne
GK-lists at ninebynine.org
Tue Mar 29 13:21:21 CEST 2005
At 15:19 23/03/05 +0000, clyde.ingram at edl.uk.eds.com wrote:
>Please clarify whether the trailing commas that your Excel export
>generates are there to mark the end of the last field, or to mark the
>start of a last field which currently has no value.
I'm not sure how to tell the difference in an Excel spreadsheet.
In the case where this arose for me, I had created a spreadsheet with
varying numbers of values in different rows, and many of the rows were
output by Excel with *multiple* trailing commas. Some rows were generated
without any trailing commas. My point would be that if this happens with
reasonable data then it must be permitted. Whether it's interpreted as a
field terminator or as the start of a field with no value is, I think, moot.
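
To illustrate the kind of ragged output described here, the following is a minimal sketch using Python's standard csv module. The sample data and the helper name read_rows are invented for illustration and are not taken from any real Excel export. Under the csv module's default dialect, each trailing comma simply yields one more empty string, and the reader makes no distinction between a terminator and an empty final field.

```python
import csv
import io

# Invented sample resembling a ragged export: rows carry zero, two,
# or three trailing commas, and every record ends with CRLF.
SAMPLE = "a,b,c\r\nd,,\r\ne,f,,,\r\ng\r\n"

def read_rows(text):
    """Parse CSV text and return the rows as lists of strings."""
    # newline="" stops StringIO from translating CRLF, leaving line
    # handling to the csv module as its documentation recommends.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = read_rows(SAMPLE)
for row in rows:
    # Print the field count next to the parsed fields.
    print(len(row), row)
```

Rows of differing length are accepted without complaint; deciding what the empty strings mean is left to the consuming application.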
#g
--
>Graham,
>
>-----Original Message-----
>From: Graham Klyne
>[mailto:GK-lists at ninebynine.org]
>Sent: Wednesday, March 23, 2005 9:55 AM
>To: Yakov Shafranovich; clyde.ingram at edl.uk.eds.com
>Cc: ietf-types at alvestrand.no
>Subject: Re: Media Type "text/csv": new draft (-02) and Last Call
>
>At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:
>
> >Clyde,
> >
> >Thanks for pointing this out. I personally think that instead of making
> >the header record mandatory which is something that most CSV applications
> >do not have, I would rather take the comma out of the end of the record
> >and have the last field end with a CRLF instead of an optional COMMA. Do
> >you think that is a plausible solution?
>
>No. Some of the Excel data I process has trailing commas. This must be
>allowed.
>
>I also don't think it's necessary to say anything (other than maybe as a
>comment) about any special status for the first line: such use is
>accommodated quite reasonably within the basic CSV format.
>
>For example, having such a line when exporting Excel as CSV depends
>entirely upon how the user constructs the original spreadsheet. Column
>headings are common, but not mandatory. In some cases, there may be a more
>complex heading structure -- this is an application issue, not a dataset
>format issue, and as such does not belong in the dataset format
>specification.
>
>#g
>--
>------------
>
>Please clarify whether the trailing commas that your Excel export
>generates are there to mark the end of the last field, or to mark the
>start of a last field which currently has no value.
>
>To take a concrete example, I would expect a CSV of sibling relationships
>in a mythical family to look like this, assuming the siblings are one
>brother (Bart) and 2 sisters (Lisa & Maggie):
>
> child,sisters,brothers<CR-LF>
> Bart,Lisa & Maggie,<CR-LF>
> Lisa,Maggie,Bart<CR-LF>
> Maggie,Lisa,Bart<CR-LF>
>
>where the trailing comma for the record of child=Bart signifies that the
>"brothers" field is null, so that Bart has no brothers. In my view this
>is a logical conclusion, and in fact stripping that one trailing comma
>would be an error, as that record would only have 2 fields, not 3.
>
>Would you, however, expect the CSV file to use comma as a
>field-terminator, rather than a field-separator, as follows?:
>
> child,sisters,brothers,<CR-LF>
> Bart,Lisa & Maggie,,<CR-LF>
> Lisa,Maggie,Bart,<CR-LF>
> Maggie,Lisa,Bart,<CR-LF>
>
>Note that parsers that split data records on unprotected comma would
>detect one field too many in this latter case.
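
The "one field too many" effect can be sketched as follows. This is only an illustration under stated assumptions: the function name split_fields and its comma_terminates flag are hypothetical, quoting is ignored entirely, and the records are the Bart rows from the two examples above with the CR-LF already removed.

```python
def split_fields(record, comma_terminates=False):
    """Split one record (without its CRLF) on commas.

    Quoting is deliberately ignored, so this only handles records with
    no "protected" commas, such as the sibling examples.
    """
    fields = record.split(",")
    if comma_terminates and fields[-1] == "":
        # Terminator reading: the final comma closes the last field,
        # so the empty string after it is not a field at all.
        fields = fields[:-1]
    return fields

separator_style = "Bart,Lisa & Maggie,"    # comma separates fields
terminator_style = "Bart,Lisa & Maggie,,"  # comma ends every field

print(len(split_fields(separator_style)))
print(len(split_fields(terminator_style)))
print(len(split_fields(terminator_style, comma_terminates=True)))
```

A separator-style parser sees three fields in the first record and four in the second; only a parser that knows the comma is a terminator recovers three fields from the second form.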
>
>In a Comma SEPARATED Value file format, can you configure Excel to use
>comma as a SEPARATOR between values, rather than a TERMINATOR (at the end
>of values)?
>
>
>Regarding your remarks on the header record being "an application issue,
>not a dataset format issue, and as such does not belong in the dataset
>format specification": XML, ASN.1, and other (application-independent)
>data interchange formats, explicitly tag individual fields so that their
>type is unambiguously defined within a context. In contrast, CSV conveys
>no tags per field in a data record. Hence, to help with
>application-independent data interchange, the CSV format should convey
>field titles in a header record.
>
>Here is an example of lack of application-independence: if my application
>sends yours this CSV file:
>
> ,Bart,Lisa & Maggie<CR-LF>
> Bart,Lisa,Maggie<CR-LF>
> Bart,Maggie,Lisa<CR-LF>
>
>and your application depends on the assumption that the fields are the
>sequence:
>
> child
> sisters
> brothers
>
>then your application will mis-interpret the data.
>But if my application precedes this with a header record, like so:
>
> brothers,child,sisters<CR-LF>
> ,Bart,Lisa & Maggie<CR-LF>
> Bart,Lisa,Maggie<CR-LF>
> Bart,Maggie,Lisa<CR-LF>
>
>then your application can maintain independence from the change by my
>application, because the CSV file conveys the corresponding new field
>sequence (the "brothers" column has moved from last to first).
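
The header-record argument can be sketched with Python's csv.DictReader, which takes field names from the first record. This is an illustration only: the constant names and the function sisters_by_child are hypothetical, and the data is the sibling example in both column orders. A consumer that looks fields up by name gets the same answer from either layout.

```python
import csv
import io

# The sibling data in the original column order...
OLD_LAYOUT = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)
# ...and in the reordered layout, with its own header record.
NEW_LAYOUT = (
    "brothers,child,sisters\r\n"
    ",Bart,Lisa & Maggie\r\n"
    "Bart,Lisa,Maggie\r\n"
    "Bart,Maggie,Lisa\r\n"
)

def sisters_by_child(text):
    """Map each child to the 'sisters' field, looked up by column name."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return {row["child"]: row["sisters"] for row in reader}

old_result = sisters_by_child(OLD_LAYOUT)
new_result = sisters_by_child(NEW_LAYOUT)
print(old_result == new_result)
```

Without the header record, the same consumer would have to assume a fixed column position and would misread the reordered file.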
>
>Regards,
>Clyde
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact