Media Type "text/csv": new draft (-02) and Last Call

Graham Klyne GK-lists at ninebynine.org
Tue Mar 29 13:21:21 CEST 2005


At 15:19 23/03/05 +0000, clyde.ingram at edl.uk.eds.com wrote:
>Please clarify whether the trailing commas that your Excel export 
>generates are there to mark the end of the last field, or to mark the 
>start of a last field which currently has no value.

I'm not sure how to tell the difference in an Excel spreadsheet.

In the case where this arose for me, I had created a spreadsheet with 
varying numbers of values in different rows, and many of the rows were 
output by Excel with *multiple* trailing commas.  Some rows were generated 
without any trailing commas.  My point would be that if this happens with 
reasonable data then it must be permitted.  Whether it's interpreted as a 
field terminator or as the start of a field with no value is, I think, moot.
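
To illustrate the two readings, here is a minimal Python sketch. It assumes 
the standard csv module and a made-up sample with zero, two, and three 
trailing commas; the sample data and the helper names are hypothetical and 
not taken from any real Excel export.

```python
import csv
import io

# Hypothetical sample: rows with no, two, and three trailing commas.
SAMPLE = "a,b,c\r\nd,,\r\ne\r\nf,g,,,\r\n"


def read_rows(text):
    """Parse CSV text, treating every comma as a field separator."""
    return list(csv.reader(io.StringIO(text, newline="")))


def strip_trailing_empty(row):
    """Drop empty fields at the end of a row (the 'terminator' reading)."""
    end = len(row)
    while end > 0 and row[end - 1] == "":
        end -= 1
    return row[:end]


rows = read_rows(SAMPLE)                            # separator reading
stripped = [strip_trailing_empty(r) for r in rows]  # terminator reading

for raw, short in zip(rows, stripped):
    print(len(raw), raw, "->", len(short), short)
```

Either way the rows parse; the two readings differ only in whether the 
trailing empty fields are kept or dropped.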

#g
--


>Graham,
>
>-----Original Message-----
>From: Graham Klyne 
>[mailto:GK-lists at ninebynine.org]
>Sent: Wednesday, March 23, 2005 9:55 AM
>To: Yakov Shafranovich; clyde.ingram at edl.uk.eds.com
>Cc: ietf-types at alvestrand.no
>Subject: Re: Media Type "text/csv": new draft (-02) and Last Call
>
>At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:
>
> >Clyde,
> >
> >Thanks for pointing this out. I personally think that instead of making
> >the header record mandatory which is something that most CSV applications
> >do not have, I would rather take the comma out of the end of the record
> >and have the last field end with a CRLF instead of an optional COMMA. Do
> >you think that is a plausible solution?
>
>No.  Some of the Excel data I process has trailing commas.  This must be
>allowed.
>
>I also don't think it's necessary to say anything (other than maybe as a
>comment) about any special status for the first line:  such use is
>accommodated quite reasonably within the basic CSV format.
>
>For example, having such a line when exporting Excel as CSV depends
>entirely upon how the user constructs the original spreadsheet.  Column
>headings are common, but not mandatory.  In some cases, there may be a more
>complex heading structure -- this is an application issue, not a dataset
>format issue, and as such does not belong in the dataset format 
>specification.
>
>#g
>--
>------------
>
>Please clarify whether the trailing commas that your Excel export 
>generates are there to mark the end of the last field, or to mark the 
>start of a last field which currently has no value.
>
>To take a concrete example, I would expect a CSV of sibling relationships 
>in a mythical family to look like this, assuming the siblings are one 
>brother (Bart) and 2 sisters (Lisa & Maggie):
>
>     child,sisters,brothers<CR-LF>
>     Bart,Lisa & Maggie,<CR-LF>
>     Lisa,Maggie,Bart<CR-LF>
>     Maggie,Lisa,Bart<CR-LF>
>
>where the trailing comma for the record of child=Bart signifies that the 
>"brothers" field is null, so that Bart has no brothers.  In my view this 
>is a logical conclusion, and in fact stripping that one trailing comma 
>would be an error, as that record would only have 2 fields, not 3.
>
>Would you, however, expect the CSV file to use comma as a 
>field-terminator, rather than a field-separator, as follows?:
>
>     child,sisters,brothers,<CR-LF>
>     Bart,Lisa & Maggie,,<CR-LF>
>     Lisa,Maggie,Bart,<CR-LF>
>     Maggie,Lisa,Bart,<CR-LF>
>
>Note that parsers that split data records on unprotected comma would 
>detect one field too many in this latter case.
>
>In a Comma SEPARATED Value file format, can you configure Excel to use 
>comma as a SEPARATOR between values, rather than a TERMINATOR (at the end 
>of values)?
>
>
>Regarding your remarks on the header record being "an application issue, 
>not a dataset format issue, and as such does not belong in the dataset 
>format specification": XML, ASN.1, and other (application-independent) 
>data interchange formats explicitly tag individual fields so that their 
>type is unambiguously defined within a context.  In contrast, CSV conveys 
>no tags per field in a data record.  Hence, to help with 
>application-independent data interchange, the CSV format should convey 
>field titles in a header record.
>
>Here is an example of lack of application-independence: if my application 
>sends yours this CSV file:
>
>     ,Bart,Lisa & Maggie<CR-LF>
>     Bart,Lisa,Maggie<CR-LF>
>     Bart,Maggie,Lisa<CR-LF>
>
>and your application depends on the assumption that the fields are the 
>sequence:
>
>     child
>     sisters
>     brothers
>
>then your application will mis-interpret the data.
>But if my application precedes this with a header record, like so:
>
>     brothers,child,sisters<CR-LF>
>     ,Bart,Lisa & Maggie<CR-LF>
>     Bart,Lisa,Maggie<CR-LF>
>     Bart,Maggie,Lisa<CR-LF>
>
>then your application can maintain independence from the change by my 
>application, because the CSV file conveys the corresponding new field 
>sequence (the "brothers" column has moved to the front).
>
>Regards,
>Clyde
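
The header-record argument in the quoted message can be sketched in Python. 
This is only an illustration, assuming the standard csv module's DictReader 
and the two example files quoted above; the names OLD, NEW, and records are 
hypothetical.

```python
import csv
import io

# The two example files from the quoted message, each with a header record.
OLD = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)
NEW = (
    "brothers,child,sisters\r\n"
    ",Bart,Lisa & Maggie\r\n"
    "Bart,Lisa,Maggie\r\n"
    "Bart,Maggie,Lisa\r\n"
)


def records(text):
    """Read CSV text into dicts keyed by the names in the header record."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


old_records = records(OLD)
new_records = records(NEW)

# Keyed by field name, the two column orders carry the same data.
print(old_records == new_records)
print(new_records[0]["child"], "| brothers:", repr(new_records[0]["brothers"]))
```

A reader that addresses fields by position instead would need to be told 
the column order by some other means.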

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
