Media Type "text/csv": new draft (-02) and Last Call

Wed Mar 30 20:44:57 CEST 2005

Graham & Yakov,

Where a Comma-Separated-Value format is used by peer computer applications
attempting to communicate with each other in an open fashion, it is very
simple for them to produce a fixed number of Comma-Separated-Values.
It sounds like your production of a variable number of
Comma-Separated-Values is an artefact of how you are manually driving one
proprietary spreadsheet program from the keyboard/mouse.  If such manually
generated output is to be read by a similarly manually driven mechanism,
then it may be acceptable to have variable numbers of Comma-Separated-Values
per record.  Standardisation in numbers of fields is then unnecessary, so
an RFC need not cater for the uncontrolled nature of manually handled data.

But the same cannot be said of automated computer-based applications, where
maintaining a strict count of generated and expected Comma-Separated-Values
per record is not only easy, but also allows for an extra level of data
validation: namely that a received record is corrupt if it has too few or
too many fields.  This is where standardisation in the format of the CSV
records becomes appropriate material for an RFC.
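
As a rough illustration of that extra validation step, here is a minimal
sketch in Python using the standard csv module.  The function name
validate_field_count and the sample data are my own assumptions for
illustration only; nothing here comes from the draft.

```python
import csv
import io


def validate_field_count(text, expected):
    """Split records into those with exactly `expected` fields and the rest."""
    good, bad = [], []
    # newline="" leaves CRLF handling to the csv module.
    reader = csv.reader(io.StringIO(text, newline=""))
    for record_no, record in enumerate(reader, start=1):
        if len(record) == expected:
            good.append(record)
        else:
            # Too few or too many fields: treat the record as corrupt.
            bad.append((record_no, record))
    return good, bad


# Hypothetical sample: record 3 is one field short, record 4 has one extra.
sample = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie\r\n"
    "Maggie,Lisa,Bart,extra\r\n"
)
good, bad = validate_field_count(sample, expected=3)
```

Here the Bart record still counts as three fields, because its trailing
comma introduces an empty last field, while records 3 and 4 are rejected.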

Regards,
Clyde Ingram

-----Original Message-----
From: Graham Klyne [mailto:GK-lists at ninebynine.org]
Sent: Tuesday, March 29, 2005 12:21 PM
To: clyde.ingram at edl.uk.eds.com; YakovS at solidmatrix.com
Cc: ietf-types at alvestrand.no
Subject: RE: Media Type "text/csv": new draft (-02) and Last Call

At 15:19 23/03/05 +0000, clyde.ingram at edl.uk.eds.com wrote:
>Please clarify whether the trailing commas that your Excel export 
>generates are there to mark the end of the last field, or to mark the 
>start of a last field which currently has no value.

I'm not sure how to tell the difference in an Excel spreadsheet.

In the case where this arose for me, I had created a spreadsheet with 
varying numbers of values in different rows, and many of the rows were 
output by Excel with *multiple* trailing commas.  Some rows were generated 
without any trailing commas.  My point would be that if this happens with 
reasonable data then it must be permitted.  Whether it's interpreted as a 
field terminator or as the start of a field with no value is, I think, moot.
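
To make the two readings concrete, here is a small sketch in Python.  It
assumes the standard csv module as the reader; the sample rows are invented
(not actual Excel output), and strip_trailing_empty is a hypothetical
helper, not part of any library.

```python
import csv
import io


def parse(text):
    """Parse CSV text; trailing commas come back as empty fields."""
    return list(csv.reader(io.StringIO(text, newline="")))


def strip_trailing_empty(record):
    """Alternative reading: drop empty fields at the end of a record."""
    fields = list(record)
    while fields and fields[-1] == "":
        fields.pop()
    return fields


# Invented rows: one full, one with two trailing commas, one with none.
rows = parse("a,b,c\r\nd,,\r\ne\r\n")
trimmed = [strip_trailing_empty(r) for r in rows]
```

Under the first reading the second row has three fields, two of them
empty; under the second it has one.  Either way the input is accepted.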

#g
--

>Graham,
>
>-----Original Message-----
>From: Graham Klyne [mailto:GK-lists at ninebynine.org]
>Sent: Wednesday, March 23, 2005 9:55 AM
>To: Yakov Shafranovich; clyde.ingram at edl.uk.eds.com
>Cc: ietf-types at alvestrand.no
>Subject: Re: Media Type "text/csv": new draft (-02) and Last Call
>
>At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:
>
> >Clyde,
> >
> >Thanks for pointing this out. I personally think that instead of making
> >the header record mandatory which is something that most CSV applications
> >do not have, I would rather take the comma out of the end of the record
> >and have the last field end with a CRLF instead of an optional COMMA. Do
> >you think that is a plausible solution?
>
>No.  Some of the Excel data I process has trailing commas.  This must be
>allowed.
>
>I also don't think it's necessary to say anything (other than maybe as a
>comment) about any special status for the first line:  such use is
>accommodated quite reasonably within the basic CSV format.
>
>For example, having such a line when exporting Excel as CSV depends
>entirely upon how the user constructs the original spreadsheet.  Column
>headings are common, but not mandatory.  In some cases, there may be a more
>complex heading structure -- this is an application issue, not a dataset
>format issue, and as such does not belong in the dataset format 
>specification.
>
>#g
>--
>------------
>
>Please clarify whether the trailing commas that your Excel export 
>generates are there to mark the end of the last field, or to mark the 
>start of a last field which currently has no value.
>
>To take a concrete example, I would expect a CSV of sibling relationships 
>in a mythical family to look like this, assuming the siblings are one 
>brother (Bart) and 2 sisters (Lisa & Maggie):
>
>     child,sisters,brothers<CR-LF>
>     Bart,Lisa & Maggie,<CR-LF>
>     Lisa,Maggie,Bart<CR-LF>
>     Maggie,Lisa,Bart<CR-LF>
>
>where the trailing comma for the record of child=Bart signifies that the 
>"brothers" field is null, so that Bart has no brothers.  In my view this 
>is a logical conclusion, and in fact stripping that one trailing comma 
>would be an error, as that record would only have 2 fields, not 3.
>
>Would you, however, expect the CSV file to use comma as a 
>field-terminator, rather than a field-separator, as follows?:
>
>     child,sisters,brothers,<CR-LF>
>     Bart,Lisa & Maggie,,<CR-LF>
>     Lisa,Maggie,Bart,<CR-LF>
>     Maggie,Lisa,Bart,<CR-LF>
>
>Note that parsers that split data records on unprotected commas would 
>detect one field too many in this latter case.
>
>In a Comma SEPARATED Value file format, can you configure Excel to use 
>comma as a SEPARATOR between values, rather than a TERMINATOR (at the end 
>of values)?
>
>
>Regarding your remarks on the header record being "an application issue, 
>not a dataset format issue, and as such does not belong in the dataset 
>format specification": XML, ASN.1, and other (application-independent) 
>data interchange formats explicitly tag individual fields so that their 
>type is unambiguously defined within a context.  In contrast, CSV conveys 
>no tags per field in a data record.  Hence, to help with 
>application-independent data interchange, the CSV format should convey 
>field titles in a header record.
>
>Here is an example of lack of application-independence: if my application 
>sends yours this CSV file:
>
>     ,Bart,Lisa & Maggie<CR-LF>
>     Bart,Lisa,Maggie<CR-LF>
>     Bart,Maggie,Lisa<CR-LF>
>
>and your application depends on the assumption that the fields are the 
>sequence:
>
>     child
>     sisters
>     brothers
>
>then your application will mis-interpret the data.
>But if my application precedes this with a header record, like so:
>
>     brothers,child,sisters<CR-LF>
>     ,Bart,Lisa & Maggie<CR-LF>
>     Bart,Lisa,Maggie<CR-LF>
>     Bart,Maggie,Lisa<CR-LF>
>
>then your application can maintain independence from the change by my 
>application, because the CSV file conveys the corresponding new field 
>sequence (the "brothers" column has moved ahead of "child" and "sisters").
>
>Regards,
>Clyde
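
The header-record argument in the quoted message can be sketched as
follows, assuming Python's csv.DictReader as the receiving parser.  The
function name read_siblings is hypothetical; the two data sets are the
ones from the quoted examples.

```python
import csv
import io


def read_siblings(text):
    """Read records by header name, so column order does not matter."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [(row["child"], row["sisters"], row["brothers"]) for row in reader]


# Same data, two column orders, each announced by its header record.
old_order = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)
new_order = (
    "brothers,child,sisters\r\n"
    ",Bart,Lisa & Maggie\r\n"
    "Bart,Lisa,Maggie\r\n"
    "Bart,Maggie,Lisa\r\n"
)
same = read_siblings(old_order) == read_siblings(new_order)
```

A receiver that indexes fields by position instead would read the second
file's first column as "child" and misinterpret every record.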

------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact