Media Type "text/csv": new draft (-02) and Last Call

clyde.ingram at edl.uk.eds.com
Wed Mar 23 16:19:52 CET 2005


Graham,

-----Original Message-----
From: Graham Klyne [mailto:GK-lists at ninebynine.org]
Sent: Wednesday, March 23, 2005 9:55 AM
To: Yakov Shafranovich; clyde.ingram at edl.uk.eds.com
Cc: ietf-types at alvestrand.no
Subject: Re: Media Type "text/csv": new draft (-02) and Last Call


At 01:14 23/03/05 -0500, Yakov Shafranovich wrote:

>Clyde,
>
>Thanks for pointing this out. Instead of making the header record
>mandatory, which is something that most CSV applications do not have, I
>personally would rather take the comma out of the end of the record and
>have the last field end with a CRLF instead of an optional COMMA. Do you
>think that is a plausible solution?

No.  Some of the Excel data I process has trailing commas.  This must be 
allowed.

I also don't think it's necessary to say anything (other than maybe as a 
comment) about any special status for the first line:  such use is 
accommodated quite reasonably within the basic CSV format.

For example, whether such a line appears when exporting Excel data as CSV
depends entirely upon how the user constructs the original spreadsheet.  Column
headings are common, but not mandatory.  In some cases, there may be a more 
complex heading structure -- this is an application issue, not a dataset 
format issue, and as such does not belong in the dataset format
specification.

#g
--
------------

Please clarify whether the trailing commas that your Excel export generates
are there to mark the end of the last field, or to mark the start of a last
field which currently has no value.

To take a concrete example, I would expect a CSV of sibling relationships in
a mythical family to look like this, assuming the siblings are one brother
(Bart) and two sisters (Lisa & Maggie):

    child,sisters,brothers<CR-LF>
    Bart,Lisa & Maggie,<CR-LF>
    Lisa,Maggie,Bart<CR-LF>
    Maggie,Lisa,Bart<CR-LF>

where the trailing comma in Bart's record signifies that the "brothers"
field is null (empty), so Bart has no brothers.  In my view this is the
logical reading; in fact, stripping that trailing comma would be an error,
as the record would then have only two fields, not three.
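To make this concrete, here is a minimal sketch using Python's standard csv
module (chosen purely for illustration; nothing in this thread depends on
it). It reads the separator-style example above and confirms that every
record, including Bart's, has three fields, the last of Bart's being empty,
while the same record with its trailing comma stripped has only two.

```python
import csv
import io

# Separator-style example: commas separate fields, CRLF ends each record.
data = (
    "child,sisters,brothers\r\n"
    "Bart,Lisa & Maggie,\r\n"
    "Lisa,Maggie,Bart\r\n"
    "Maggie,Lisa,Bart\r\n"
)

# newline="" leaves the CRLF endings for the csv module to handle.
header, *records = csv.reader(io.StringIO(data, newline=""))

# Every record has exactly as many fields as the header.
field_counts = [len(r) for r in records]
print(field_counts)  # [3, 3, 3]

# Bart's trailing comma produces an empty third field: no brothers.
bart = dict(zip(header, records[0]))
print(bart)  # {'child': 'Bart', 'sisters': 'Lisa & Maggie', 'brothers': ''}

# Stripping that trailing comma leaves the record one field short.
stripped = next(csv.reader(["Bart,Lisa & Maggie"]))
print(len(stripped))  # 2
```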

Would you, however, expect the CSV file to use the comma as a field
terminator, rather than a field separator, as follows?

    child,sisters,brothers,<CR-LF>
    Bart,Lisa & Maggie,,<CR-LF>
    Lisa,Maggie,Bart,<CR-LF>
    Maggie,Lisa,Bart,<CR-LF>

Note that parsers that split data records on unquoted commas would detect
one field too many in this latter case.
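As a rough illustration (Python's csv module again, for illustration only),
both a naive split on every comma and a standard CSV reader see four fields
per record in the terminator-style file. The `drop_terminator_field` helper
below is hypothetical, not part of any specification or library: it guesses
that commas were used as terminators only when every record, header
included, ends in an empty field.

```python
import csv
import io

# Terminator-style example: every record, header included, ends in a comma.
data = (
    "child,sisters,brothers,\r\n"
    "Bart,Lisa & Maggie,,\r\n"
    "Lisa,Maggie,Bart,\r\n"
    "Maggie,Lisa,Bart,\r\n"
)

# Naive split on every comma (safe here only because no field is quoted).
naive = [line.split(",") for line in data.split("\r\n") if line]
# The csv module also treats each comma as a separator, so it agrees.
parsed = list(csv.reader(io.StringIO(data, newline="")))

print([len(r) for r in naive])  # [4, 4, 4, 4]: one field too many
print(naive == parsed)          # True


def drop_terminator_field(rows):
    """Hypothetical helper: if every record ends in an empty field,
    assume commas were used as terminators and drop that final field."""
    if rows and all(r and r[-1] == "" for r in rows):
        return [r[:-1] for r in rows]
    return rows


print(drop_terminator_field(parsed)[1])  # ['Bart', 'Lisa & Maggie', '']
```

Without a header record, such a heuristic cannot tell a terminator comma
from a genuinely empty last column that happens to be empty in every
record; with a header, the named last column ("brothers") keeps it from
firing on separator-style data.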

In a Comma-SEPARATED Values file format, can you configure Excel to use the
comma as a SEPARATOR between values, rather than as a TERMINATOR (at the end
of each value)?
 
Regarding your remark that the header record is "an application issue, not
a dataset format issue, and as such does not belong in the dataset format
specification": XML, ASN.1, and other application-independent data
interchange formats explicitly tag individual fields, so that each field's
type is unambiguously defined within its context.  In contrast, CSV attaches
no tag to the fields of a data record.  Hence, to support
application-independent data interchange, the CSV format should convey field
titles in a header record.

Here is an example of what goes wrong without that application-independence:
if my application sends yours this CSV file:

    ,Bart,Lisa & Maggie<CR-LF>
    Bart,Lisa,Maggie<CR-LF>
    Bart,Maggie,Lisa<CR-LF>
  
and your application assumes that the fields appear in the sequence:

    child
    sisters
    brothers

then your application will misinterpret the data.
But if my application precedes this with a header record, like so:

    brothers,child,sisters<CR-LF>
    ,Bart,Lisa & Maggie<CR-LF>
    Bart,Lisa,Maggie<CR-LF>
    Bart,Maggie,Lisa<CR-LF>

then your application can remain unaffected by my application's change,
because the CSV file conveys the new field sequence (the "brothers" column
has moved from last to first).
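A minimal sketch, again using Python's standard csv module purely for
illustration, contrasts the two readings of this file. The positional
reading applies a hard-coded column order (the name `ASSUMED_ORDER` is
hypothetical) to the records as if no header were present, and so misreads
Bart's record. The header-based reading uses `csv.DictReader`, which takes
the field names from the first record, and can then re-emit the data in the
receiver's own column order.

```python
import csv
import io

# The file as sent, with columns now in the order brothers, child, sisters.
data = (
    "brothers,child,sisters\r\n"
    ",Bart,Lisa & Maggie\r\n"
    "Bart,Lisa,Maggie\r\n"
    "Bart,Maggie,Lisa\r\n"
)

# Positional reading: ignore the header and assume a fixed column order.
ASSUMED_ORDER = ["child", "sisters", "brothers"]
rows = list(csv.reader(io.StringIO(data, newline="")))
positional = [dict(zip(ASSUMED_ORDER, row)) for row in rows[1:]]
print(positional[0])  # {'child': '', 'sisters': 'Bart', 'brothers': 'Lisa & Maggie'}

# Header-based reading: DictReader takes field names from the first record,
# so each value is looked up by name, not by position.
by_child = {row["child"]: row
            for row in csv.DictReader(io.StringIO(data, newline=""))}
print(by_child["Bart"]["brothers"] == "")  # True: Bart has no brothers

# Re-emit the data in the receiver's preferred column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=ASSUMED_ORDER, lineterminator="\r\n")
writer.writeheader()
writer.writerows(by_child.values())
print(out.getvalue())
```

The re-emitted text matches the first, separator-style sibling example,
including the trailing comma on Bart's record.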


Regards,
Clyde