Writing CSV files with headers in Matlab

Update 6th July 2010: Walid suggests an alternative to my code in the first comment below, which apparently is much faster. I’ve not had a chance to check it out myself, but others have and like it. I suggest you use that instead of mine.

Update 1st May 2012: Pendragon found that Walid’s code didn’t work for very large data sets due to memory constraints. However, using dlmwrite to write small chunks of a large data set still gives speed advantages over my original fprintf approach. See Comments 8-12 below for details.

The Matlab csvwrite function has the limitation that it can only write numeric data to a CSV file. This is all well and good until you want a set of column headers in the first line of your file, so other people know what’s in it. I’ve seen a few workarounds for this, mostly revolving around writing out the header and CSV data as separate files then concatenating them together, either with horrible fileread calls, or (even more ugly) calling the external UNIX cat program. Either way, you end up mucking around with multiple files and associated renaming/moving commands.

Here’s a relatively easy method of writing a CSV file with a header. First, make sure you have two variables – one containing the numeric data for your file, and the other containing the header line as a string. I’m going to call my numeric data csvData and my header header, just to be original. Then I can use the following code to write both the header and csvData directly to the output file:

outid = fopen('out.csv', 'w+');
fprintf(outid, '%s', header);
for i = 1:length(csvData)
    outLine = regexprep(num2str(csvData(i,1:37)), '  *', ',');
    fprintf(outid, '%s\n', outLine);
end
fclose(outid);

Now you’ve seen it, I’ll explain how it’s done. In the first two lines, we open a channel to a new file, and write our header as the first line. Next, we loop through each row of the data set, converting it to CSV format and adding it to a file. Finally we close the file to tidy up.

The generation of the outLine variable requires a little more examination. First off, my data set has 37 columns, so I grab them all from the ith row, and run num2str on it (There is undoubtedly a way to work out how many columns your data set has programmatically – answers in the comments please!). This gives us a single string containing all the columns, each separated by a number of spaces. Finally, we use a regular expression to convert one or more spaces into a comma – this gives us our data in the desired CSV format, ready to be added to the file.

Note: This code was developed in Matlab 6.5. I assume that the newer versions work in much the same way, but I don’t have the resources to try it.

16 Comments

  1. The code is a bit slow, the following will greatly expedite it
    header=[‘test 1, test 2, test 3’];
    outid = fopen(‘out.csv’, ‘w+’);
    fprintf(outid, ‘%s’, header);
    fclose(outid);
    dlmwrite (‘out.csv’,csvData,’roffset’,1,’-append’)

  2. Hi,
    I used Walid’s code, and it worked perfectly well. Thanks are due to both Squaregoldfish and to Walid for posting this vry useful bit of code.

  3. Walids’s code works well for small files. I work with lightning data (gigabytes of input data). In this case, to avoid “Out of Memory” problems, it is necessary to get input data with fscanf, one line at a time, and write it with fprintf, as in Squaregoldfish code.

  4. @Pendragon: You might be able to get the best of both worlds by using dmwrite to add small chunks of csvData in a loop. You could probably do 10,000 lines at a time depending on how much memory you’ve got available.

    Does that make sense?

  5. @Squaregoldfish, I am working with these lightning data now, so, I tried the two approaches, with the following results (I used a sample with 6250000 lightning events approximately, one line for each event, and dlmwrite blocks of 250000 and 500000 lines, each):
    fscanf, 13 minutes.
    dlmwrite, 10 minutes (almost no time difference with 250k or 500k lines).
    In the long run, the combined approach, with the best of both worlds, as you say it, appears to be the best answer. Thanks to Walid and Squaregoldfish for the contribution.

  6. and why not

    header=[‘test 1’ ‘test 2’ ‘test 3′];
    dlmwrite (‘out.csv’,header);
    dlmwrite (‘out.csv’,csvData,’-append’);

  7. Just one more thing to add – you asked if there is a way to determine how many columns your data set has – the answer is

    size(csvData,2)

    Changing the 2 to 1 would get you the number of rows (and so on for multidimensional data).

    length(csvData)

    is a greatly inferior way of determining the number of rows, since it will give you the number of columns if there are more columns than rows, I believe.

  8. The code in 13) would result in the first row having each letter as a separate column, i.e., ‘t’, ‘e’, ‘s’, ‘t’, ‘1’, etc. The method in (1) works absolutely fine, though.

Leave a Reply