fredag den 4. september 2015

How to write data from Sitecore (an other sources) to a CSV file that Excel understands

From time to time, our customers wants to export data from Sitecore to CSV, to be able to view it in Excel.

Depending on where in the world you live, this might not be an issue for you, since people in the english speaking countries will just be using ASCII and there will be no issues.

But, if you like me live in one of the nordic countries (or alot of other places), you will be running into having to deal with text with letters, such as 'æ', 'ø', 'å' among others.

If you write strings containing any of these charaters into the CSV file just like that, it will come out malformed when the file is opened in Excel.

This is due to the encoding of the file, which confuses it.

Here is how to do it the easy way:

public static class CsvWriter
{
    public static void WriteToCSVFile(string[][] dataRows, string fileName)
    {
        using (FileStream stream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            using (BinaryWriter writer = new BinaryWriter(stream))
            {
                WriteByteOrderMarker(writer);

                foreach (string[] dataRow in dataRows)
                {
                    WriteLineSegments(writer, dataRow);
                }
            }
        }
    }

    private static void WriteByteOrderMarker(BinaryWriter writer)
    {
        byte[] BOM = { 0xef, 0xbb, 0xbf };

        writer.Write(BOM);
        writer.Flush();
    }
            
    private static void WriteLineSegments(BinaryWriter writer, string[] lineSegments)
    {
        writer.Write(Encoding.UTF8.GetBytes(string.Join(";", lineSegments) + "\r\n"));
        writer.Flush();
    }
}

First save the data in the 2D string array, and call the WriteToCSVFile function with the filename to save the data to.

First a byte order mark is written, which tells Excel (and other applications), that the content of the file is UTF-8 encoded.
Then, when each line is written, it gets encoded, so it matches.

That way, all programs should be able to read it, as long as they understand the BOM in the start of the file :-)

1 kommentar: