0

When I write a CSV file using Ruby containing the £ sign and I open it using Excel I see this symbol instead ¬£.

My understanding is that Ruby uses UTF-8, but Excel interprets this file using a different encoding (ASCII).

I tried to write a US-ASCII encoded CSV file and guessed the £ encoding in ASCII like this:

csv = CSV.open(filename, 'w:US-ASCII')
csv << "\xA3"
csv.close

but it fails with invalid byte sequence in UTF-8 somewhere deep into the CSV library.

What am I doing wrong?
Thank you

mottalrd
  • 4,390
  • 5
  • 25
  • 31
  • 1
    `£` is not `US-ASCII` as you see in the link you send. Check https://stackoverflow.com/questions/7047944/ruby-read-csv-file-as-utf-8-and-or-convert-ascii-8bit-encoding-to-utf-8 – Giacomo Catenazzi Dec 20 '19 at 12:50
  • Thank you Giacomo. But if that's the case what symbol and encoding is Excel using when reading it back? – mottalrd Dec 20 '19 at 13:11
  • 1
    The point I wanted to make: use `ISO8859-1` as encoding. Reading and writing has much symmetry. From one you learn about the other. – Giacomo Catenazzi Dec 20 '19 at 13:25
  • Thank you Giacomo, I'll give it a try! – mottalrd Dec 20 '19 at 14:40
  • IIRC, Excel prefixes UTF-8 CSV files with a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark). You might give it a try and see if it makes importing easier. – Stefan Dec 20 '19 at 14:53
  • I wrote in ISO as Giacomo suggested, and it worked! – mottalrd Dec 24 '19 at 07:06

1 Answers1

2

For sure, Excel is not bound to use ASCII. For instance, I can easily input japanese characters into an Excel cell, and these are certainly not representable by ASCII.

While Ruby, by default, uses Unicode in its internal representation, every String object incorporates its own encoding, so you could in theory mix strings with different encodings, if you want to. In your case, you want to force a certain encoding when writing a file. This can be done either by using the w: output option, as you did, or by using external_encoding: Encoding::US-ASCII. See here for the names of the constants in Encoding.

I don't think US-ASCII is a good choice for the encoding, simply because there is no pound symbol in the ASCII chart. I would have expected that you get a warning message on stderr, when trying to write a pound symbol. If you need an 8-bit-encoding, ISO-8859-1 should do the job, but my recommendation would be to write UTF-8 and tell Excel to use this encoding when reading the CSV file. The possibility to import UTF exists at least since Excel 2007.

user1934428
  • 19,864
  • 7
  • 42
  • 87
  • Thank you. Is there any way to inform Excel of which encoding to use without requiring the user intervention? – mottalrd Dec 20 '19 at 13:15
  • @mottalrd : How is the user using Excel to read the file? If he starts Excel and then imports the CSV, this **is** already user intervention. Or do you want to strart Excel from the command line, with the file actually being opened automatically? – user1934428 Dec 23 '19 at 07:09
  • Thank you for checking. The user is just double clicking on the Excel in my case, they are not explicitly importing it. I solved by writing the file in ISO format for now. Thanks! – mottalrd Dec 24 '19 at 07:06
  • You mean: Double-click on an Icon representing a Link to the Excel application? How then can he select, which CSV-file to read? – user1934428 Dec 25 '19 at 10:37