This post shows how to convert a tab-delimited file (TSV file) into a CSV file with a BOM in Ruby.
tsv_to_csv.rb
require 'csv'
# Create the CSV file and write a BOM at its start.
File.write("meibo.csv", "\uFEFF")
# Read meibo.txt line by line, split each line into an array, and append it to the CSV file as one row.
CSV.open("meibo.csv", "a", force_quotes: true) do |meibocsv|
  File.foreach('meibo.txt') do |student|
    meibocsv << student.chomp.split("\t", -1)
  end
end
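To see what one pass of the loop produces, the sketch below follows a single line of meibo.txt through chomp, split, and CSV formatting. It uses CSV.generate_line to render the row as a string instead of writing it to a file. That substitution is only for illustration, and the sample line is assumed to contain real tab characters.

```ruby
require "csv"

# One line of meibo.txt as File.foreach would yield it: tab-separated, with a trailing newline.
line = "john\tm\t18\n"

# chomp drops the newline, then split("\t", -1) turns the fields into an array.
fields = line.chomp.split("\t", -1)
p fields  # prints ["john", "m", "18"]

# CSV.generate_line formats the array the way `meibocsv << fields` writes it
# when force_quotes: true is set (every field wrapped in double quotes).
row = CSV.generate_line(fields, force_quotes: true)
print row  # prints "john","m","18" followed by a newline
```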
A BOM (byte order mark) is a short sequence of bytes that can appear at the beginning of Unicode-encoded text. Excel opens CSV files as Shift-JIS by default (for example, on Japanese Windows), so a CSV file written in UTF-8 shows up as garbled characters. As a workaround, writing a BOM at the start of the UTF-8 output lets Excel recognize that the file is UTF-8, which prevents the garbling. The BOM is written as follows.
File.write("meibo.csv", "\uFEFF")
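Because the BOM is invisible in most editors, it can help to look at the raw bytes. This minimal sketch writes the BOM the same way and dumps the file in hex; the file name bom_demo.csv is hypothetical. It also shows that Ruby can strip the BOM again when reading, by using the "BOM|UTF-8" encoding in the open mode.

```ruby
# Hypothetical demo file; assumes the current directory is writable.
path = "bom_demo.csv"

# Same call as in the post: write only the BOM character U+FEFF.
File.write(path, "\uFEFF")

# In UTF-8, U+FEFF is stored as the three bytes EF BB BF.
bom_bytes = File.binread(path).bytes.map { |b| format("%02X", b) }
puts bom_bytes.join(" ")  # prints EF BB BF

# When reading, the "BOM|UTF-8" mode removes a leading BOM from the text.
File.write(path, "\uFEFFjohn,m,18\n")
text = File.open(path, "r:BOM|UTF-8", &:read)
p text  # prints "john,m,18\n"
```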
The TSV file to be read is as follows.
meibo.txt
john m 18
paul m 20
alice f 15
dabid m 17
jasmin f 17
meibo.txt contains tab-delimited data. Instead of loading the whole file at once, I write one row to the CSV file for each line I read: each tab-delimited line is converted into an array with split and written to the CSV file as is.
File.foreach('meibo.txt') do |student|
  csv << student.chomp.split("\t", -1)  # csv: the CSV object opened with CSV.open
end
If you wrap this loop in CSV.open, each line is read and written one at a time. Processing line by line is the basic approach when converting a tab-delimited file to CSV: only the current line is held in memory, so memory does not run out even when the data is huge, such as 10 million rows.
CSV.open("meibo.csv", "a", force_quotes: true) do |meibo_csv|
  File.foreach('meibo.txt') do |student|
    meibo_csv << student.chomp.split("\t", -1)
  end
end
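As an alternative (not the approach used in this post), the BOM and the rows can go through a single file handle, so the output file is opened once in "w" mode rather than created with File.write and reopened in "a" mode. This is a sketch: the method name tsv_to_csv_with_bom and the sample file names are hypothetical. It still reads the TSV one line at a time with File.foreach, so memory use does not grow with the input size.

```ruby
require "csv"

# Hypothetical helper: TSV -> CSV with a UTF-8 BOM, streaming line by line.
def tsv_to_csv_with_bom(tsv_path, csv_path)
  File.open(csv_path, "w:UTF-8") do |io|
    io.write("\uFEFF")                     # BOM first, so Excel detects UTF-8
    csv = CSV.new(io, force_quotes: true)  # wrap the same IO for CSV output
    File.foreach(tsv_path) do |line|       # yields one line at a time
      csv << line.chomp.split("\t", -1)
    end
  end                                      # the file is flushed and closed here
end

# Usage with a small sample file (file names are assumptions for this demo).
File.write("sample.tsv", "john\tm\t18\npaul\tm\t20\n")
tsv_to_csv_with_bom("sample.tsv", "sample.csv")
```

Writing through one handle also avoids accidentally appending to an old file if the File.write step is ever removed.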
Note that -1 is passed as the second argument (limit) of split. The limit defaults to 0, which removes trailing empty strings from the resulting array. That is a problem when the last fields of a TSV line are empty, because those columns silently disappear from the CSV row. Passing -1 (any negative limit) keeps the trailing empty strings.
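A quick way to see the difference is to split lines whose trailing fields are empty. The sample lines below are hypothetical: one record is missing its age, and one is missing both sex and age.

```ruby
# Hypothetical TSV lines whose last field(s) are empty.
missing_age  = "dabid\tm\t"
missing_both = "alice\t\t"

# Default limit (0): trailing empty strings are dropped, so columns vanish.
p missing_age.split("\t")       # prints ["dabid", "m"]
p missing_both.split("\t")      # prints ["alice"]

# Negative limit: every field is kept, so each row still has 3 columns.
p missing_age.split("\t", -1)   # prints ["dabid", "m", ""]
p missing_both.split("\t", -1)  # prints ["alice", "", ""]
```

Without -1, these records would be written as rows with two columns and one column respectively, leaving the CSV ragged.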
% ruby tsv_to_csv.rb
This generates a CSV file containing the converted data.
meibo.csv
"john","m","18"
"paul","m","20"
"alice","f","15"
"dabid","m","17"
"jasmin","f","17"
The data has been correctly converted to a comma-separated CSV file.