I made a function to output log information from DB because I want to analyze log information from the client.
As a result of outputting the log after the actual operation started, if the log is about 1 or 2 days, it will take time, but csv could be output. However, it takes too much time for the whole period to be useful. I fell into such a situation. When I actually tried it, it was not output even after 15 minutes, and it was in the state of "Surely this can not be used ...", so I tried to directly hit SQL to output the log.
Let's output DB data to CSV with Rails | Miscellaneous notes that are not already known | note
I used to output the common csv output like this using the CSV library. (Is this this?
library csv (Ruby 2.7.0 Reference Manual)
)
This time, issuing a query with raw SQL and outputting a log also uses the CSV library, but the usage is slightly different.
The SQL issued by ActiveRecord is converted to raw SQL using the to_sql
method and output.
app/services/large_csv_exporter_service.rb
class LargeCsvExporterService
require 'csv'
attr_accessor :table, :column, :host, :username, :password, :database
def initialize(table, column, host, username, password, database)
@table = table
@column = column
@host = host
@username = username
@password = password
@database = database
end
def set_file_name(file_name)
@csv_file_name = file_name
end
def set_query(query)
results = @client.query(query)
results
end
def write_csv(results)
# File.write(@csv_file_name, encoding: Encoding::SJIS) unless File.exist? @csv_file_name
CSV.open("./tmp/csv_export/#{@csv_file_name}", "w:sjis") do |csv|
csv << results.fields
results.each do |row|
csv << row.values
end
end
end
def export_csv_file
filepath = "./tmp/csv_export/#{@csv_file_name}"
stat = File::stat(filepath)
return filepath, stat, @csv_file_name
end
def delete_created_csv_file
if File.exist?("./tmp/csv_export/#{@csv_file_name}")
File.delete("./tmp/csv_export/#{@csv_file_name}")
end
end
def set_client
# cache_rows:False to avoid using memory
@client = Mysql2::Client.new(
host: host,
username: username,
password: password,
database: database,
stream: true,
cache_rows: false,
)
end
end
I'm using service in a controller. From csv_export = LargeCsvExporterService.new ("access_logs", "*", ENV ["DB_D_HOST"], ENV ["DB_D_USERNAME"], "", ENV ["DB_D_NAME"])
around the middle. ʻENV ["DB_D_HOST"] `refers to the DB name or user name of .env (dotenv).
I can't write pretty code, so please take a good look.
app/controllers/admins/access_logs_controller.rb
def export_csv(params_q)
@q = AccessLog.order(id: :asc).ransack(params_q)
user_type_param = params[:q][:user_type].empty? ? ["user", "admin", "supplier"] : params[:q][:user_type].split(':')[1]
methoda_type_param = params[:q][:method_type] == "ALL_TYPE" ? AccessLog.distinct.pluck(:method_type).compact : params[:q][:method_type]
access_dates = AccessLog.order(access_date: :asc).to_a
from_time = params[:q][:created_at_gteq].empty? ? access_dates.first.access_date : params[:q][:created_at_gteq].to_time
to_time = params[:q][:created_at_lt].empty? ? access_dates.last.access_date : params[:q][:created_at_lt].to_time
sql_query = AccessLog.where(user_type: user_type_param).where(method_type: methoda_type_param).where(created_at: [from_time..to_time]).to_sql
csv_export = LargeCsvExporterService.new("access_logs", "*", ENV["DB_D_HOST"], ENV["DB_D_USERNAME"], "", ENV["DB_D_NAME"])
csv_export.set_file_name("AccessLog_#{Time.zone.now.strftime("%Y%m%d%S")}.csv")
csv_export.set_client
# results = csv_export.set_query("SELECT * FROM access_logs WHERE created_at BETWEEN '#{from_time}' AND '#{to_time}' AND method_type = '#{methoda_type_param}' AND user_type = '#{user_type_param}'" )
results = csv_export.set_query(sql_query)
csv_export.write_csv(results)
filepath, stat, csv_file_name = csv_export.export_csv_file
send_result = send_file(filepath, :filename => csv_file_name, :length => stat.size)
# csv_export.delete_created_csv_file
end
The sql query outputs something like this in one line
SELECT `access_logs`.*
FROM `access_logs`
WHERE `access_logs`.`user_type` = 'user'
AND `access_logs`.`method_type` IN ( 'GET', 'POST' )
AND `access_logs`.`created_at` BETWEEN
'2020-08-25 00:00:00' AND '2020-09-27 23:55:35'
Well, like this, I haven't actually measured the time, but now I can output about 100,000 files in about 10 seconds.
Recommended Posts