How to delete large amounts of data in Rails, and the concerns involved

Introduction

Hello everyone! I'm @hiroki_tanaka, the producer of Mayu Sakuma.

I maintain Rails applications, and the other day I discovered a lot of unwanted data in my production environment. That led me to look into how to delete a large amount of data in Rails, and this article summarizes what I found.

Difference between destroy and delete in Rails

First, Rails has two families of deletion methods, destroy and delete. I would like to briefly summarize the differences between them.

destroy / destroy!

Deletes a single record through ActiveRecord. Callbacks (such as before_destroy and after_destroy) run as part of the deletion. Also, if the model being deleted has associations declared with dependent: :destroy, the associated records are destroyed as well. When the deletion is halted (for example, by a before_destroy callback), destroy only returns false and does not raise an exception. In contrast, destroy! raises an exception (ActiveRecord::RecordNotDestroyed). Therefore, if you want to catch deletion failures explicitly, it is better to use destroy!.

delete

Issues an SQL DELETE statement directly to the DB for the target record, skipping ActiveRecord callbacks. Because the callbacks are skipped, associated records declared with dependent: :destroy on the model are not deleted either. Unlike destroy, delete does not report a halted deletion through a false return value, because nothing can halt it at the ActiveRecord level; problems at the database level (such as a foreign key violation) surface as exceptions. There is no delete! method, so when the requirement is "delete the data, run the callbacks, and raise an error if the deletion is stopped", I think it is better to simply use destroy!.

destroy_all

destroy / destroy! delete a single record, whereas destroy_all deletes every record matched by the relation it is called on. Like destroy, destroy_all goes through ActiveRecord, so callbacks and dependent: :destroy work. It calls destroy on each record, so if the deletion of a record is halted partway through, that record is skipped without an exception being raised. There is no destroy_all! method, so if you want to delete a large amount of data and have a failure raise an error properly, you need another approach. (The method is described later.)

delete_all

Deletes all the matched records with a single SQL DELETE statement, without instantiating them through ActiveRecord. Like delete, callbacks and dependent: :destroy do not work. It returns the number of rows deleted. Personally, I don't think there are many situations where it is used, but because it skips ActiveRecord's per-record processing, it is faster than destroy and destroy_all. Therefore, I think it can be used when you want to delete a large amount of data at once without worrying about callbacks and associated records.
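
To make the contrast between the return-value style of destroy and the exception style of destroy! concrete, here is a minimal sketch in plain Ruby. It does not use ActiveRecord: FakeRecord and RecordNotDestroyed are hypothetical stand-ins that I made up for illustration, and they only mimic the case where a deletion is halted.

```ruby
# Plain-Ruby stand-in, NOT ActiveRecord: FakeRecord and RecordNotDestroyed
# are hypothetical names that only mimic the return-false / raise contrast.
class RecordNotDestroyed < StandardError; end

class FakeRecord
  attr_reader :name

  def initialize(name, deletable:)
    @name = name
    @deletable = deletable
    @destroyed = false
  end

  def destroyed?
    @destroyed
  end

  # Mimics destroy: reports a halted deletion through the return value.
  def destroy
    return false unless @deletable

    @destroyed = true
    self
  end

  # Mimics destroy!: turns the same failure into an exception.
  def destroy!
    destroy || raise(RecordNotDestroyed, "Failed to destroy #{name}")
  end
end

locked = FakeRecord.new("pochi", deletable: false)

# The failure is only visible if the caller checks the return value.
silent_result = locked.destroy

# The failure interrupts the caller, so it cannot be overlooked.
raised = begin
  locked.destroy!
  false
rescue RecordNotDestroyed
  true
end
```

With the return-value style the caller has to remember to check the result of every call, which is easy to forget inside a loop over many records.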

How to delete large amounts of data

Now for the main subject: how to delete a large amount of data.

Premise

The models to be deleted this time have callbacks and associated models declared with dependent: :destroy. As a requirement, the associated records also need to be deleted, and I want to explicitly catch any error that occurs during processing. That rules out delete, delete_all, and destroy_all. In addition, the deletion has to run in the background while the production application is running, so I don't want to use a method that puts an extreme load on the DB.

Method (1): Call destroy! on the records one by one without a transaction

  animals = Animal.where(type: 'dog') # Select the records to delete
  animals.each do |animal|
    animal.destroy!
  end

The simplest approach is code like this. However, it has two problems.

- The load is heavy because destroy! runs once per record (100,000 times for 100,000 records).
- If an error occurs partway through and the process fails, the records deleted up to that point stay deleted without being rolled back. (The process cannot simply be rerun from a clean state.)

Therefore, I think this method is fine if the DB has spare capacity, the application has clear hours when it is not in use, and it causes no data integrity problem when a failed run leaves the earlier deletions in place.

Method (2): Open a transaction, then call destroy! on each record inside it

  animals = Animal.where(type: 'dog') # Select the records to delete
  ActiveRecord::Base.transaction do
    animals.each do |animal|
      animal.destroy!
    end
  end

This wraps the deletion process of method (1) in a single transaction. If an error occurs partway through, every deletion is rolled back and the process can be rerun from the start. Therefore, with this method you can safely delete data that requires integrity.

One caveat is that you must always use destroy! when you create a transaction explicitly. This is because destroy does not raise an exception and just returns false when the deletion fails, so processing continues and the transaction is committed instead of being rolled back.
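
The caveat above can be sketched in plain Ruby. This is only a toy model under my own assumptions, not ActiveRecord: FakeTable, its transaction method, and the :locked row are hypothetical, and the "rollback" is just restoring a copy of an array when the block raises.

```ruby
# Plain-Ruby stand-in, NOT ActiveRecord: FakeTable and its methods are
# hypothetical and only mimic "roll back when the block raises".
class FakeTable
  class NotDestroyed < StandardError; end

  attr_reader :rows

  def initialize(rows)
    @rows = rows.dup
  end

  # Restores the rows if the block raises, then re-raises the error.
  def transaction
    snapshot = @rows.dup
    yield
  rescue StandardError
    @rows = snapshot
    raise
  end

  # Mimics destroy: returns false for a row that cannot be deleted.
  def destroy(row)
    return false if row == :locked

    @rows.delete(row)
    true
  end

  # Mimics destroy!: raises instead of returning false.
  def destroy!(row)
    destroy(row) || raise(NotDestroyed, "could not delete #{row}")
  end
end

targets = [:a, :locked, :b]

# destroy! inside the transaction: the error aborts the loop and rolls back.
with_bang = FakeTable.new(targets)
begin
  with_bang.transaction { targets.each { |row| with_bang.destroy!(row) } }
rescue FakeTable::NotDestroyed
  # Every row is back in place at this point.
end

# destroy inside the transaction: false is ignored, so nothing rolls back.
without_bang = FakeTable.new(targets)
without_bang.transaction { targets.each { |row| without_bang.destroy(row) } }
```

In the first run all three rows are still there after the failure, while in the second run only the row that could not be deleted remains and the partial deletion is kept.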

Method (3): Open a transaction, fetch the data 1000 records at a time, and call destroy! on each record, with a 0.1 second sleep between batches

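  animals = Animal.where(type: 'dog') # Select the records to delete
  ActiveRecord::Base.transaction do
    # in_batches yields the records in batches of 1000 by default
    animals.in_batches.each do |delete_target_animals|
      delete_target_animals.each(&:destroy!)
      sleep(0.1) # Pause between batches to reduce the load on the DB
    end
  end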

This is the method I adopted this time. The ActiveRecord::Relation#in_batches method splits the 100,000 records into batches of 1000 (the default batch size), and destroy! is called on each record in a batch. After each batch of 1000 records is deleted, sleep(0.1) pauses processing for 0.1 seconds to reduce the load on the DB. Also, because one large transaction wraps the whole process, an error partway through rolls everything back and the deletion can be rerun, so it is safe.

The batch size can be changed with the of: option of in_batches, for example to 10,000 records per batch:

  animals = Animal.where(type: 'dog') # Select the records to delete
  ActiveRecord::Base.transaction do
    # of: sets the number of records per batch
    animals.in_batches(of: 10000).each do |delete_target_animals|
      delete_target_animals.each(&:destroy!)
      sleep(0.1) # Pause between batches to reduce the load on the DB
    end
  end
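
To get a feel for how the batch size affects the number of pauses, here is a rough sketch in plain Ruby. It is not ActiveRecord code: each_slice stands in for in_batches, destroy_in_batches is a hypothetical helper name, and the "records" are just integers.

```ruby
# Plain-Ruby stand-in, NOT ActiveRecord: each_slice plays the role of
# in_batches here, and destroy_in_batches is a hypothetical helper name.
def destroy_in_batches(records, batch_size: 1000, pause: 0.1)
  batches = 0
  records.each_slice(batch_size) do |batch|
    batch.each { |record| yield record } # the caller decides how to delete
    batches += 1
    sleep(pause)                         # pause after every batch
  end
  batches                                # number of batches processed
end

ids = (1..100_000).to_a
first_run = []

# pause: 0 keeps this demo fast; the setting in the article is 0.1 seconds.
batches_of_1000 = destroy_in_batches(ids, pause: 0) { |id| first_run << id }
batches_of_10000 = destroy_in_batches(ids, batch_size: 10_000, pause: 0) { |_id| }
```

For 100,000 records this gives 100 batches with a batch size of 1000 and 10 batches with a batch size of 10,000, so at 0.1 seconds per pause the total sleep time is about 10 seconds versus about 1 second. A larger batch finishes sooner but gives the DB fewer breaks, so I think the right size depends on how much headroom the DB has.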

Conclusion

I think there are various ways to delete large amounts of data, depending on your requirements. If you know of any best practices, I would love to hear about them.
