[PYTHON] About the behavior of yield_per of SqlAlchemy

SqlAlchemy yield_per method

Is it find_in_batches in Rails? It's a nice method that processes all the results sequentially without storing all the results in memory when selecting a large amount of data.

If you think that it is a method that handles limits and offsets nicely, it is a method that processes the selected result as a stream without such a thing.

Looking at the SQL log, I thought that there would be other people besides me who were worried that SQL would not come out as expected, so I'll make a note of it.

How to use memo just in case

yield_per.sample.py


sess = Session(engine)
for obj in sess.query(Customer).filter_by(ownd_uid = n).yield_per(10):
   hogehoge(obj)

If you write it like this, even if there are a lot of records selected under the filter condition, it will process while selecting 10 records at a time, so it will process all the select results sequentially without putting them in memory.

It's the best.

Oh? Are you using N connections?

If you try show processlist on MySQL, it seems that each connection called by yield_per has one connection. Is that so if you ask?

screen 2016-11-06 12.16.41.png

The first loop that takes 945 seconds is the top loop, but I wonder if it can be broken by connect_timeout .... (I feel like it's going to run out)

If you don't use such a large amount of data, you may not be able to see it, so if you get an error, keep a note of how to use it.

Result: No connection error.

It took 1-2 hours, but it ended without any problems.

Maybe if you keep a long connection, you won't be able to query the show process list? !!

By all means, please check the behavior of what kind of processing is done on the MySQL side!

Recommended Posts

About the behavior of yield_per of SqlAlchemy
About the behavior of enable_backprop of Chainer v2
About the behavior of copy, deepcopy and numpy.copy
About the behavior of Model.get_or_create () of peewee in Python
About the behavior of Queue during parallel processing
About the ease of Python
About the components of Luigi
About the features of Python
Tank game made with python About the behavior of tanks
About the return value of pthread_mutex_init ()
A memo about the behavior of bowtie2 during multiple hits
About the return value of the histogram.
About the upper limit of threads-max
About the size of matplotlib points
About the basics list of Python basics
Check the behavior of destructor in Python
About the virtual environment of python version 3.7
About the arguments of the setup function of PyCaret
About the test
Behavior of multiprocessing.pool.Pool.map
I wanted to be careful about the behavior of Python's default arguments
About the queue
About the accuracy of Archimedean circle calculation method
About the X-axis notation of Matplotlib bar graphs
See the behavior of drunkenness with reinforcement learning
About the processing speed of SVM (SVC) of scikit-learn
The behavior of signal () depends on the compile options
A note about the python version of python virtualenv
About the development contents of machine learning (Example)
[Note] About the role of underscore "_" in Python
Visualize the behavior of the sorting algorithm with matplotlib
About the * (asterisk) argument of python (and itertools.starmap)
The beginning of cif2cell
A memorandum about the warning of the pylint output result
About all of numpy
The meaning of self
About assignment of numpy.ndarray
[python] behavior of argmax
the zen of Python
Think about the next generation of Rack and WSGI
About testing in the implementation of machine learning models
About the inefficiency of data transfer in luigi on-memory
Basic usage of SQLAlchemy
The story of sys.path.append ()
Examine the close processing of Python dataset (SQLAlchemy wrapper)
About the Unfold function
About the service command
About the uncluttered arrangement in the import order of flake8
A story about changing the master name of BlueZ
About variable of chainer
About the confusion matrix
About the Visitor pattern
Personal notes about the integration of vscode and anaconda
A reminder about the implementation of recommendations in Python
Revenge of the Types: Revenge of types
A note on the default behavior of collate_fn in PyTorch
Reuse the behavior of the @property method by using a descriptor [16/100]
Think about the analysis environment (Part 1: Overview) * As of January 2017
About the camera change event of Google Maps Android API
About the garbled Japanese part of pandas-profiling in Jupyter notebook
I tried a little bit of the behavior of the zip function