[PYTHON] Compress variables such as DataFrames with joblib instead of pickle when reading and writing

Pickle is a convenient way to save a large DataFrame, but the resulting file can be too large to push to git. I therefore decided to compress the data with joblib so that it can be managed in git as well.

Saving and loading with pickle looks like this.

save_pickle.py



import pandas as pd
df = pd.DataFrame([1,2,3])
df.to_pickle('df.pickle')

read_pickle.py


import pandas as pd

df = pd.read_pickle('df.pickle')

Saving and loading with joblib looks like this. The compress argument sets the compression level (an integer from 0 to 9). A higher level gives a smaller file but takes longer to compress and read, so in my case 4 seemed to be a good balance.

save_joblib.py



import pandas as pd
import joblib

df = pd.DataFrame([1,2,3])
joblib.dump(df, 'df.joblib', compress=4)

read_joblib.py


import pandas as pd
import joblib

df = joblib.load('df.joblib')
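
To get a feel for the trade-off between file size and compression level, a rough comparison like the one below may help. It is only a sketch: the helper name compare_sizes, the sample DataFrame, and the levels 0, 4, and 9 are my own choices for illustration, and real data will compress differently from this deliberately repetitive sample.

```python
import os
import tempfile
import time

import joblib
import numpy as np
import pandas as pd


def compare_sizes(df, levels=(0, 4, 9)):
    """Return {label: (size_in_bytes, dump_seconds)} for pickle and joblib.

    Hypothetical helper for illustration; `levels` are joblib compress values.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Baseline: plain pandas pickle (no compression for a .pickle suffix).
        pickle_path = os.path.join(tmp, "df.pickle")
        start = time.perf_counter()
        df.to_pickle(pickle_path)
        elapsed = time.perf_counter() - start
        results["pickle"] = (os.path.getsize(pickle_path), elapsed)

        for level in levels:
            path = os.path.join(tmp, f"df_{level}.joblib")
            start = time.perf_counter()
            joblib.dump(df, path, compress=level)
            elapsed = time.perf_counter() - start
            # Make sure the data survives the round trip unchanged.
            assert joblib.load(path).equals(df)
            results[f"joblib compress={level}"] = (os.path.getsize(path), elapsed)
    return results


# Assumed sample data: repetitive integers, which compress very well.
df = pd.DataFrame({"a": np.arange(100_000) % 10, "b": np.arange(100_000) % 7})
results = compare_sizes(df)
for label, (size, seconds) in results.items():
    print(f"{label}: {size} bytes, dump took {seconds:.4f} s")
```

Running this on a sample of your own data before settling on a level is probably more reliable than reusing my value of 4, since both the size reduction and the extra time depend heavily on the contents of the DataFrame.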
