[PYTHON] Save Pandas DataFrame as .csv.gz in Amazon S3

Note that I got a little stuck when trying to gzip a Pandas DataFrame and save it as a CSV file in an Amazon S3 bucket.

import gzip
from io import BytesIO
import pandas as pd
import boto3

def save_to_s3(df: pd.DataFrame, bucket: str, key: str):
    """Pandas DataFrame.csv.Save to Amazon S3 as gz"""
    buf = BytesIO()
    with gzip.open(buf, mode="wt") as f:
        df.to_csv(f)
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

The key points are as follows.

- Pass a BytesIO() as the first argument of gzip.open, which takes a file-like object to receive the gzip-format output.
- Since the output of pandas.DataFrame.to_csv is a string, specify "write text" (wt) as the mode of gzip.open.
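The two points above can be checked locally with a quick round trip, using hypothetical sample data: write through gzip.open in "wt" mode into a BytesIO, then read the buffer back and confirm the bytes really are gzip'd CSV.

```python
import gzip
from io import BytesIO

import pandas as pd

# Hypothetical sample data for the round trip.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Write the CSV text through gzip into an in-memory buffer,
# exactly as in save_to_s3 ("wt" because to_csv emits str).
buf = BytesIO()
with gzip.open(buf, mode="wt") as f:
    df.to_csv(f, index=False)

# Reading the buffer back confirms the contents are valid gzip'd CSV.
buf.seek(0)
restored = pd.read_csv(buf, compression="gzip")
print(restored.equals(df))  # → True
```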

At first I thought that specifying compression="gzip" in pandas.DataFrame.to_csv would make the explicit compression unnecessary, but when a file-like object is passed to to_csv, the compression option seems to be ignored, so that approach couldn't be used.
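One workaround that avoids the ignored option entirely: call to_csv with no target so it returns the CSV as a string, then compress that string yourself with gzip.compress. This is a sketch with hypothetical sample data, not the post's original method.

```python
import gzip

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_csv with no target returns the CSV as a str; gzip.compress
# turns it into the bytes you would pass as Body= to s3.put_object.
body = gzip.compress(df.to_csv(index=False).encode("utf-8"))

print(gzip.decompress(body).decode("utf-8").splitlines()[0])  # → a,b
```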
