[PYTHON] Insert from pd.DataFrame with psycopg2

I tried to insert by referring to the article here, but I stumbled, so I will write it down. It's almost the same, but there were two problems with inserting a character string. Encoding error when inserting a Japanese character string and encoding error of a character string including a backslash.

Implementation

from io import StringIO
from typing import List

import pandas as pd
import psycopg2

class Client:
    def __init__(self, dsn: str) -> None:
        """
        Arguments:
            dsn(str): 'postgresql://{username}:{password}@{hostname}:{port}/{dbname}'
        """
        self._cur = None
        self._conn = psycopg2.connect(dsn)
        self._conn.set_client_encoding('UTF8')
        self._cur = self._conn.cursor()

    def __del__(self) -> None:
        if self._cur is not None:
            self._cur.close()

        self._conn.close()

    def insert(self, table: str, values: pd.DataFrame) -> None:
        buf = StringIO()
        df.to_csv(buf, sep='\t', na_rep='\\N', index=False, header=False)
        buf.seek(0)
        columns = df.columns.values.tolist()
        self._cur.copy_from(buf, table, columns=columns)
        self._conn.commit()

It didn't happen this time, but it seems that pd.DataFrame can also cause an error when dealing with NULLABLE integer types. See also: https://qiita.com/hoto17296/items/b6c90db4b9bcdb7b6d78

Japanese encoding problem

Resolved with set_client_encoding ('UTF8') Reference: https://www.psycopg.org/docs/connection.html#connection.set_client_encoding

Backslash problem

I implemented this from the outside, but I avoided it by doing the following. Let's assume that the column a is a string

import os


df = get_dataframe()  #Get the data frame in any way
df.a = df.a.str.replace('\\', '\\\\')

client = Client(os.environ.get('POSTGRES_DSN')
client.insert(table, df)

Other implementations

In the above, we used copy_from, but when using copy_expert, we can write as follows.

cur.copy_expert(
    f"""
        COPY {table}(
            {','.join(columns)}
        )
        FROM STDIN
        WITH
            DELIMITER AS ' '
            NULL AS '\\N'
        ENCODING 'utf8'
    """,
    buf,
)

We don't need set_client_encoding ('UTF8') because we specify the encoding in the query.

Reference: https://www.psycopg.org/docs/cursor.html#cursor.copy_expert

Recommended Posts

Insert from pd.DataFrame with psycopg2
Bulk Insert Pandas DataFrame with psycopg2
Generate an insert statement from CSV with Python.
With skype, notify with skype from python!
Call C from Python with DragonFFI
Install Python from source with Ansible
Get one column from DataFrame with DataFrame
Transactions when operating PostgreSQL with Psycopg2
Free from hard-coding functions with SymPy
Run Aprili from Python with Orange
Operate Maya with OSC from vvvv
Call python from nim with Nimpy
Use PostgreSQL with Lambda (Python + psycopg2)
Read fbx from python with cinema4d