I tried to insert by referring to the article here, but I stumbled, so I will write it down. It's almost the same, but there were two problems with inserting a character string. Encoding error when inserting a Japanese character string and encoding error of a character string including a backslash.
from io import StringIO
from typing import List
import pandas as pd
import psycopg2
class Client:
def __init__(self, dsn: str) -> None:
"""
Arguments:
dsn(str): 'postgresql://{username}:{password}@{hostname}:{port}/{dbname}'
"""
self._cur = None
self._conn = psycopg2.connect(dsn)
self._conn.set_client_encoding('UTF8')
self._cur = self._conn.cursor()
def __del__(self) -> None:
if self._cur is not None:
self._cur.close()
self._conn.close()
def insert(self, table: str, values: pd.DataFrame) -> None:
buf = StringIO()
df.to_csv(buf, sep='\t', na_rep='\\N', index=False, header=False)
buf.seek(0)
columns = df.columns.values.tolist()
self._cur.copy_from(buf, table, columns=columns)
self._conn.commit()
It didn't happen this time, but it seems that pd.DataFrame
can also cause an error when dealing with NULLABLE
integer types.
See also: https://qiita.com/hoto17296/items/b6c90db4b9bcdb7b6d78
Resolved with set_client_encoding ('UTF8')
Reference: https://www.psycopg.org/docs/connection.html#connection.set_client_encoding
I implemented this from the outside, but I avoided it by doing the following.
Let's assume that the column a
is a string
import os
df = get_dataframe() #Get the data frame in any way
df.a = df.a.str.replace('\\', '\\\\')
client = Client(os.environ.get('POSTGRES_DSN')
client.insert(table, df)
In the above, we used copy_from
, but when using copy_expert
, we can write as follows.
cur.copy_expert(
f"""
COPY {table}(
{','.join(columns)}
)
FROM STDIN
WITH
DELIMITER AS ' '
NULL AS '\\N'
ENCODING 'utf8'
""",
buf,
)
We don't need set_client_encoding ('UTF8')
because we specify the encoding in the query.
Reference: https://www.psycopg.org/docs/cursor.html#cursor.copy_expert
Recommended Posts