[PYTHON] Pitfalls and workarounds for pandas.DataFrame.to_sql

Although a temporary table was created on the DB based on the data frame type data in batch The 64th and subsequent characters of the object type column data disappeared without permission. I didn't get any errors or warnings ... so I'll summarize the remedies.


Suppose you want to create a table in DB based on the following data (name is 70 characters)

sample.csv


name,age
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGG,20
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGG,18
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGG,23
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGG,21

When using pandas to_sql function and mysql.connector

Sample source

sample_to_sql.py


import pandas as pd
import mysql.connector
df = pd.read_csv('sample.csv')
table_name = "sample_to_sql"
db_settings = {
    "host": <hostname>,
    "database": <Database name>,
    "user": <User name>,
    "password": <password>,
    "port":3306
}
con = mysql.connector.connect(**db_settings)
df.to_sql(table_name, con, flavor='mysql', index=False)

result

スクリーンショット 2016-04-25 17.32.13.png

It's a little confusing, but the 64th and subsequent characters of name have disappeared.

Looking at the table information スクリーンショット 2016-04-25 17.33.39.png

The limit is 63 characters ... Columns of type object were hard-coded to create a table with varchar (63). (Around line 1310 of ~~~ / site-packages / pandas / io / sql.py)

solution

When I run the above script, I get this warning in the df.to_sql part.

FutureWarning: The 'mysql' flavor with DBAPI connection is deprecated and will be removed in future versions. MySQL will be further supported with SQLAlchemy connectables.

So, install sqlalchemy

sudo pip install sqlalchemy

Sample code

sample_sqlalchemy.py


import pandas as pd
from sqlalchemy import create_engine
df = pd.read_csv('sample.csv')
table_name = "sample_sqlalchemy"
db_settings = {
    "host": <hostname>,
    "database": <Database name>,
    "user": <User name>,
    "password": <password>,
    "port":3306
}
engine = create_engine('mysql://{user}:{password}@{host}:{port}/{database}'.format(**db_settings))
df.to_sql(table_name, engine, flavor='mysql', index=False)

At this point, a table that fits the contents of the data frame is created. Looking at the contents, it looks like this.

スクリーンショット 2016-04-25 17.07.00.png

What was varchar was text. The data is in place. スクリーンショット 2016-04-25 17.42.52.png

Recommended Posts

Pitfalls and workarounds for pandas.DataFrame.to_sql
(Windows) Causes and workarounds for UnicodeEncodeError on Python 3
For me: Infrastructure and network notes
(For myself) Flask_ex (templates and static)
An app for smart people and smart people
Don't print and import logging for logging
Tips for replacing and debugging functions
Vectorize sentences and search for similar sentences
3 types of workarounds for activate collision problem when pyenv and anaconda coexist