Avoid UnicodeEncodeError when throwing queries with Japanese in Python's MySQLdb

Overview

There are times when you want to throw an insert statement from Python to MySQL. When I use the MySQLdb package to submit a query, I get a UnicodeEncodeError if the query contains Japanese. It seems that this can be avoided by adding the options "use_unicode = True" and "charset =" utf8 "" to the argument of MySQLdb.connect ().

Status

Suppose you have a MySQL database like this.

mysql> desc test;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| test_id   | int(11)     | NO   | PRI | NULL    |       |
| test_text | varchar(64) | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+

From Python, I wanted to add a string to the column called test_text. Try throwing an insert statement using the MySQLdb package normally.

mysqltest.py


import MySQLdb
 
try: 
    conn = MySQLdb.connect(
    host=host,
    db=dbname,
    port=port,
    user=user,
    passwd=password
    )

    cur = conn.cursor()

    query = "insert into test values('1','aaa')"

    cur.execute(query)
    
except:
    cur.close()
    conn.close()

cur.close()
conn.commit()
conn.close()

Alphabet strings can be done normally.

mysql> select * from test;
+---------+-----------+
| test_id | test_text |
+---------+-----------+
|       1 | aaa       |
+---------+-----------+

However, if it is a Japanese character string ...

mysqltest.py


(abridgement)
    query = "insert into test values('2','Ah ah')"

    cur.execute(query)
(abridgement)
Traceback (most recent call last):
  File "mysqltest.py", line 30, in <module>
    conn.commit()
_mysql_exceptions.OperationalError: (2006, '')

I get an error such as. This seems to be an error when trying to conn.commit () after catching an exception in the try statement. I'm not sure about this stack trace, so let's issue a stack trace when the except statement is executed.

mysqltest.py


import MySQLdb
import traceback

try: 
    conn = MySQLdb.connect(
    host=host,
    db=dbname,
    port=port,
    user=user,
    passwd=password
    )

    cur = conn.cursor()

    query = "insert into test values('2','Ah ah')"

    cur.execute(query)

except:
    cur.close()
    conn.close()
    print(traceback.format_exc()) 

cur.close()
conn.commit()
conn.close()

Then it will come out like this.

Traceback (most recent call last):
  File "mysqltest.py", line 25, in <module>
    cur.execute(query)
  File "C:\path\to\anaconda\lib\site-packages\MySQLdb\cursors.py", line 248, in execute
    query = query.encode(db.encoding, 'surrogateescape')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 29-31: ordinal not in range(256)

I'm getting angry that the query cannot be encoded with the character code latin-1. Apparently the MySQLdb package encodes using latin-1 by default. why?

Countermeasures

If you refer to here, it seems that you can specify the encoding method with the argument of MySQLdb.connect (). .. If you give "use_unicode = True" and "charset =" utf8 "" as arguments,

mysqltest.py


import MySQLdb
import traceback

try: 
    conn = MySQLdb.connect(
    host=host,
    db=dbname,
    port=port,
    user=user,
    passwd=password,
    use_unicode=True,
    charset="utf8"
    )

    cur = conn.cursor()

    query = "insert into test values('2','Ah ah')"

    cur.execute(query)

except:
    cur.close()
    conn.close()
    print(traceback.format_exc()) 

cur.close()
conn.commit()
conn.close()

The query was successfully encoded in utf-8 and it worked.

mysql> select * from test;
+---------+-----------+
| test_id | test_text |
+---------+-----------+
|       1 | aaa       |
|       2 |Ah ah|
+---------+-----------+

Recommended Posts

Avoid UnicodeEncodeError when throwing queries with Japanese in Python's MySQLdb
Handles UTF-8 Japanese characters in Python's MySQLdb.
Japanese output when dealing with python in visual studio
How to not escape Japanese when dealing with json in python
Be careful of LANG for UnicodeEncodeError when printing Japanese with Python 3
Behavior when returning in the with block
Put Japanese fonts in images with Colaboratory
Decorator to avoid UnicodeEncodeError in Python 3 print ()