I tried various methods to send Japanese mail with Python

I tried a number of approaches. The conclusion first: in most cases you don't have to think too hard, because the defaults already prevent garbled characters. One caveat: I don't really understand how Python 3 behaves when you modify a character set with add_charset.

This is the main subject.

Basic code

`sendmail.py`


# -*- coding: utf-8 -*-
import smtplib
from email.mime.text import MIMEText
from email.header import Header
from email import charset

con = smtplib.SMTP('localhost')
con.set_debuglevel(True)

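cset = 'utf-8'  # <--- character set setting

# Double quotes, because the text itself contains an apostrophe.
# (The original post used the Japanese text 日本語のメールだよ★ here,
# which is what the encoded output below contains.)
message = MIMEText(u"It's a Japanese email ★", 'plain', cset)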
message['Subject'] = Header(u'Email sending test', cset)
message['From'] = '[email protected]'
message['To'] = '[email protected]'

con.sendmail('[email protected]', ['[email protected]'], 
message.as_string()) 

con.close()

Let's try it.

Python2.7.2 + None

Let's start with a curveball: the case where no character set is specified. In the basic code, I set cset = None.

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)

It throws an error. If you don't specify a character set, the text is processed as us-ascii, so it falls over somewhere along the way.

Python2.7.2 + utf-8 (with BASE64)

Set `cset = 'utf-8'` in the basic code. The mail was received without problems. The raw data looks like this.

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

5pel5pys6Kqe44Gu44Oh44O844Or44Gg44KI4piF

The body is encoded as Base64. That is because Python's defaults are defined as follows.

`/email/charset.py`


CHARSETS = {
...
'utf-8':       (SHORTEST,  BASE64, 'utf-8'),
# Each tuple is (header encoding, body encoding, output charset), as documented in charset.py
…
}

This is probably fine in almost every case, although in the past Base64 bodies did not work on au handsets. I think it is fine now. That's all for this case.
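If you don't want to open charset.py, you can also ask the `Charset` class which defaults it picked up. This is a minimal sketch, assuming a recent Python 3; `describe_charset` and `ENCODING_NAMES` are names made up for this example, not part of the library.

```python
from email.charset import Charset, BASE64, QP, SHORTEST

# Human-readable names for the encoding constants in email.charset.
ENCODING_NAMES = {QP: 'QP', BASE64: 'BASE64', SHORTEST: 'SHORTEST', None: 'None'}


def describe_charset(name):
    """Return (header encoding, body encoding, output charset) for a charset name."""
    cs = Charset(name)
    return (ENCODING_NAMES[cs.header_encoding],
            ENCODING_NAMES[cs.body_encoding],
            cs.output_charset)


print(describe_charset('utf-8'))
print(describe_charset('shift_jis'))
```

The three values correspond to the three elements of the tuples in CHARSETS, so this is a quick way to check what a given `cset` value will do before sending anything.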

Python2.7.2 + utf-8 with QP

Hate Base64? Then override the CHARSETS entry. Insert this near the beginning of the basic code.

`Write somewhere in sendmail.py`


charset.add_charset('utf-8', charset.SHORTEST, charset.QP, 'utf-8')
# For utf-8: the header uses SHORTEST, the body uses QP (quoted-printable), and the output charset is utf-8
cset = 'utf-8'

When you do this, it looks like this:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

=E6=97=A5=E6=9C=AC=E8=AA=9E=E3=81=AE=E3=83=A1=E3=83=BC=E3=83=AB=E3=81=A0=E3=
=82=88=E2=98=85

The body is now quoted-printable instead of Base64. It was received without problems.
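To confirm that the encoded output still carries the intended text, you can decode the Subject header and the quoted-printable body with the standard library. This is only a sketch for checking the dump above; `decode_subject` and `decode_qp_body` are helper names invented here, and the two values are copied from the raw message.

```python
import quopri
from email.header import decode_header

# Subject header and quoted-printable body copied from the raw message above.
subject = '=?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?='
qp_body = (b'=E6=97=A5=E6=9C=AC=E8=AA=9E=E3=81=AE=E3=83=A1=E3=83=BC'
           b'=E3=83=AB=E3=81=A0=E3=\n'
           b'=82=88=E2=98=85')


def decode_subject(value):
    """Decode an RFC 2047 encoded header value into a str."""
    parts = []
    for raw, cs in decode_header(value):
        # decode_header returns bytes for encoded words, str for plain text.
        parts.append(raw.decode(cs or 'ascii') if isinstance(raw, bytes) else raw)
    return ''.join(parts)


def decode_qp_body(data, cs='utf-8'):
    """Undo quoted-printable (including soft line breaks) and decode the bytes."""
    return quopri.decodestring(data).decode(cs)


print(decode_subject(subject))
print(decode_qp_body(qp_body))
```

Both values decode to Japanese text, because the sample mail in this post was originally written in Japanese. The trailing `=` at the end of the first body line is a soft line break, which is why the three bytes of one character are split across two lines.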

Python2.7.2 + utf-8 with 8bit

What if I don't specify anything for the body encoding?

`Write somewhere in sendmail.py`


charset.add_charset('utf-8', charset.SHORTEST, None, 'utf-8')
cset = 'utf-8'

The output looks like this. The body comes out exactly as written.

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

It's a Japanese email ★

Content-Transfer-Encoding ends up as either 7bit or 8bit. The function encode_7or8bit() in /email/encoders.py picks whichever is appropriate. If you want 8bit, this is how to get it. It may be fairly common these days.
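The idea behind that decision can be sketched in a few lines. This is not the library's code, just an illustration of the rule "pure ASCII means 7bit, anything else means 8bit"; `guess_transfer_encoding` is a hypothetical name.

```python
def guess_transfer_encoding(data):
    """Return '7bit' if every byte is ASCII, otherwise '8bit'."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return '8bit'
    return '7bit'


ascii_body = 'plain ASCII body'.encode('utf-8')
japanese_body = '日本語のメールだよ★'.encode('utf-8')

print(guess_transfer_encoding(ascii_body))
print(guess_transfer_encoding(japanese_body))
```

UTF-8 encodes Japanese characters with bytes above 0x7F, which is why the utf-8 body in this section is labelled 8bit.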

Python2.7.2 + shift_jis

MIME-Version: 1.0
Content-Type: text/plain; charset="iso-2022-jp"
Content-Transfer-Encoding: 7bit
Subject: =?iso-2022-jp?b?GyRCJWEhPCVrQXc/LiVGJTklSBsoQg==?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

F|K\8l$N%a!<%k$@$h!z

When the character set is 'shift_jis', the output is iso-2022-jp, which everyone loves. This comes from Python's default setting:

'shift_jis':   (BASE64,    None,    'iso-2022-jp'),

The body encoding is None, and Content-Transfer-Encoding is set to 7bit automatically.
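Why 7bit? ISO-2022-JP switches character sets with escape sequences and never uses bytes above 0x7F. A small check, assuming Python 3 and its built-in `iso-2022-jp` codec (the sample text is the Japanese body from the original post):

```python
text = '日本語のメールだよ★'
encoded = text.encode('iso-2022-jp')

# Every byte fits in 7 bits, so no transfer encoding is needed.
is_seven_bit = all(b < 0x80 for b in encoded)

# ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
starts_with_jis_escape = encoded.startswith(b'\x1b$B')
ends_with_ascii_escape = encoded.endswith(b'\x1b(B')

print(is_seven_bit, starts_with_jis_escape, ends_with_ascii_escape)
```

The same escape sequences are visible in the Subject header above: the Base64 payload begins with `GyRC` (ESC $ B) and ends with `GyhC` (ESC ( B). In the body dump they are invisible control characters, which is why only the printable part is shown.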

Python3.3.0 + None

Next, let's try Python 3. First, the case where no character set is specified, the one that raised UnicodeEncodeError in Python 2.

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

5pel5pys6Kqe44Gu44Oh44O844Or44Gg44KI4piF

What, it actually sends? It was received without problems. Skimming the source, it seems to try us-ascii first and, if that raises UnicodeEncodeError, fall back to utf-8. So with Python 3.3 you can send mail without being aware of the character set at all.
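That fallback can be written down as a short sketch and compared with what `MIMEText` reports when no charset is passed. `pick_charset` is a made-up helper that mirrors the behavior described above, not the library's own function; the comparison assumes Python 3.

```python
from email.mime.text import MIMEText


def pick_charset(text):
    """Try us-ascii first; fall back to utf-8 if the text does not fit."""
    try:
        text.encode('us-ascii')
        return 'us-ascii'
    except UnicodeEncodeError:
        return 'utf-8'


for body in ('plain ASCII body', '日本語のメールだよ★'):
    msg = MIMEText(body)  # no charset given
    print(pick_charset(body), msg.get_content_charset())
```

If the two columns agree for your text, leaving the charset unspecified is enough.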

Python3.3.0 + utf-8 (with BASE64)

So even with `cset = 'utf-8'`, the result should be the same as above.

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

5pel5pys6Kqe44Gu44Oh44O844Or44Gg44KI4piF

The same! Next!

Python3.3.0 + utf-8 with QP

I want to use QP for the body!

So, as with Python 2, add the following somewhere.

`Write somewhere in sendmail.py`


charset.add_charset('utf-8', charset.SHORTEST, charset.QP, 'utf-8')
cset = 'utf-8'

Send the mail!

   self.set_payload(_text, _charset)
  File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/message.py", line 280, in set_payload
    self.set_charset(charset)
  File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/message.py", line 317, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/charset.py", line 395, in body_encode
    return email.quoprimime.body_encode(string)
  File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/quoprimime.py", line 240, in body_encode
    if body_check(ord(c)):
  File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/quoprimime.py", line 81, in body_check
    return chr(octet) != _QUOPRI_BODY_MAP[octet]
KeyError: 26085

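It threw an error! Scary! It says the key does not exist in the mapping. As far as I can tell, _QUOPRI_BODY_MAP should only have entries for single-byte values, yet it is being looked up with 26085. That number is U+65E5, the code point of 日, the first character of the (originally Japanese) body, so `ord(c)` appears to be applied to a character of the str rather than to an encoded byte. I don't know why at first glance, so I'm putting this on hold.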
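One possible way around this, which does not touch the global registry at all, is to pass a `Charset` instance with its `body_encoding` changed. This is a sketch under assumptions: it relies on `MIMEText` accepting a `Charset` instance, which I believe holds on recent Python 3 releases (3.5 and later), and I have not checked it on 3.3.0. `build_qp_message` is a name invented for this example.

```python
from email.charset import Charset, QP
from email.mime.text import MIMEText


def build_qp_message(text):
    """Build a text/plain message with a quoted-printable UTF-8 body."""
    cs = Charset('utf-8')
    cs.body_encoding = QP  # override only this instance, not CHARSETS
    return MIMEText(text, 'plain', cs)


text = '日本語のメールだよ★'
msg = build_qp_message(text)

print(msg['Content-Transfer-Encoding'])
print(msg.get_payload())
```

Because only one `Charset` object is modified, other code in the same process that sends utf-8 mail keeps the default Base64 behavior.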

Python3.3.0 + utf-8 with 8bit

I want to send the body as-is in 8bit.

`Write somewhere in sendmail.py`


charset.add_charset('utf-8', charset.SHORTEST, None, 'utf-8')

Add this and send.

File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/smtplib.py", line 744, in sendmail
    msg = _fix_eols(msg).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 231-240: ordinal not in range(128)

Another error! Scary! Why does this one fail? Actually, unlike the QP error, `message.as_string()` succeeds, and the message text is generated properly.

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: =?cp932?b?g4GBW4OLkZeQTYNlg1iDZw==?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

It's a Japanese email ★

The message itself is fine, and yet sending it fails. Looking at the error location, smtplib hard-codes an encode to ascii, so a str message containing non-ASCII characters cannot get through... What should I do about this? Please tell me...
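One thing that might help: the failing line only runs when the message is a str, and as far as I know `sendmail()` in Python 3 also accepts bytes, which it sends without the ASCII step. The sketch below only builds the bytes and does not contact a server. It assumes Python 3.4 or later for `as_bytes()`, assumes the receiving server accepts 8-bit data, and uses a hypothetical helper name `build_8bit_message`; I have not verified it on 3.3.0.

```python
from email.charset import Charset
from email.mime.text import MIMEText


def build_8bit_message(text):
    """Build a text/plain UTF-8 message whose body is left unencoded (8bit)."""
    cs = Charset('utf-8')
    cs.body_encoding = None  # no Base64, no QP
    return MIMEText(text, 'plain', cs)


text = '日本語のメールだよ★'
msg = build_8bit_message(text)
raw = msg.as_bytes()  # bytes, so smtplib would not need to encode to ascii

print(msg['Content-Transfer-Encoding'])
print(type(raw).__name__)

# With an open connection `con`, the bytes could then be passed along, e.g.:
# con.sendmail(from_addr, [to_addr], raw)
```

`from_addr` and `to_addr` in the last comment are placeholders. `SMTP.send_message(msg)` is another option worth trying, since it serializes the message to bytes itself.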

Python3.3.0 + shift_jis

Good old reliable sjis.

MIME-Version: 1.0
Content-Type: text/plain; charset="iso-2022-jp"
Content-Transfer-Encoding: 7bit
Subject: =?iso-2022-jp?b?GyRCJWEhPCVrQXc/LiVGJTklSBsoQg==?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]

F|K\8l$N%a!<%k$@$h!z

The end

Things work mostly as expected, but in the Python 3 series the behavior after calling add_charset is quite suspicious, and for now it is an area best avoided. Am I doing something wrong?