I tried various combinations. The conclusion first: you basically don't have to think too hard, because garbled characters can be avoided. That said, I don't really understand how Python 3 behaves when you change the character set settings with add_charset.
Here is the base code.
sendmail.py
# -*- coding: utf-8 -*-
import smtplib
from email.mime.text import MIMEText
from email.header import Header
from email import charset
con = smtplib.SMTP('localhost')
con.set_debuglevel(True)
cset = 'utf-8'  # <--- character set setting
message = MIMEText(u"It's a Japanese email ★", 'plain', cset)
message['Subject'] = Header(u'Email sending test', cset)
message['From'] = '[email protected]'
message['To'] = '[email protected]'
con.sendmail('[email protected]', ['[email protected]'],
message.as_string())
con.close()
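To look at the raw message without an SMTP server, a minimal Python 3 sketch like the following may help. The helper name `build_message` is hypothetical, the addresses are left out, and the Japanese strings are the ones recovered by decoding the Base64 output shown in this post.

```python
from email.header import Header
from email.mime.text import MIMEText


def build_message(cset):
    """Build the test message and return its raw text (hypothetical helper)."""
    # Body and subject recovered by decoding the Base64 data in this post.
    message = MIMEText('日本語のメールだよ★', 'plain', cset)
    message['Subject'] = Header('メール送信テスト', cset)
    return message.as_string()


raw = build_message('utf-8')
print(raw)
```

Printing the result this way makes it easy to compare the headers and body encoding for each setting before anything is actually sent.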
Let's try it.
Python2.7.2 + None
This is a curveball to start with, but let's try the case where no character set is specified.
In the base code, I set cset = None.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
It raised an error. If no character set is specified, the text is processed as us-ascii, so it fails somewhere along the way.
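The same kind of failure can be reproduced in isolation. This is a Python 3 sketch that only illustrates the us-ascii encoding step, not the exact code path taken inside the Python 2 email package. The body text is the one recovered from the Base64 output in this post.

```python
text = '日本語のメールだよ★'

try:
    text.encode('us-ascii')
    span = None
except UnicodeEncodeError as exc:
    # exc.start and exc.end delimit the characters that could not be encoded.
    span = (exc.start, exc.end)
    print(exc)
```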
Python2.7.2 + utf-8 (with BASE64)
Set `cset = 'utf-8'` in the base code. The message was received without problems. The raw data looks like this.
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
5pel5pys6Kqe44Gu44Oh44O844Or44Gg44KI4piF
The body is Base64-encoded. This is because Python's default settings are as follows.
/email/charset.py
CHARSETS = {
...
'utf-8': (SHORTEST, BASE64, 'utf-8'),
# Each tuple is (header encoding, body encoding, output charset), as documented in charset.py
…
}
This is probably fine in almost all cases, although in the past it did not work on au handsets. I think it is all right now.
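The registered defaults can also be inspected at runtime instead of reading charset.py. A small Python 3 sketch, assuming the registry has not been modified with add_charset beforehand:

```python
from email import charset

utf8 = charset.Charset('utf-8')
sjis = charset.Charset('shift_jis')

# The encodings are the integer constants charset.QP, charset.BASE64 and
# charset.SHORTEST, or None when no encoding is applied.
for cs in (utf8, sjis):
    print(cs.input_charset, cs.header_encoding, cs.body_encoding, cs.output_charset)
```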
Python2.7.2 + utf-8 with QP
If you dislike Base64, override the CHARSETS entry. Insert this near the beginning of the base code.
Write somewhere in sendmail.py
charset.add_charset('utf-8', charset.SHORTEST, charset.QP, 'utf-8')
# For utf-8: the header encoding is SHORTEST, the body encoding is QP (quoted-printable), and the output charset is utf-8
cset = 'utf-8'
When you do this, it looks like this:
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
=E6=97=A5=E6=9C=AC=E8=AA=9E=E3=81=AE=E3=83=A1=E3=83=BC=E3=83=AB=E3=81=A0=E3=
=82=88=E2=98=85
The body is now quoted-printable instead of Base64. The message was received without problems.
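To see what quoted-printable does to the body on its own, here is a Python 3 sketch using the standard `quopri` module. The email package uses its own encoder internally, so this is only an illustration of the encoding, and line wrapping may differ from the output above.

```python
import quopri

# Body text recovered from the Base64 output in this post.
body = '日本語のメールだよ★'.encode('utf-8')

encoded = quopri.encodestring(body)
print(encoded.decode('ascii'))

# Decoding reverses the escapes and removes the soft line breaks.
decoded = quopri.decodestring(encoded)
```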
Python2.7.2 + utf-8 with 8bit
What happens if nothing is specified for the body encoding?
Write somewhere in sendmail.py
charset.add_charset('utf-8', charset.SHORTEST, None, 'utf-8')
cset = 'utf-8'
The output looks like this. The body comes out as-is.
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
It's a Japanese email ★
Content-Transfer-Encoding becomes either 7bit or 8bit. The function encode_7or8bit() in /email/encoders.py decides which one. If you want 8bit, this is the way to do it. This is probably fairly common these days.
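The decision made by encode_7or8bit() can be observed directly. This is a Python 3 sketch; `transfer_encoding_for` is a hypothetical helper, and it passes the payload as bytes so that the check sees the encoded form.

```python
from email import encoders
from email.message import Message


def transfer_encoding_for(payload):
    """Return the header value encode_7or8bit() picks (hypothetical helper)."""
    msg = Message()
    msg.set_payload(payload)
    encoders.encode_7or8bit(msg)
    return msg['Content-Transfer-Encoding']


ascii_cte = transfer_encoding_for(b'plain ASCII text')
utf8_cte = transfer_encoding_for('日本語のメールだよ★'.encode('utf-8'))
print(ascii_cte, utf8_cte)
```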
Python2.7.2 + shift_jis
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-2022-jp"
Content-Transfer-Encoding: 7bit
Subject: =?iso-2022-jp?b?GyRCJWEhPCVrQXc/LiVGJTklSBsoQg==?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
F|K\8l$N%a!<%k$@$h!z
When the character set is 'shift_jis', the output is iso-2022-jp, which everyone loves. This comes from Python's default settings:
'shift_jis': (BASE64, None, 'iso-2022-jp'),
The body encoding is None, so Content-Transfer-Encoding is set to 7bit automatically.
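The reason 7bit is sufficient here is that iso-2022-jp represents Japanese text using only 7-bit bytes plus escape sequences. A short Python 3 check, using the body text recovered from the Base64 output in this post:

```python
text = '日本語のメールだよ★'

encoded = text.encode('iso-2022-jp')
print(encoded)

# Every byte stays below 0x80, so no transfer encoding is needed.
is_seven_bit = all(byte < 0x80 for byte in encoded)
```

The printed bytes contain the same `F|K\8l$N...` sequence as the raw body above, wrapped in the escape sequences that switch character sets.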
Python3.3.0 + None
Next, let's try Python 3. First, the case where no character set is specified, which raised UnicodeEncodeError in Python 2.
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
5pel5pys6Kqe44Gu44Oh44O844Or44Gg44KI4piF
It was sent, somehow! The message was received without problems. From a quick read of the source, it seems to try us-ascii first and fall back to utf-8 if that raises UnicodeEncodeError. So with Python 3.3 you can send email without being aware of the character set at all.
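The fallback described above can be observed from the outside with a small Python 3 sketch that creates MIMEText without a charset argument:

```python
from email.mime.text import MIMEText

ascii_msg = MIMEText('plain ASCII text')
japanese_msg = MIMEText('日本語のメールだよ★')

# The charset chosen automatically for each message.
print(ascii_msg.get_content_charset())
print(japanese_msg.get_content_charset())
print(japanese_msg['Content-Transfer-Encoding'])
```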
Python3.3.0 + utf-8 (with BASE64)
So setting cset = 'utf-8' should give the same result as above.
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?44Oh44O844Or6YCB5L+h44OG44K544OI?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
5pel5pys6Kqe44Gu44Oh44O844Or44Gg44KI4piF
It's the same! Next!
Python3.3.0 + utf-8 with QP
I want to use QP for the body!
So, as with Python 2, add the following somewhere.
Write somewhere in sendmail.py
charset.add_charset('utf-8', charset.SHORTEST, charset.QP, 'utf-8')
cset = 'utf-8'
Send the email!
self.set_payload(_text, _charset)
File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/message.py", line 280, in set_payload
self.set_charset(charset)
File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/message.py", line 317, in set_charset
self._payload = charset.body_encode(self._payload)
File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/charset.py", line 395, in body_encode
return email.quoprimime.body_encode(string)
File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/quoprimime.py", line 240, in body_encode
if body_check(ord(c)):
File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/email/quoprimime.py", line 81, in body_check
return chr(octet) != _QUOPRI_BODY_MAP[octet]
KeyError: 26085
It raised an error! Scary! It says there is no such key in the mapping. As far as I can tell, _QUOPRI_BODY_MAP should only cover characters such as alphanumerics, yet the code is trying to look up key 26085. I wonder why... I couldn't figure it out at a glance, so I'm putting it on hold.
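One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: 26085 is the Unicode code point of the first character of the body, which suggests the QP encoder was looking up str characters instead of encoded bytes. A Python 3 check of the numbers:

```python
# First character of the body text recovered from the Base64 output.
first_char = '日本語のメールだよ★'[0]

code_point = ord(first_char)
utf8_bytes = list(first_char.encode('utf-8'))

print(code_point)   # the key from the KeyError
print(utf8_bytes)   # each UTF-8 byte fits in the range 0-255
```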
Python3.3.0 + utf-8 with 8bit
I want to send the body as-is in 8bit.
Write somewhere in sendmail.py
charset.add_charset('utf-8', charset.SHORTEST, None, 'utf-8')
Add this and send.
File "/Users/yasunori/.pythonbrew/pythons/Python-3.3.0/Frameworks/Python.framework/Versions/3.3/lib/python3.3/smtplib.py", line 744, in sendmail
msg = _fix_eols(msg).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 231-240: ordinal not in range(128)
Another error! Why does this one fail?
Actually, unlike the QP error, `message.as_string()` succeeds, and the message text is generated properly.
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: =?cp932?b?g4GBW4OLkZeQTYNlg1iDZw==?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
It's a Japanese email ★
The message itself is fine. The error occurs at send time. Looking at the failing line above, it fails because smtplib encodes the message with a hard-coded 'ascii' codec... What should I do about this? Please tell me...
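One thing that might be worth trying, as an untested assumption rather than a confirmed fix: smtplib's sendmail() also accepts the message as a byte string, in which case it does not apply the 'ascii' codec itself. The Python 3 sketch below only shows the encoding step. The raw message is written by hand for illustration, and the actual send is left as a comment because it needs a server that accepts 8-bit content.

```python
# Hand-written stand-in for the output of message.as_string().
raw = (
    'Content-Type: text/plain; charset="utf-8"\r\n'
    'MIME-Version: 1.0\r\n'
    'Content-Transfer-Encoding: 8bit\r\n'
    '\r\n'
    '日本語のメールだよ★\r\n'
)

# This is the step that fails inside smtplib when a str is passed.
try:
    raw.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding it ourselves produces bytes that can be passed to sendmail().
raw_bytes = raw.encode('utf-8')

# Not executed here: con is the smtplib.SMTP object from sendmail.py,
# and from_addr / to_addr are placeholder names.
# con.sendmail(from_addr, [to_addr], raw_bytes)
```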
Python3.3.0 + shift_jis
Stable sjis.
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-2022-jp"
Content-Transfer-Encoding: 7bit
Subject: =?iso-2022-jp?b?GyRCJWEhPCVrQXc/LiVGJTklSBsoQg==?=
From: [email protected]
To: [email protected]
Reply-To: [email protected]
F|K\8l$N%a!<%k$@$h!z
Everything works mostly as expected, but in the Python 3 series the behavior after calling add_charset is quite suspicious, and for now it is a trouble spot for me. Am I doing something wrong?