[PYTHON] UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)

Conclusion

| Encoding | Representation of 'あ' | len('あ') |
|---|---|---|
| unicode | \u3042 | 1 |
| euc-jp | \xa4\xa2 | 2 |
| utf-8 | \xe3\x81\x82 | 3 |

| Conversion | Code | Return value |
|---|---|---|
| euc-jp → unicode | unicode('\xa4\xa2','euc-jp') | u'\u3042' |
| euc-jp → unicode | '\xa4\xa2'.decode('euc-jp') | u'\u3042' |
| utf-8 → unicode | unicode('\xe3\x81\x82','utf-8') | u'\u3042' |
| utf-8 → unicode | '\xe3\x81\x82'.decode('utf-8') | u'\u3042' |
| unicode → euc-jp | u'\u3042'.encode('euc-jp') | '\xa4\xa2' |
| unicode → utf-8 | u'\u3042'.encode('utf-8') | '\xe3\x81\x82' |
| utf-8 → unicode → euc-jp | unicode('\xe3\x81\x82','utf-8').encode('euc-jp') | '\xa4\xa2' |
| utf-8 → unicode → euc-jp | '\xe3\x81\x82'.decode('utf-8').encode('euc-jp') | '\xa4\xa2' |
| euc-jp → unicode → utf-8 | unicode('\xa4\xa2','euc-jp').encode('utf-8') | '\xe3\x81\x82' |
| euc-jp → unicode → utf-8 | '\xa4\xa2'.decode('euc-jp').encode('utf-8') | '\xe3\x81\x82' |
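The conversions in the table can be checked with a short script. This is only a sketch: the helper name `transcode` and the constant names are made up for illustration, and byte literals (`b'...'`) are used instead of the article's plain Python 2 strings so that the same code should run on both Python 2.6+ and Python 3.3+.

```python
# Bytes and text for U+3042 (HIRAGANA LETTER A), taken from the table above.
EUC_JP_BYTES = b'\xa4\xa2'      # encoded as EUC-JP (2 bytes)
UTF8_BYTES = b'\xe3\x81\x82'    # encoded as UTF-8 (3 bytes)
TEXT = u'\u3042'                # unicode string (1 character)


def transcode(data, source, target):
    """Hypothetical helper: decode bytes from `source`, re-encode as `target`."""
    return data.decode(source).encode(target)


# Decoding either byte sequence with the right codec gives the same text.
assert EUC_JP_BYTES.decode('euc-jp') == TEXT
assert UTF8_BYTES.decode('utf-8') == TEXT

# Encoding the text gives back the original bytes.
assert TEXT.encode('euc-jp') == EUC_JP_BYTES
assert TEXT.encode('utf-8') == UTF8_BYTES

# Converting between the two byte encodings always passes through unicode.
assert transcode(UTF8_BYTES, 'utf-8', 'euc-jp') == EUC_JP_BYTES
assert transcode(EUC_JP_BYTES, 'euc-jp', 'utf-8') == UTF8_BYTES

print('%d %d %d' % (len(TEXT), len(EUC_JP_BYTES), len(UTF8_BYTES)))  # 1 2 3
```

There is no direct euc-jp → utf-8 step: every conversion between byte encodings decodes to unicode first and then encodes again.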

Unicode

In the case of Unicode


>>> string=u'あ'
>>> string
u'\u3042'

EUC-JP -> Unicode

In the case of EUC-JP


>>> string='あ'
>>> string
'\xa4\xa2'
>>> len(string)
2

Wrong


>>> unicode(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)

Correct


>>> unicode(string,'euc-jp')
u'\u3042'

UTF-8 -> Unicode

In the case of UTF-8


>>> string='あ'
>>> string
'\xe3\x81\x82'
>>> len(string)
3

Wrong


>>> unicode(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

Correct


>>> unicode(string,'utf-8')
u'\u3042'
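The "Wrong" calls fail because, when no encoding is given, Python 2 falls back to the ASCII codec, and neither 0xa4 nor 0xe3 is a valid ASCII byte. The sketch below reproduces the same failure by decoding as ASCII explicitly and reads the details off the exception. The function name `describe_failure` is hypothetical; `encoding`, `start` and `reason` are standard attributes of `UnicodeDecodeError`.

```python
def describe_failure(data, encoding):
    """Return (codec, position, reason) if decoding fails, else None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.encoding, exc.start, exc.reason
    return None


euc_jp_bytes = b'\xa4\xa2'
utf8_bytes = b'\xe3\x81\x82'

# Decoding as ASCII fails on the very first byte in both cases.
print(describe_failure(euc_jp_bytes, 'ascii'))
print(describe_failure(utf8_bytes, 'ascii'))

# Naming the real encoding makes the failure go away.
print(describe_failure(euc_jp_bytes, 'euc-jp'))
print(describe_failure(utf8_bytes, 'utf-8'))
```

The position in the error message is the byte offset of the first byte the codec could not handle, which is why both tracebacks above report position 0.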

Character encoding conversion functions and methods

| Conversion type | Function or method |
|---|---|
| Non-unicode string → unicode string | unicode([non-unicode string], [encoding], [errors='strict']) |
| Non-unicode string → unicode string | [non-unicode string].decode([encoding], [errors='strict']) |
| unicode string → non-unicode string | [unicode string].encode([encoding], [errors='strict']) |
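As a rough illustration of the table, the function form and the method form give the same result when passed the same encoding and `errors` arguments. The name `text_type` is an assumption introduced here so the sketch runs on both Python 2 and Python 3: it resolves to the `unicode` built-in on Python 2 and to `str` on Python 3, which accepts the same `(bytes, encoding, errors)` arguments.

```python
try:
    text_type = unicode   # Python 2: the built-in used in this article
except NameError:
    text_type = str       # Python 3: str(bytes, encoding, errors) plays the same role

raw = b'\xa4\xa2'  # U+3042 encoded as EUC-JP

# Non-unicode string -> unicode string, function form and method form.
via_function = text_type(raw, 'euc-jp', 'strict')
via_method = raw.decode('euc-jp', 'strict')
assert via_function == via_method == u'\u3042'

# unicode string -> non-unicode string.
round_trip = via_method.encode('euc-jp', 'strict')
assert round_trip == raw
```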

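errors

| unicode() | encode() | decode() | errors | Behavior |
|---|---|---|---|---|
| yes | yes | yes | strict | Raise UnicodeDecodeError (UnicodeEncodeError when encoding) |
| yes | yes | yes | replace | Substitute U+FFFD 'REPLACEMENT CHARACTER' when decoding ('?' when encoding) |
| yes | yes | yes | ignore | Drop the offending characters from the result |
| no | yes | no | xmlcharrefreplace | Replace with XML character references (encoding only) |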
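The effect of each `errors` value is easiest to see by forcing a failure on purpose. The sketch below decodes EUC-JP bytes with the ASCII codec and encodes U+3042 to ASCII. Variable names are illustrative only, and byte literals are used so the code should behave the same on Python 2.6+ and Python 3.3+.

```python
bad_bytes = b'a\xa4\xa2b'   # ASCII 'a' and 'b' around two non-ASCII bytes
text = u'\u3042'            # cannot be represented in ASCII

# Decoding: 'replace' inserts U+FFFD for each undecodable byte,
# 'ignore' silently drops those bytes.
replaced = bad_bytes.decode('ascii', 'replace')
ignored = bad_bytes.decode('ascii', 'ignore')
assert replaced == u'a\ufffd\ufffdb'
assert ignored == u'ab'

# Encoding: 'replace' writes '?', 'ignore' drops the character,
# 'xmlcharrefreplace' writes a numeric XML character reference.
assert text.encode('ascii', 'replace') == b'?'
assert text.encode('ascii', 'ignore') == b''
assert text.encode('ascii', 'xmlcharrefreplace') == b'&#12354;'

# 'strict' (the default) raises instead of altering the data.
try:
    bad_bytes.decode('ascii', 'strict')
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
assert strict_raised
```

Both 'replace' and 'ignore' lose information, so they are best kept for display or logging rather than for data that has to be converted back.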
