| Encoding | Representation of 'あ' | len('あ') | 
|---|---|---|
| unicode | \u3042 | 1 | 
| euc-jp | \xa4\xa2 | 2 | 
| utf-8 | \xe3\x81\x82 | 3 | 
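The lengths in the table can be checked with a short sketch. It assumes byte strings are produced by encoding U+3042 rather than typed into a terminal, and the variable names (`text`, `euc_bytes`, `utf8_bytes`, `lengths`) are illustrative only. The `u''` prefix keeps it runnable on both Python 2 and Python 3.

```python
# The same character, U+3042 (HIRAGANA LETTER A), in three representations.
text = u'\u3042'                     # unicode string: one code point
euc_bytes = text.encode('euc-jp')    # byte string encoded as EUC-JP
utf8_bytes = text.encode('utf-8')    # byte string encoded as UTF-8

# len() counts code points for unicode strings and bytes for byte strings.
lengths = (len(text), len(euc_bytes), len(utf8_bytes))
print(lengths)  # (1, 2, 3)
```

The unicode string has length 1 because it counts characters, while the two byte strings count the bytes each encoding needs.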

| Conversion | Code | Return value | 
|---|---|---|
| euc-jp → unicode | unicode('\xa4\xa2','euc-jp') | u'\u3042' | 
| euc-jp → unicode | '\xa4\xa2'.decode('euc-jp') | u'\u3042' | 
| utf-8 → unicode | unicode('\xe3\x81\x82','utf-8') | u'\u3042' | 
| utf-8 → unicode | '\xe3\x81\x82'.decode('utf-8') | u'\u3042' | 
| unicode → euc-jp | u'\u3042'.encode('euc-jp') | '\xa4\xa2' | 
| unicode → utf-8 | u'\u3042'.encode('utf-8') | '\xe3\x81\x82' | 
| utf-8 → unicode → euc-jp | unicode('\xe3\x81\x82','utf-8').encode('euc-jp') | '\xa4\xa2' | 
| utf-8 → unicode → euc-jp | '\xe3\x81\x82'.decode('utf-8').encode('euc-jp') | '\xa4\xa2' | 
| euc-jp → unicode → utf-8 | unicode('\xa4\xa2','euc-jp').encode('utf-8') | '\xe3\x81\x82' | 
| euc-jp → unicode → utf-8 | '\xa4\xa2'.decode('euc-jp').encode('utf-8') | '\xe3\x81\x82' | 
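The two-step conversions in the table (decode to unicode, then encode to the target) could be wrapped in a small helper. This is only a sketch: `transcode` is a hypothetical name, not a standard library function, and the `b''` literals are used so the snippet runs on Python 2.6+ as well as Python 3.

```python
def transcode(data, source, target):
    """Re-encode a byte string from `source` to `target`, going through unicode."""
    return data.decode(source).encode(target)

utf8_bytes = b'\xe3\x81\x82'                          # U+3042 in UTF-8
euc_bytes = transcode(utf8_bytes, 'utf-8', 'euc-jp')  # utf-8 -> unicode -> euc-jp
round_trip = transcode(euc_bytes, 'euc-jp', 'utf-8')  # euc-jp -> unicode -> utf-8

print(repr(euc_bytes))
print(round_trip == utf8_bytes)
```

Going through unicode in the middle means the helper needs no knowledge of any particular pair of encodings.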
Unicode
With a unicode string:
>>> string=u'あ'
>>> string
u'\u3042'
EUC-JP -> Unicode
With an EUC-JP byte string:
>>> string='あ'
>>> string
'\xa4\xa2'
>>> len(string)
2
Incorrect (no encoding is given, so the default ascii codec is used):
>>> unicode(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)
Correct:
>>> unicode(string,'euc-jp')
u'\u3042'
UTF-8 -> Unicode
With a UTF-8 byte string:
>>> string='あ'
>>> string
'\xe3\x81\x82'
>>> len(string)
3
Incorrect (no encoding is given, so the default ascii codec is used):
>>> unicode(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
Correct:
>>> unicode(string,'utf-8')
u'\u3042'
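The failures above come from decoding non-ASCII bytes with the ascii codec. A minimal sketch of the same behavior, using an explicit `.decode('ascii')` to stand in for the bare `unicode(string)` call (an assumption that the default encoding is ascii, as the tracebacks indicate), is shown here. `try_decode` is a hypothetical helper introduced for illustration.

```python
def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) if decoding fails."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc

euc_bytes = b'\xa4\xa2'  # U+3042 encoded as EUC-JP

# Decoding with the wrong codec fails at the first non-ASCII byte.
text, error = try_decode(euc_bytes, 'ascii')
print(error)

# Naming the actual encoding succeeds.
text2, error2 = try_decode(euc_bytes, 'euc-jp')
print(repr(text2))
```

The exception object carries the codec name and the position of the offending byte (`error.encoding`, `error.start`), which matches the information in the traceback message.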
| Conversion type | Function or method | 
|---|---|
| Byte string (str) → unicode string | unicode(byte_string, encoding, errors='strict') | 
| Byte string (str) → unicode string | byte_string.decode(encoding, errors='strict') | 
| unicode string → byte string (str) | unicode_string.encode(encoding, errors='strict') | 
Values for the errors argument
| unicode() | encode() | decode() | errors | Behavior | 
|---|---|---|---|---|
| ○ | ○ | ○ | strict | Raise UnicodeDecodeError (UnicodeEncodeError when encoding) | 
| ○ | ○ | ○ | replace | Substitute U+FFFD 'REPLACEMENT CHARACTER' when decoding ('?' when encoding) | 
| ○ | ○ | ○ | ignore | Drop the offending bytes or characters from the result | 
| × | ○ | × | xmlcharrefreplace | Replace unencodable characters with XML character references | 
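The effect of each `errors` value can be seen in a short sketch. The input `bad_utf8` is a made-up example (valid UTF-8 for U+3042 followed by one invalid byte), and the handler name is passed positionally so the snippet should also run on older Python 2 releases.

```python
# Valid UTF-8 bytes for U+3042 followed by a byte that is never valid in UTF-8.
bad_utf8 = b'\xe3\x81\x82\xff'

# strict (the default) raises on the invalid byte.
try:
    bad_utf8.decode('utf-8', 'strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = bad_utf8.decode('utf-8', 'replace')  # invalid byte becomes U+FFFD
ignored = bad_utf8.decode('utf-8', 'ignore')    # invalid byte is dropped

# Encoding a character that the target codec cannot represent.
text = u'\u3042'
ascii_replaced = text.encode('ascii', 'replace')       # '?'
ascii_ignored = text.encode('ascii', 'ignore')         # empty byte string
ascii_xml = text.encode('ascii', 'xmlcharrefreplace')  # '&#12354;'
```

`xmlcharrefreplace` only makes sense when encoding, which is why the table marks it as unavailable for `unicode()` and `decode()`.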