Deciphering a garbled attachment file name with Python

Attachment name is garbled

In a small number of environments, when the Windows 8 Store mail app received an attachment with a Japanese file name, the name ended up looking like `%1B%24%42~.ext` or `%EF%BF%%BA.ext`. It seems the original file name was being treated as if it were percent-encoded. I could not find a proper fix that makes the app receive the correct file name, but for the time being I only wanted to recover the name itself, so I tried decoding it with Python.

Decode with Python

In Python, percent-encoding can be decoded with `urllib.parse.unquote`. I first tried it with the default settings, in the interactive interpreter:

import urllib.parse
a = '%1B%24%42%4A%3F%40%2E%1B%28%42%32%37%1B%24%42%47%2F%1B%28%42%2E%70%64%66'
urllib.parse.unquote(a)  # default encoding='utf-8'

The result was `'\x1b$BJ?@.\x1b(B27\x1b$BG/\x1b(B.pdf'`, which is unreadable. `unquote` uses `encoding='utf-8'` by default, so the bytes were interpreted as UTF-8. No error was raised, but the output is not a sensible file name, so the name must be in some other character encoding.
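A quick check, assuming Python 3 and the same string, shows why UTF-8 raised no error: `urllib.parse.unquote_to_bytes` returns the raw bytes, and every one of them is below 0x80. Pure 7-bit ASCII is also valid UTF-8, so the decode "succeeds" and simply passes the escape bytes through as control characters.

```python
import urllib.parse

s = '%1B%24%42%4A%3F%40%2E%1B%28%42%32%37%1B%24%42%47%2F%1B%28%42%2E%70%64%66'

# unquote_to_bytes undoes the percent-encoding but does not decode the text.
raw = urllib.parse.unquote_to_bytes(s)

# Every byte is 7-bit ASCII, which UTF-8 accepts unchanged,
# so decoding as UTF-8 cannot fail here even though the text is wrong.
is_seven_bit = all(b < 0x80 for b in raw)
as_utf8 = raw.decode('utf-8')

print(is_seven_bit)   # True
print(repr(as_utf8))  # '\x1b$BJ?@.\x1b(B27\x1b$BG/\x1b(B.pdf'
```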

Identify the character encoding

The original string cannot be restored without knowing which character encoding it was written in, so the encoding has to be identified first. The string contains the patterns `%1B%24%42` and `%1B%28%42`, which decode to `ESC $ B` and `ESC ( B`. These are the escape sequences JIS encoding uses to switch between two-byte kanji mode (JIS X 0208) and ASCII, which suggests the string is JIS-encoded.
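The check for escape sequences can be automated. The sketch below is only an illustration: the helper name `find_jis_escapes` is hypothetical, and the table lists the four escape sequences that RFC 1468 allows in ISO-2022-JP.

```python
import urllib.parse

# The four escape sequences RFC 1468 allows in ISO-2022-JP, mapped to the
# character set each one switches to.
ISO2022JP_ESCAPES = {
    b'\x1b(B': 'ASCII',
    b'\x1b(J': 'JIS X 0201-Roman',
    b'\x1b$@': 'JIS X 0208-1978',
    b'\x1b$B': 'JIS X 0208-1983',
}


def find_jis_escapes(percent_encoded):
    """Hypothetical helper: return (byte offset, charset) for each escape found."""
    raw = urllib.parse.unquote_to_bytes(percent_encoded)
    hits = []
    # Every escape sequence in the table is exactly three bytes long.
    for i in range(len(raw) - 2):
        charset = ISO2022JP_ESCAPES.get(raw[i:i + 3])
        if charset is not None:
            hits.append((i, charset))
    return hits


s = '%1B%24%42%4A%3F%40%2E%1B%28%42%32%37%1B%24%42%47%2F%1B%28%42%2E%70%64%66'
for offset, charset in find_jis_escapes(s):
    print(offset, charset)
```

Finding escapes that alternate between JIS X 0208 and ASCII is a strong hint, though not proof, that the bytes are ISO-2022-JP.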

Decode with Python again

JIS encoding is also known as ISO-2022-JP. Python's codec for it is officially named `iso2022_jp`, and spellings such as `iso2022-jp`, `iso-2022-jp` and `iso2022jp` are accepted as aliases. Specify the encoding with `encoding='iso2022-jp'` and try decoding again:
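To settle the naming question, `codecs.lookup` can be asked for the canonical name behind each spelling. A small sketch, assuming a standard CPython 3 install:

```python
import codecs

SPELLINGS = ('iso2022-jp', 'iso-2022-jp', 'ISO-2022-JP', 'iso2022jp', 'iso2022_jp')

# codecs.lookup normalizes case and punctuation, resolves aliases,
# and reports the canonical codec name in CodecInfo.name.
canonical = {spelling: codecs.lookup(spelling).name for spelling in SPELLINGS}
for spelling, name in canonical.items():
    print(f'{spelling} -> {name}')
```

All of these spellings resolve to the same codec, so any of them works as the `encoding` argument.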


import urllib.parse
s = '%1B%24%42%4A%3F%40%2E%1B%28%42%32%37%1B%24%42%47%2F%1B%28%42%2E%70%64%66'
urllib.parse.unquote(s, encoding='iso2022-jp')

This returned `'平成27年.pdf'` (Heisei 27, i.e. 2015 in the Japanese era calendar), so I was able to recover the file name.
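If more names like this need fixing, the steps above can be wrapped in a helper. This is a sketch under stated assumptions, not a general solution: `decode_attachment_name` is a hypothetical name, it only considers ISO-2022-JP and UTF-8, and it returns the input unchanged when neither decodes cleanly.

```python
import urllib.parse


def decode_attachment_name(name: str) -> str:
    """Hypothetical helper: recover a file name that arrived percent-encoded.

    Only ISO-2022-JP and UTF-8 are considered (an assumption of this sketch).
    """
    if '%' not in name:
        return name  # nothing to decode
    raw = urllib.parse.unquote_to_bytes(name)
    candidates = ['utf-8']
    # ISO-2022-JP bytes are pure ASCII and would pass through UTF-8 unchanged,
    # so try it first whenever its escape sequences are present.
    if b'\x1b$' in raw or b'\x1b(' in raw:
        candidates.insert(0, 'iso2022_jp')
    for encoding in candidates:
        try:
            return raw.decode(encoding)  # strict: fail instead of guessing
        except UnicodeDecodeError:
            continue
    return name  # give up and keep the name as received


s = '%1B%24%42%4A%3F%40%2E%1B%28%42%32%37%1B%24%42%47%2F%1B%28%42%2E%70%64%66'
print(decode_attachment_name(s))  # 平成27年.pdf
```

Names in other encodings would need to be added as extra candidates; the escape-sequence check only decides whether ISO-2022-JP is worth trying.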

