[PYTHON] Get rid of garbled Japanese strings when going through FBX

This is a continuation of my previous post (▼).

- I want to get a 2-digit hexadecimal number from a decimal number http://qiita.com/it_ks/items/55d43baa996860edaabb

When an object in the scene has a Japanese name and you pass it through FBX, the name is converted into something like the following.

FBXASC229FBXASC137FBXASC141FBXASC233FBXASC171FBXASC1702

oh! Fxxk!!

Materials, textures, blend targets, and so on get the same treatment, so **quite a lot** of these names pile up.

I want to do something about it

Here is what I tried.

from binascii import a2b_hex as a2b

def fbxasc_to_chr( target_str ):
    return_list = []

    stock = ''
    for var in target_str.split( 'FBXASC' ):
        #Ignore if empty
        if not var:
            continue
        # If the first character is a letter, this chunk is not an FBXASC code:
        # add it to return_list as-is and go to the next chunk.
        if var[0].isalpha():
            return_list.append(var)
            continue

        # If there are more than 3 characters, convert only the first 3 to int
        # and keep the rest in ex_digit
        ex_digit = ''
        if len(var)>3:
            ex_digit = var[3:]
            code_int = int(var[:3])
        else:
            code_int = int(var)

        stock +=hex( code_int )
        try:
            return_list.append( a2b(stock.replace('0x','')).decode('utf8') )
        except UnicodeDecodeError:
            # Incomplete multi-byte sequence: keep stock and wait for the next code
            continue

        #Reset when decode and append are done
        stock = ''

        if ex_digit:
            return_list.append(ex_digit)

    return ''.join(return_list)

A casual walkthrough

If you look at the string that FBX produces, you can see that **"FBXASC"** appears at regular intervals.

More precisely, the string is a series of units of the form

FBXASC + 0-255

Each unit is one byte, and a character can be decoded once enough of them are lined up. For some characters one unit is enough, for others it takes three. It's quite complicated, so I'll leave the details to the UTF-8 article on Wikipedia (▲). https://ja.wikipedia.org/wiki/UTF-8#.E6.96.87.E5.AD.97.E7.A8.AE According to the table there, alphanumeric symbols take 1 byte and "East Asian characters / full-width" take 3. Some of the level 3 and level 4 kanji take 4.
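To make the byte counts concrete, here is a small sketch, assuming Python 3. The helper name `utf8_length` and the sample characters are my own picks for illustration. The last line decodes the first three codes of the garbled name shown at the top.

```python
def utf8_length(ch):
    """Return how many bytes one character occupies in UTF-8 (hypothetical helper)."""
    return len(ch.encode('utf-8'))

print(utf8_length('A'))           # 1 byte: ASCII letter
print(utf8_length('.'))           # 1 byte: ASCII symbol
print(utf8_length('\u524d'))      # 3 bytes: a common kanji
print(utf8_length('\U00020bb7'))  # 4 bytes: a kanji outside the Basic Multilingual Plane

# The first three codes of the example name, 229 / 137 / 141, form one kanji.
print(bytes([229, 137, 141]).decode('utf-8'))  # 前
```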

hex

The 0-255 values cannot be used as they are, so convert them to hexadecimal. This is what the previous post mentioned at the beginning was about. I decided to use the hex function.

hex always prepends 0x to its result, so these prefixes are all removed in one go before the ASCII -> binary conversion.

〜〜.replace('0x','')
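For reference, this is roughly what hex returns for the values involved. One thing to watch, as far as I can tell: for values below 16, hex gives only a single digit, so the joined string would have an odd length and a2b_hex would reject it. The `to_hex_pair` helper below is a hypothetical alternative, not what the code above does.

```python
print(hex(229))                    # 0xe5
print(hex(229).replace('0x', ''))  # e5
print(hex(46))                     # 0x2e  (the code for ".")
print(hex(10))                     # 0xa   (only one hex digit)

def to_hex_pair(value):
    """Format 0-255 as exactly two hex digits, with no 0x prefix (hypothetical helper)."""
    return format(value, '02x')

print(to_hex_pair(10))   # 0a
print(to_hex_pair(229))  # e5
```

Names rarely contain characters with codes that low, so the hex version usually gets away with it.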

ASCII -> binary conversion

So far, it's still a "string made up of 0-9a-f". Convert this to binary. Use the **a2b_hex function** of the **binascii module**.

- 19.8. binascii — Conversion between binary data and ASCII data https://docs.python.jp/3/library/binascii.html#binascii.a2b_hex

Here it is imported under the shorter name "a2b".

After the ASCII -> binary conversion the **decode method** can be used, but if the bytes do not correspond to a character, a decode error is raised. The flow is to catch this error, carry the bytes over to the next iteration, append the next code, and try the conversion again.
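The carry-over flow on its own, stripped of the string splitting, looks roughly like this. It is a sketch assuming Python 3, where a2b_hex accepts an ASCII-only str. The function name `decode_byte_values` is made up for illustration, and it uses `format(value, '02x')` instead of hex to keep every byte at two digits.

```python
from binascii import a2b_hex

def decode_byte_values(values):
    """Decode a list of 0-255 ints as UTF-8, one byte at a time (illustrative sketch)."""
    chars = []
    pending = ''  # hex digits of bytes that do not form a full character yet
    for value in values:
        pending += format(value, '02x')
        try:
            chars.append(a2b_hex(pending).decode('utf-8'))
        except UnicodeDecodeError:
            # Not a complete character yet: keep the digits and add the next byte
            continue
        pending = ''  # decoded successfully, start over
    return ''.join(chars)

print(decode_byte_values([229, 137, 141, 233, 171, 170]))  # 前髪
print(decode_byte_values([46]))                            # .
```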

Dealing with the oddballs that are not FBXASC

If the whole string were made of "FBXASC + 0-255" units, this would be simple ...

For example, "." is rewritten as "FBXASC046". Ordinary alphanumeric characters, on the other hand, are left in the string as they are. So if you simply split on FBXASC, you end up with chunks such as a "4-digit number" *(which no longer fits in the 0-255 range)* or "3 digits + letters" *(which cannot be converted to int before being passed to hex)*. I decided to put the extra characters in a variable called ex_digit and concatenate them afterwards.
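Here is what the plain split gives for the example name from the top of the post, and how the extra characters can be cut off. `split_code_and_rest` is a hypothetical helper that only mirrors the `var[:3]` / `var[3:]` slicing in the function above.

```python
name = 'FBXASC229FBXASC137FBXASC141FBXASC233FBXASC171FBXASC1702'
chunks = name.split('FBXASC')
print(chunks)  # ['', '229', '137', '141', '233', '171', '1702']

def split_code_and_rest(chunk):
    """Split a chunk into its 3-digit code and whatever follows it (hypothetical helper)."""
    return int(chunk[:3]), chunk[3:]

print(split_code_and_rest('1702'))    # (170, '2')
print(split_code_and_rest('046abc'))  # (46, 'abc')
```

The empty string at the front is why the loop skips empty chunks first.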

Also, if the object name starts or ends with alphanumeric characters, there is no telling how many chunks that were never FBXASC codes will show up at the beginning or end, so I sort them out with if statements and make do ('A`).

As a result, I don't feel like there will be more (laughs)

I've just noticed a nagging concern: if a number in the 0-255 range is included in the name, the result may come out garbled. But I'm tired, so I'll call it good enough :wink: :wink: :wink:
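One possible way around that concern is to swap the split for a regular expression, so that only digits directly following FBXASC are treated as byte values. This is a different technique from the function above, not a fix to it. It assumes Python 3, and it assumes FBX always writes exactly three digits after FBXASC, which "FBXASC046" suggests but I have not confirmed. The name `decode_fbxasc_runs` is made up.

```python
import re

# One or more consecutive "FBXASC" + 3-digit units
_FBXASC_RUN = re.compile(r'(?:FBXASC\d{3})+')

def decode_fbxasc_runs(name):
    """Replace each run of FBXASCnnn units with the UTF-8 text it encodes (sketch)."""
    def _decode(match):
        values = [int(code) for code in re.findall(r'FBXASC(\d{3})', match.group(0))]
        try:
            return bytes(values).decode('utf-8')
        except ValueError:
            # Value above 255 or invalid UTF-8 (UnicodeDecodeError is a ValueError):
            # leave the run untouched
            return match.group(0)
    return _FBXASC_RUN.sub(_decode, name)

print(decode_fbxasc_runs('FBXASC229FBXASC137FBXASC141FBXASC233FBXASC171FBXASC1702'))  # 前髪2
print(decode_fbxasc_runs('123FBXASC046abc'))  # 123.abc
```

Plain digits and letters outside a run are never touched, so a name that starts with a number passes through unchanged.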

Summary

All vehicles including Japanese in the scene are sent to Siberia for 25 rubles!

References

- Keep in mind when processing strings with Python3 http://qiita.com/FGtatsuro/items/f45c349e06d6df95839b
- Hexadecimal string to string conversion http://qiita.com/atsaki/items/6120cad2e3c448d774bf
- [Fix] I was addicted to alphanumeric judgment of Python strings http://qiita.com/fujiy/items/f738aa9d0bb7427e07a4
