Tips for dealing with binaries in Python

Here are some tips for working with binaries in Python.

There are two ways to work with binaries in Python: the struct module and the ctypes.Structure class. Basically, use the struct module when you want to handle a few bytes of binary data, and the ctypes.Structure class when you want to work with more bytes or with C/C++ structures.

struct module

As an example, let's read the binary of a PNG file. In a PNG file, the first 8 bytes are a fixed signature. The 18 bytes that follow (the 9th to 26th bytes) are the start of the IHDR chunk: the chunk length, the chunk type, and then, as part of the IHDR data, the width and height, bit depth, and color type of the image.

import struct

# Read the whole file as bytes; the with block closes the file afterwards
with open("sample.png", "rb") as f:
    png_data = f.read()

struct.unpack_from(">I4sIIBB", png_data, 8)
# (13, b'IHDR', 250, 156, 8, 2)

You can read the data with struct.unpack, but it raises struct.error unless the buffer is exactly as long as the format requires. If you want to read only part of the data, struct.unpack_from is useful: it takes an offset and only needs the buffer to contain enough bytes from that offset onward.
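The difference is easy to see on a small in-memory buffer. The following is a minimal sketch, not a real image: png_head is a hypothetical stand-in for the start of a PNG file, built from the same values as the output above.

```python
import struct

# Hypothetical stand-in for the start of a PNG file:
# the 8-byte signature, the first 18 bytes of the IHDR chunk,
# and a few extra bytes standing in for the rest of the file.
signature = b"\x89PNG\r\n\x1a\n"
ihdr_head = struct.pack(">I4sIIBB", 13, b"IHDR", 250, 156, 8, 2)
png_head = signature + ihdr_head + b"\x00" * 4

# struct.unpack needs a buffer whose length equals the format size exactly,
# so passing the whole buffer fails.
try:
    struct.unpack(">I4sIIBB", png_head)
    unpack_failed = False
except struct.error:
    unpack_failed = True

# struct.unpack_from only needs enough bytes after the given offset.
fields = struct.unpack_from(">I4sIIBB", png_head, 8)
print(unpack_failed, fields)
```

To use struct.unpack on the same data, you would have to slice out exactly the 18 bytes first, for example png_head[8:26].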

Padding is x

When reading binary data, padding (junk bytes inserted for alignment) inevitably shows up. The x format is convenient because it skips those bytes.

data = b'd\x00\xb0\x04'

# Not good: the padding byte ends up in a throwaway variable
kind, _, value = struct.unpack("BBH", data)

# Good: x skips the padding byte
kind, value = struct.unpack("BxH", data)
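The x format also works when packing, where it writes a zero byte in the padding position. Here is a short sketch of the round trip; the < prefix is an addition here, used only to make the result the same on any machine.

```python
import struct

data = b"d\x00\xb0\x04"

# "<" fixes the byte order to little-endian (an assumption made for this sketch).
kind, value = struct.unpack("<BxH", data)  # the x byte is skipped, not returned

# When packing, x emits a zero byte where the padding goes.
repacked = struct.pack("<BxH", kind, value)
print(kind, value, repacked == data)
```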

struct.Struct class

The struct.Struct class wraps a struct module format string in a class. Since the format is parsed when the class is instantiated, it is faster to create the instance in advance when you repeatedly call pack / unpack in a loop. It is easy to confuse with the ctypes.Structure class.

point = struct.Struct("HH")

for x, y in zip(range(10), range(10)):
    point.pack(x, y)
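A Struct instance also exposes the record size and can walk a buffer record by record. Below is a small sketch along the lines of the loop above; the < prefix is an assumption added here so that sizes and byte order are fixed.

```python
import struct

point = struct.Struct("<HH")

# Pack ten points into one buffer, reusing the precompiled format.
buf = b"".join(point.pack(x, y) for x, y in zip(range(10), range(10)))

# point.size is the number of bytes one record occupies.
record_count = len(buf) // point.size

# iter_unpack yields one tuple per record in the buffer.
points = list(point.iter_unpack(buf))
print(record_count, points[:3])
```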

Format character list

Letter  C language type                 Standard size
x       padding byte                    1
c       char                            1
b       signed char                     1
B       unsigned char, BYTE             1
?       _Bool                           1
h       short                           2
H       unsigned short, WORD            2
i       int                             4
I       unsigned int, DWORD             4
l       long, LONG                      4
L       unsigned long, ULONG            4
q       long long, LONGLONG             8
Q       unsigned long long, ULONGLONG   8
n       ssize_t (Python 3.3 or later)   Native only
N       size_t (Python 3.3 or later)    Native only
f       float                           4
d       double                          8
s       char[]                          -
p       char[]                          -
P       void *                          Native only
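The standard sizes in the table can be checked with struct.calcsize. This is only a quick sketch; it uses the = prefix because standard sizes apply only when the byte order is given explicitly.

```python
import struct

# Standard sizes apply when the format starts with =, <, > or !.
standard_sizes = {code: struct.calcsize("=" + code) for code in "xcbB?hHiIlLqQfd"}
print(standard_sizes)

# For s, the count in front gives the length of the byte string.
four_bytes = struct.calcsize("=4s")
print(four_bytes)
```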

Format character example:

BITMAPINFOHEADER structure


typedef struct tagBITMAPINFOHEADER {
    DWORD  biSize;
    LONG   biWidth;
    LONG   biHeight;
    WORD   biPlanes;
    WORD   biBitCount;
    DWORD  biCompression;
    DWORD  biSizeImage;
    LONG   biXPelsPerMeter;
    LONG   biYPelsPerMeter;
    DWORD  biClrUsed;
    DWORD  biClrImportant;
} BITMAPINFOHEADER;

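Format characters for the BITMAPINFOHEADER structure


"<IllHHIIllII"

The leading < selects little-endian byte order and standard sizes, so l is 4 bytes like a Windows LONG on every platform. Without it, l is 8 bytes on 64-bit Linux and the format no longer matches the 40-byte structure.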

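As a rough sketch of how such a format might be used, the following packs and unpacks a header with made-up values. The field values and the snake_case variable names are hypothetical, and the format assumes little-endian, standard-size layout.

```python
import struct

# Little-endian, standard sizes: 11 fields, 40 bytes in total.
BITMAPINFOHEADER = struct.Struct("<IllHHIIllII")

# Hypothetical header for a 250x156, 24-bit, uncompressed image.
raw = BITMAPINFOHEADER.pack(40, 250, 156, 1, 24, 0, 0, 0, 0, 0, 0)

(bi_size, bi_width, bi_height, bi_planes, bi_bit_count, bi_compression,
 bi_size_image, bi_x_ppm, bi_y_ppm, bi_clr_used, bi_clr_important) = BITMAPINFOHEADER.unpack(raw)

print(BITMAPINFOHEADER.size, bi_width, bi_height, bi_bit_count)
```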
List of byte order and alignment

Letter  Byte order     Size           Alignment
@       Native         Native         Native
=       Native         Standard size  None
<       Little endian  Standard size  None
>       Big endian     Standard size  None
!       Big endian     Standard size  None

Difference between @ and = (CPU: amd64, OS: 64-bit Ubuntu)


struct.calcsize("BI")
# 8

struct.calcsize("=BI")
# 5

Note that if you explicitly specify the byte order (=, <, > or !), no alignment padding is inserted.
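The effect of the prefix can be seen on a single value. This is a minimal sketch; the value 0x12345678 is arbitrary, and the second half shows one way to write the padding by hand with x when a layout needs it.

```python
import struct

value = 0x12345678

# The same integer, laid out in the two byte orders.
little = struct.pack("<I", value)
big = struct.pack(">I", value)
print(little.hex(), big.hex())

# With an explicit byte order nothing is aligned automatically,
# so any padding the layout needs has to be spelled out with x.
packed_size = struct.calcsize("<BI")
padded_size = struct.calcsize("<B3xI")
print(packed_size, padded_size)
```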

ctypes.Structure class

You can work with C/C++ structures using the ctypes.Structure class. If you try to read a lot of data with the struct module, the format string starts to look like a magic spell, so if you want to write robust code that reads a large amount of binary data, use the ctypes.Structure class.

Structure basics

Inherit from ctypes.Structure and define the field names and types in _fields_.

from ctypes import *

"""
typedef struct {
    char identity[4];
    uint16_t x;
    uint16_t y;
} TestStructure;
"""
class TestStructure(Structure):
    _fields_ = (
        ('identity', c_char * 4),
        ('x', c_uint16),
        ('y', c_uint16),
    )

An instance is created as follows.

t = TestStructure(b"TEST", 100, 100)
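Once created, the instance behaves like an ordinary object whose attributes are the fields. The following sketch repeats the definition so that it runs on its own; the new value 200 is arbitrary.

```python
from ctypes import Structure, c_char, c_uint16, sizeof

class TestStructure(Structure):
    _fields_ = (
        ('identity', c_char * 4),
        ('x', c_uint16),
        ('y', c_uint16),
    )

t = TestStructure(b"TEST", 100, 100)

# Fields are read and written as ordinary attributes.
t.x = 200
print(t.identity, t.x, t.y)

# sizeof gives the structure size in bytes; bytes() copies out its raw memory.
raw = bytes(t)
print(sizeof(TestStructure), len(raw))
```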