Reading pyc file (Python 3.5.2)

Out of respect for the following posts, I'm posting this article in English too.

I tried reading a pyc file in __pycache__ with the code from those posts, to understand what the pyc structure looks like in recent Python versions.

However, it failed with an error I did not understand at first.

$ python --version
Python 3.5.2
$ python show_pyc.py __pycache__/hello.cpython-35.pyc
magic b'160d0d0a'
moddate b'6a393e58' (Wed Nov 30 11:28:58 2016)
source_size: 227
Traceback (most recent call last):
  File "show_pyc.py", line 74, in <module>
    show_file(sys.argv[1])
  File "show_pyc.py", line 70, in show_file
    show_code(marshal.load(f))
ValueError: bad marshal data (unknown type code)

As it turns out, Ian had already pointed out the cause in a comment on the second article:

The file format has changed slightly as of Python 3.3+, so the recipe above no longer works. In addition to the two original four-byte fields there is a new four-byte field that encodes the size of the source file as a long.

OK, so from Python 3.3 onward the pyc header contains another 4 bytes!
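To make the layout concrete, here is a minimal sketch of parsing that 12-byte header with the struct module. The names PycHeader and parse_header_33 are my own, not part of any library, and the sample bytes are the values from the failing run shown earlier (magic 160d0d0a, moddate 6a393e58, source size 227). It assumes the two numeric fields are unsigned little-endian 32-bit integers.

```python
import struct
from collections import namedtuple

# Hypothetical container for the three header fields (Python 3.3 to 3.6).
PycHeader = namedtuple("PycHeader", "magic mtime source_size")


def parse_header_33(data):
    """Split the first 12 bytes of a pyc into magic, mtime and source size."""
    if len(data) < 12:
        raise ValueError("a Python 3.3-3.6 pyc header needs 12 bytes")
    magic = data[:4]
    # Assumption: both numbers are unsigned 32-bit little-endian values.
    mtime, source_size = struct.unpack("<II", data[4:12])
    return PycHeader(magic, mtime, source_size)


# Rebuild the header from the values printed in the first run.
sample = bytes.fromhex("160d0d0a" "6a393e58") + struct.pack("<I", 227)
header = parse_header_33(sample)
print(header.magic.hex(), header.mtime, header.source_size)
```

Whatever follows these 12 bytes is what should be handed to marshal, which is why the older 8-byte recipe fails with "bad marshal data".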

Because of this small change, documents written before Python 3.3 may describe the format misleadingly. PEP 3147 is one example:

Byte code files contain two 32-bit big-endian numbers followed by the marshaled [2] code object. The 32-bit numbers represent a magic number and a timestamp.

https://www.python.org/dev/peps/pep-3147/

This is no longer the case. Then again, the PEP was written for Python 3.2, and there was never a guarantee that the pyc format would stay the same over time.

Here's my modified version of the pyc reader.
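Since the format can change between releases, one defensive step is to look at the magic number before trusting the rest of the file. The sketch below is only an illustration: magic_matches and magic_version_number are names I made up, importlib.util.MAGIC_NUMBER requires Python 3.4 or later, and reading the first two bytes as a little-endian number is an assumption about how CPython builds the magic.

```python
import importlib.util


def magic_matches(data):
    """True if the file's first 4 bytes equal the running interpreter's magic."""
    return data[:4] == importlib.util.MAGIC_NUMBER


def magic_version_number(magic):
    """Decode the first two magic bytes as a little-endian integer."""
    return int.from_bytes(magic[:2], "little")


# The magic printed in the runs in this article is 160d0d0a.
article_magic = bytes.fromhex("160d0d0a")
print(magic_version_number(article_magic))
print(magic_matches(article_magic))
```

If magic_matches returns False, the pyc was written by a different bytecode version and marshal is not expected to load it reliably.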

import binascii
import dis
import marshal
import sys
import time
import types


def get_long(s):
    # Decode 4 bytes as an unsigned little-endian 32-bit integer.
    return s[0] + (s[1] << 8) + (s[2] << 16) + (s[3] << 24)


def show_hex(label, h, indent):
    h = binascii.hexlify(h).decode('ascii')
    if len(h) < 60:
        print('%s%s %s' % (indent, label, h))
    else:
        print('%s%s' % (indent, label))
        for i in range(0, len(h), 60):
            print('%s   %s' % (indent, h[i:i+60]))


def show_code(code, indent=''):
    print('%scode' % indent)
    indent += '   '
    print('%sargcount %d' % (indent, code.co_argcount))
    print('%snlocals %d' % (indent, code.co_nlocals))
    print('%sstacksize %d' % (indent, code.co_stacksize))
    print('%sflags %04x' % (indent, code.co_flags))
    show_hex('code', code.co_code, indent=indent)
    dis.disassemble(code)
    print('%sconsts' % indent)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            show_code(const, indent+'   ')
        else:
            print('   %s%r' % (indent, const))
    print('%snames %r' % (indent, code.co_names))
    print('%svarnames %r' % (indent, code.co_varnames))
    print('%sfreevars %r' % (indent, code.co_freevars))
    print('%scellvars %r' % (indent, code.co_cellvars))
    print('%sfilename %r' % (indent, code.co_filename))
    print('%sname %r' % (indent, code.co_name))
    print('%sfirstlineno %d' % (indent, code.co_firstlineno))
    show_hex('lnotab', code.co_lnotab, indent=indent)


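def show_file(fname: str) -> None:
    with open(fname, 'rb') as f:
        # Header field 1: 4-byte magic number identifying the bytecode version.
        magic_str = f.read(4)
        # Header field 2: source modification time, 4 bytes little-endian.
        mtime_str = f.read(4)
        mtime = get_long(mtime_str)
        modtime = time.asctime(time.localtime(mtime))
        print('magic %s' % binascii.hexlify(magic_str))
        print('moddate %s (%s)' % (binascii.hexlify(mtime_str), modtime))
        # Header field 3 (Python 3.3+ only): source file size, 4 bytes.
        # The check uses the running interpreter's version, which has to
        # match the one that wrote the pyc for marshal to load it anyway.
        if sys.version_info < (3, 3):
            print('source_size: (unknown)')
        else:
            source_size = get_long(f.read(4))
            print('source_size: %s' % source_size)
        # Everything after the header is the marshalled code object.
        show_code(marshal.loads(f.read()))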


if __name__ == '__main__':
    show_file(sys.argv[1])

Let's run the new reader on the following code.

hello.py

a, b = 1, 0
if a or b:
    print("Hello World")

$ python --version
Python 3.5.2
$ ls -l hello.py
-rwxr-xr-x 1 dmiyakawa 48 Nov 30 12:41 hello.py
$ python -m py_compile hello.py
$ python show_pyc.py __pycache__/hello.cpython-35.pyc
magic b'160d0d0a'
moddate b'574a3e58' (Wed Nov 30 12:41:11 2016)
source_size: 48
code
   argcount 0
   nlocals 0
   stacksize 2
   flags 0040
   code
      6404005c02005a00005a0100650000731800650100722200650200640200
      8301000164030053
  1           0 LOAD_CONST               4 ((1, 0))
              3 UNPACK_SEQUENCE          2
              6 STORE_NAME               0 (a)
              9 STORE_NAME               1 (b)

  2          12 LOAD_NAME                0 (a)
             15 POP_JUMP_IF_TRUE        24
             18 LOAD_NAME                1 (b)
             21 POP_JUMP_IF_FALSE       34

  3     >>   24 LOAD_NAME                2 (print)
             27 LOAD_CONST               2 ('Hello World')
             30 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             33 POP_TOP
        >>   34 LOAD_CONST               3 (None)
             37 RETURN_VALUE
   consts
      1
      0
      'Hello World'
      None
      (1, 0)
   names ('a', 'b', 'print')
   varnames ()
   freevars ()
   cellvars ()
   filename 'hello.py'
   name '<module>'
   firstlineno 1
   lnotab 0c010c01

Note that the size of the source file (48) is embedded in the pyc as well. That is the new field introduced in Python 3.3+ (I am not sure exactly what "+" means here; presumably 3.3 and later).

This seemed to work fine with Python 3.5.2, 3.4.3, 3.3.6, 3.2.6, and 3.6.0b3 in my environment (macOS Sierra + pyenv). With 3.2.6 it naturally does not report the source size, because that field is not in the pyc.

For readers from the future: do not rely on the assumption that the pyc format won't change, as I did.
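The recorded size and modification time can be compared against the source file itself. The following is a rough sketch under my own assumptions: header_matches_source is a hypothetical helper, both fields are compared modulo 2**32 because they are only 4 bytes wide, and the demo recreates hello.py with a trailing newline, which I am assuming is how the original file ended.

```python
import os
import tempfile


def header_matches_source(mtime, source_size, source_path):
    """Compare header values with os.stat() results for the source file."""
    st = os.stat(source_path)
    # Both header fields hold only 32 bits, so mask before comparing.
    same_mtime = (int(st.st_mtime) & 0xFFFFFFFF) == mtime
    same_size = (st.st_size & 0xFFFFFFFF) == source_size
    return same_mtime and same_size


# Demo on a throwaway copy of hello.py (trailing newline is an assumption).
source = 'a, b = 1, 0\nif a or b:\n    print("Hello World")\n'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.py")
    with open(path, "w", newline="\n") as f:
        f.write(source)
    st = os.stat(path)
    demo_size = st.st_size
    demo_ok = header_matches_source(int(st.st_mtime) & 0xFFFFFFFF, demo_size, path)
    # A header with a different timestamp should be reported as stale.
    demo_stale = header_matches_source(0, demo_size, path)

print(demo_size, demo_ok, demo_stale)
```

With that source text the file is 48 bytes long, the same number the reader printed for source_size.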

Note (2018-02-01)

Python 3.7 and later use a different pyc header, which allows more "deterministic" pyc files. See the following PEP.

https://www.python.org/dev/peps/pep-0552/
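As I read PEP 552, the header grows to 16 bytes: the magic number, a 4-byte flags field, and then either the modification time and source size (flags equal to 0) or an 8-byte hash of the source (lowest flag bit set). Here is a minimal sketch of that layout; parse_header_37 and the dictionary keys are my own naming, and the sample bytes are synthetic, not taken from a real file.

```python
import struct


def parse_header_37(data):
    """Parse a 16-byte pyc header laid out as described in PEP 552."""
    if len(data) < 16:
        raise ValueError("a PEP 552 pyc header needs 16 bytes")
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    if flags & 0x1:
        # Hash-based pyc: the last 8 bytes are a hash of the source file.
        return {
            "magic": magic,
            "flags": flags,
            "check_source": bool(flags & 0x2),
            "source_hash": data[8:16],
        }
    # Timestamp-based pyc: same two fields as before, shifted by 4 bytes.
    mtime, source_size = struct.unpack("<II", data[8:16])
    return {"magic": magic, "flags": flags, "mtime": mtime, "source_size": source_size}


# Synthetic headers with placeholder magic bytes.
timestamp_header = b"MAGC" + struct.pack("<III", 0, 1480472938, 48)
hash_header = b"MAGC" + struct.pack("<I", 0x3) + b"\x01\x02\x03\x04\x05\x06\x07\x08"
print(parse_header_37(timestamp_header))
print(parse_header_37(hash_header))
```

A reader meant for Python 3.7 or later would skip 16 bytes before calling marshal, instead of the 12 bytes skipped by the script in this article.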
