[PYTHON] How to disguise a ZIP file as a PNG file

Frequently Asked Questions

What does "impersonate a ZIP file as a PNG file" mean?

At first glance the file looks like a PNG file, and a PNG decoder can display its image without any problems. However, if you change the extension to .zip (on an OS that determines the file type by extension), it can also be handled as a ZIP file. In this article, such a file is called a "camouflage file", and creating one is called "impersonating a ZIP file as a PNG file".

In other words, create a file that can be used as both a PNG file and a ZIP file.

Basically, the idea is to combine an existing small PNG file and an existing small ZIP file into one camouflage file.

From a different point of view, you could also say that a PNG file is disguised as a ZIP file.

Can any ZIP be disguised?

There are some restrictions.

Also, if you modify the file as a PNG or as a ZIP (applying an image filter to the PNG, adding or deleting files in the ZIP, and so on), it will most likely no longer work as a camouflage file.

There may be various other implementation restrictions.

What are the benefits of disguising?

I do not know.

Basic structure of ZIP file and impersonation method 1

I won't go into the detailed specifications of the ZIP file format. ZIP64 and split ZIP archives are also ignored.

The basic structure of a ZIP file is easiest to understand by following it from the `End of Central Directory` record (hereafter `EOCD`) near the end of the file.

`EOCD` holds the position of the `Central Directory` (hereafter `CEN`), that is, its offset from the beginning of the file. There is one `CEN` entry for each file in the archive, and each `CEN` holds the position (offset) of the corresponding `Local file header` (hereafter `LOC`).

The `LOC` is followed by the body of the file (compressed or uncompressed, encrypted or plaintext, and so on). Many details are omitted here.
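As a rough illustration of following `EOCD`, then `CEN`, then `LOC`, here is a minimal Python sketch. The function names `find_eocd` and `list_entries` are made up for this example. Searching for the last `PK\x05\x06` signature is a simplification: it assumes the signature does not also appear in a ZIP comment or in data after the archive.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory signature
CEN_SIG = b"PK\x01\x02"   # Central Directory entry signature


def find_eocd(data: bytes) -> int:
    """Return the position of the EOCD record (simplified backward search)."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("EOCD not found")
    return pos


def list_entries(data: bytes):
    """Return (file name, LOC offset) pairs by following EOCD -> CEN."""
    eocd = find_eocd(data)
    # EOCD layout: total entry count at +10, CEN size at +12, CEN offset at +16.
    total, _cen_size, cen_offset = struct.unpack_from("<HII", data, eocd + 10)
    entries = []
    pos = cen_offset
    for _ in range(total):
        if data[pos:pos + 4] != CEN_SIG:
            raise ValueError("bad CEN signature")
        # CEN layout: name/extra/comment lengths at +28, LOC offset at +42.
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        (loc_offset,) = struct.unpack_from("<I", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        entries.append((name, loc_offset))
        # The fixed part of a CEN entry is 46 bytes, followed by variable fields.
        pos += 46 + name_len + extra_len + comment_len
    return entries
```

For an ordinary ZIP file with nothing in front of it, the first `LOC` offset returned this way is normally 0.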

zip_format.png

Because of this structure, even if there is some data at the beginning or end of the ZIP file, or between each data, it can be handled normally as a ZIP file.

As a specific example, a self-extracting ZIP file (EXE file) can be treated as a normal ZIP file by most ZIP archivers. From a different point of view, the self-extracting ZIP file can be treated as both an EXE file and a ZIP file.

Due to these characteristics, even if you store the entire ZIP file in another file, it can be treated as a normal ZIP file.
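To see this tolerance in practice, the following sketch prepends unrelated bytes to a ZIP archive and opens the result with Python's standard `zipfile` module, which is one reader that accepts such files. The helper name `make_zip` and the sample file name are made up for this example.

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Build a small in-memory ZIP archive from {name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


zip_bytes = make_zip({"hello.txt": b"hello"})

# Put 40 bytes of unrelated data in front, like the EXE part of a
# self-extracting archive. The offsets inside the ZIP are NOT corrected here.
wrapped = b"JUNK" * 10 + zip_bytes

with zipfile.ZipFile(io.BytesIO(wrapped)) as zf:
    content = zf.read("hello.txt")
```

Here `zipfile` compensates for the uncorrected offsets by itself. Other archivers may be stricter, which is one reason to correct the offsets explicitly.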

However, when embedding an existing ZIP file in another file, the fields that hold an offset from the beginning of the file must be corrected. Specifically, these are the offset of the `CEN` in the `EOCD` and the offset of the `LOC` in each `CEN`. (ZIP64 extended data may also contain offsets, but it is ignored here.)
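The correction can be sketched as follows. The function name `shift_zip_offsets` is hypothetical, and the code assumes a plain ZIP file (no ZIP64, nothing prepended) in which the last `PK\x05\x06` really is the `EOCD`.

```python
import struct


def shift_zip_offsets(zip_bytes: bytes, shift: int) -> bytes:
    """Return a copy of zip_bytes with `shift` added to every stored file offset."""
    out = bytearray(zip_bytes)
    eocd = out.rfind(b"PK\x05\x06")  # simplified EOCD search
    if eocd < 0:
        raise ValueError("EOCD not found")
    total, _cen_size, cen_offset = struct.unpack_from("<HII", out, eocd + 10)

    pos = cen_offset
    for _ in range(total):
        if out[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad CEN signature")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", out, pos + 28)
        # Correct the offset of the LOC stored in this CEN entry.
        (loc_offset,) = struct.unpack_from("<I", out, pos + 42)
        struct.pack_into("<I", out, pos + 42, loc_offset + shift)
        pos += 46 + name_len + extra_len + comment_len

    # Correct the offset of the CEN stored in the EOCD.
    struct.pack_into("<I", out, eocd + 16, cen_offset + shift)
    return bytes(out)
```

The returned bytes are only consistent once they are actually placed `shift` bytes into the containing file.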

By the way, the byte order of ZIP files is little endian.

Basic structure of PNG file and impersonation method 2

I won't go into the detailed specifications of the PNG file format.

The basic structure of a PNG file is an 8-byte PNG file signature at the beginning, followed by multiple chunks. Some chunks are mandatory, while others are optional (auxiliary chunks). Some chunks have a fixed position (for example, the `IHDR` chunk comes first and the `IEND` chunk comes last), while others may appear in any order.

png_format.png

Each chunk consists of "length (4 bytes)", "chunk type (4 bytes)", "chunk data (arbitrary number of bytes)", and "CRC (4 bytes)". The byte order is big endian.

"Length" is the length of the "chunk data" only (it does not count the "length" field itself, the "chunk type", or the "CRC"). Its maximum value is $2^{31}-1$ bytes. "CRC" is the CRC32 value calculated over the "chunk type" and "chunk data".

png_chunk.png
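The chunk layout can be expressed in a few lines of Python. This is a minimal sketch using `zlib.crc32` from the standard library. The names `make_chunk` and `iter_chunks` are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, chunk type, chunk data, CRC (big endian)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF  # CRC covers type + data
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (chunk type, chunk data) for each chunk, verifying the CRC."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != crc:
            raise ValueError("CRC mismatch in chunk %r" % chunk_type)
        yield chunk_type, data
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
```

For example, `make_chunk(b"IEND", b"")` produces the well-known 12-byte `IEND` chunk that ends every PNG file.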

The chunk type is 4 alphabetic characters in ASCII code. It is case sensitive. (In the specifications, it is said that it should be treated as binary data instead of being treated as characters, but it will be described on a character basis for the sake of simplicity of explanation.)

In addition, the case of each of the four letters carries a meaning (the specification defines this by whether the 5th bit of each byte is on or off).

| Letter | Name of the 5th bit | Uppercase (off) | Lowercase (on) |
|---|---|---|---|
| 1st | Auxiliary bit | Mandatory chunk | Auxiliary chunk |
| 2nd | Private bit | Public chunk | Private chunk |
| 3rd | Reserved bit (for future expansion) | (Always uppercase) | (Do not use lowercase) |
| 4th | Copyable bit | Cannot be copied when the image is changed | Can be copied when the image is changed |
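The table can be checked mechanically by testing bit 5 (value `0x20`) of each byte of the chunk type. The function name `chunk_properties` and the dictionary keys are made up for this sketch.

```python
def chunk_properties(chunk_type: bytes) -> dict:
    """Decode the four property bits encoded in the case of a chunk type."""
    if len(chunk_type) != 4 or not chunk_type.isalpha():
        raise ValueError("chunk type must be 4 ASCII letters")
    # Bit 5 (0x20) is on for lowercase ASCII letters and off for uppercase.
    bits = [bool(b & 0x20) for b in chunk_type]
    return {
        "auxiliary": bits[0],        # 1st letter: lowercase = auxiliary chunk
        "private": bits[1],          # 2nd letter: lowercase = private chunk
        "reserved_ok": not bits[2],  # 3rd letter: must be uppercase
        "copyable": bits[3],         # 4th letter: lowercase = may be copied
    }
```

For `b"IHDR"` every letter is uppercase, so the chunk is mandatory, public, and not copyable. A type such as `b"ziPc"` is auxiliary, private, and copyable.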

Mandatory chunks are required to display the image, so a PNG decoder must treat an unknown mandatory chunk as an error. Conversely, it can ignore an unknown auxiliary chunk.

Public chunks are chunks registered in the specification or the public chunk list, while private chunks are specific to an application. Since private chunk names are not registered anywhere, an application should probably guard against name collisions with other applications' private chunks.

Unlike the other bits, the copyable bit is not meant for PNG decoders. It tells a PNG editor (a program that modifies PNG files, for example by applying a filter) whether an unknown chunk may be copied as is when the image is modified.

This time, I will define my own "chunk that stores a ZIP file" as an auxiliary chunk. Any chunk type would do, but since it is a "ZIP container chunk", I will call it `ziPc`. It is placed immediately after the `IHDR` chunk. Because the entire ZIP file must fit in a single chunk, the ZIP file to be disguised must be smaller than 2 GB.

The "ZIP container chunk" is placed immediately after the `IHDR` chunk because the `IHDR` chunk has a fixed length, so the offsets in the ZIP file are likely to remain valid even after the file has been processed by a PNG editor. Of course, this breaks if some other chunk is inserted between the `IHDR` chunk and the `ziPc` chunk, or if the `ziPc` chunk is deleted.
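Because everything in front of the ZIP data has a fixed size, the amount by which the ZIP offsets must be shifted is a constant. The short calculation below spells it out. The constant names are made up for this example, and the 13-byte `IHDR` data size comes from the PNG specification.

```python
PNG_SIGNATURE_SIZE = 8   # PNG file signature
CHUNK_OVERHEAD = 12      # length (4) + chunk type (4) + CRC (4)
IHDR_DATA_SIZE = 13      # width, height, and five 1-byte fields
CHUNK_HEADER_SIZE = 8    # length (4) + chunk type (4) of the ziPc chunk

IHDR_CHUNK_SIZE = CHUNK_OVERHEAD + IHDR_DATA_SIZE

# The ZIP data starts right after the ziPc chunk's length and type fields.
ZIP_DATA_OFFSET = PNG_SIGNATURE_SIZE + IHDR_CHUNK_SIZE + CHUNK_HEADER_SIZE
print(ZIP_DATA_OFFSET)  # 41
```

So every offset stored in the embedded ZIP file has to grow by 41 bytes.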

Impersonation procedure

Finally, organize the impersonation procedure. Suppose you want to generate a camouflage file based on an existing ZIP file and PNG file.

  1. Output the signature of the PNG file
  2. Output the `IHDR` chunk of the existing PNG file
  3. Output the size of the existing ZIP file (the "length" of the ZIP container chunk)
  4. Output the "chunk type" `ziPc` of the ZIP container chunk
  5. Output everything before the `CEN` of the existing ZIP file (the "chunk data" starts here)
  6. Output each `CEN` of the existing ZIP file while correcting it (repeat for every entry)
  7. If there is anything between the end of the `CEN` and the `EOCD` in the existing ZIP file, copy it
  8. Correct and output the `EOCD` of the existing ZIP file
  9. If there is anything after the `EOCD` in the existing ZIP file, copy it (the "chunk data" ends here)
  10. Calculate CRC32 over the "chunk type" and "chunk data" above and output it (the "CRC" of the ZIP container chunk)
  11. Output everything after the `IHDR` chunk of the existing PNG file

That's all.
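The steps above can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the implementation published by the author. The function name `disguise` is made up, ZIP64 is ignored, and the last `PK\x05\x06` in the ZIP file is assumed to be the real `EOCD`.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
IHDR_CHUNK_SIZE = 4 + 4 + 13 + 4  # length + type + 13 data bytes + CRC


def disguise(png: bytes, zip_bytes: bytes, chunk_type: bytes = b"ziPc") -> bytes:
    """Embed zip_bytes in a chunk placed right after the IHDR chunk of png."""
    if png[:8] != PNG_SIGNATURE or png[12:16] != b"IHDR":
        raise ValueError("not a PNG file starting with an IHDR chunk")
    head_size = 8 + IHDR_CHUNK_SIZE
    shift = head_size + 8  # ZIP data follows the container chunk's length and type

    eocd = zip_bytes.rfind(b"PK\x05\x06")  # simplified EOCD search
    if eocd < 0:
        raise ValueError("EOCD not found")
    total, _cen_size, cen_offset = struct.unpack_from("<HII", zip_bytes, eocd + 10)

    parts = [zip_bytes[:cen_offset]]                       # step 5
    pos = cen_offset
    for _ in range(total):                                 # step 6
        if zip_bytes[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad CEN signature")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", zip_bytes, pos + 28)
        (loc_offset,) = struct.unpack_from("<I", zip_bytes, pos + 42)
        end = pos + 46 + name_len + extra_len + comment_len
        parts.append(zip_bytes[pos:pos + 42]
                     + struct.pack("<I", loc_offset + shift)
                     + zip_bytes[pos + 46:end])
        pos = end
    parts.append(zip_bytes[pos:eocd])                      # step 7
    parts.append(zip_bytes[eocd:eocd + 16]                 # step 8
                 + struct.pack("<I", cen_offset + shift)
                 + zip_bytes[eocd + 20:eocd + 22])
    parts.append(zip_bytes[eocd + 22:])                    # step 9
    data = b"".join(parts)
    if len(data) > 2**31 - 1:
        raise ValueError("ZIP file is too large for a single chunk")

    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF       # step 10
    return b"".join([
        png[:head_size],                                   # steps 1-2
        struct.pack(">I", len(data)),                      # step 3
        chunk_type,                                        # step 4
        data,
        struct.pack(">I", crc),
        png[head_size:],                                   # step 11
    ])
```

A typical use would be to read both input files with `open(path, "rb").read()`, call `disguise(png, zip_bytes)`, and write the result with a `.png` extension. Note that some ZIP readers search for the `EOCD` only near the end of the file (Python's `zipfile` appears to scan roughly the last 64 KB), so a large amount of PNG data after the container chunk may prevent them from finding it.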

Implementation example

None of the processing is particularly difficult (the hardest part is the CRC32 calculation), so I think it can be implemented in any language that can manipulate binary files.

The caveat is that ZIP is little endian and PNG is big endian.
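In Python, for instance, the byte order is selected by the first character of the `struct` format string. The variable names below are made up for this small illustration.

```python
import struct

value = 0x12345678

zip_style = struct.pack("<I", value)  # little endian, as used in ZIP fields
png_style = struct.pack(">I", value)  # big endian, as used in PNG fields

print(zip_style.hex())  # 78563412
print(png_style.hex())  # 12345678
```

Mixing the two up does not raise an error. It silently produces a different number, so it is worth double-checking each field.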

For the time being, I have published example implementations in Java, JavaScript, and Python on GitHub.
