Compare PDF output in Java for snapshot testing

When developing an application that outputs PDFs such as forms, have you ever wanted to automatically test the final PDF output, layout included? In this article, I will show you how to run regression tests that cover layout by rendering two PDF files to images and comparing them.

Basic idea

Assuming you already have a PDF file with the expected output, test whether the PDF produced by the program matches this snapshot.
The goal is only to detect regressions, so there is no need to compare two arbitrary PDFs well; the comparison only has to work when the number of pages and the page sizes are the same.
Each page is rendered to an image and compared pixel by pixel, and if the pages do not match, an image showing the difference is output.
The idea is the same as in the previously introduced "Use PDF image comparison for refactoring".

Tested environment

PDFBox 2.0.15
JUnit 5.4.2
AdoptOpenJDK 11.0.3+7
macOS 10.14.5

Rendering the PDF to images

Rendering a PDF to images is easier than you might think with Apache PDFBox (https://pdfbox.apache.org/). As mentioned earlier, the purpose is automated regression testing, so if the number of pages or the page size differs, the test simply fails without attempting a comparison.

The higher the DPI used for rendering, the finer the comparison, because the images have a higher resolution, but rendering then needs correspondingly more machine resources (CPU and memory).
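To get a feel for that trade-off, the following is a rough sketch that estimates how large a rendered page becomes at a given DPI. It assumes the default PDF scale of 72 points per inch and about 4 bytes per pixel for an int-packed RGB image. `RenderCost` and its methods are hypothetical helpers, not part of PDFBox, and PDFBox's own rounding may differ by a pixel.

```java
/** Rough size estimate for a rendered page; a hypothetical helper, not a PDFBox API. */
final class RenderCost {
    /** PDF user space is 72 points per inch by default. */
    static final double POINTS_PER_INCH = 72.0;

    /** Approximate pixel length of a side given in points when rendered at the given DPI. */
    static int pixels(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Approximate bytes for one page image, assuming 4 bytes per pixel. */
    static long approxBytes(double widthPt, double heightPt, double dpi) {
        return 4L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }
}
```

For an A4 page of roughly 595 x 842 points, `RenderCost.approxBytes(595, 842, 144)` comes to about 8 MB per image. The comparison holds two images per page (expected and actual), and the pixel count grows with the square of the DPI.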

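import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

static void assertPdfEquals(InputStream expected, InputStream actual) throws IOException {
    try (PDDocument doc1 = PDDocument.load(expected);
         PDDocument doc2 = PDDocument.load(actual)) {
        // Fail the test if the number of pages differs
        assertEquals(doc1.getNumberOfPages(), doc2.getNumberOfPages());

        PDFRenderer renderer1 = new PDFRenderer(doc1);
        PDFRenderer renderer2 = new PDFRenderer(doc2);
        for (int i = 0; i < doc1.getNumberOfPages(); i++) {
            // Render the same page of both documents at 144 DPI
            BufferedImage image1 = renderer1.renderImageWithDPI(i, 144, ImageType.RGB);
            BufferedImage image2 = renderer2.renderImageWithDPI(i, 144, ImageType.RGB);

            // Fail the test if the rendered page sizes differ
            assertEquals(image1.getWidth(), image2.getWidth());
            assertEquals(image1.getHeight(), image2.getHeight());

            // Compare the images and write the diff image to a temporary file;
            // the file path is used as the failure message
            Path path = Files.createTempFile("diff-" + i + "-", ".png");
            try (OutputStream os = Files.newOutputStream(path)) {
                assertTrue(compareImage(image1, image2, os), path.toString());
            }
        }
    }
}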

Comparing the images

Comparing images is not particularly difficult as long as you only check for an exact match of RGB values pixel by pixel. The key point is not just to compare, but also to repaint the mismatched pixels in a highlight color to produce a diff image.

import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

static boolean compareImage(BufferedImage image1, BufferedImage image2, OutputStream os) throws IOException {
    boolean matched = true;
    for (int x = 0; x < image1.getWidth(); x++) {
        for (int y = 0; y < image1.getHeight(); y++) {
            int p1 = image1.getRGB(x, y);
            int p2 = image2.getRGB(x, y);
            // Matching pixels are left as they are; mismatched pixels are repainted magenta.
            // Note that image1 itself is overwritten and becomes the diff image.
            if (p1 != p2) {
                matched = false;
                image1.setRGB(x, y, Color.MAGENTA.getRGB());
            }
        }
    }
    // Write the diff image (written even when every pixel matched)
    if (os != null) {
        ImageIO.write(image1, "png", os);
    }
    return matched;
}

Example diff image output

Compared with the expected PDF, the actual PDF has three changes: a date has been added to the heading, item 3 has been deleted, and the subtotal and total have changed as a result of that deletion. All three can be read from the diff image.

Expected PDF
Actual PDF
Difference image

Supplement

If you don't need a diff image, it is faster to return as soon as you find the first pixel that doesn't match.
If you want to tolerate some discrepancies, you could provide an API that returns the ratio of matching pixels instead of a simple true / false.
You can change how the diff image looks by choosing different colors for repainting pixels. For example, you can paint matching pixels black so that only the mismatches stand out in white.

    //Matched pixels are black, unmatched pixels are white
    if (p1 == p2) {
        image1.setRGB(x, y, Color.BLACK.getRGB());
    } else {
        matched = false;
        image1.setRGB(x, y, Color.WHITE.getRGB());
    }
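The first two notes above can be sketched as follows. This is a minimal illustration rather than the article's original code: the class `ImageMatch` and the methods `imagesMatch` and `matchRatio` are hypothetical names, and both use only `java.awt.image.BufferedImage`.

```java
import java.awt.image.BufferedImage;

/** Hypothetical variants of the comparison that do not produce a diff image. */
final class ImageMatch {
    /** Returns false at the first differing pixel, without scanning the rest. */
    static boolean imagesMatch(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false; // early exit on the first mismatch
                }
            }
        }
        return true;
    }

    /** Returns the fraction of pixels whose RGB values are identical (0.0 to 1.0). */
    static double matchRatio(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same size");
        }
        long total = (long) a.getWidth() * a.getHeight();
        long same = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) == b.getRGB(x, y)) {
                    same++;
                }
            }
        }
        return (double) same / total;
    }
}
```

A test could then assert something like `ImageMatch.matchRatio(image1, image2) >= 0.999`, where the threshold is an arbitrary example value that you would tune to your own documents.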

Summary

I introduced how to perform snapshot testing by rendering PDF output to images. With output like PDF, you need to confirm not just that the data is output but that it is displayed in the correct layout, and there is a limit to how well humans can do that kind of regression testing by eye.

If you use the method introduced here, you will not only detect regressions in the application you implement, but also notice unintended layout changes when you upgrade the PDF output library, so you can develop with more peace of mind.
