PDF files often carry a great deal of valuable information. To make better use of it, you may need a tool that extracts the text and images from a PDF. This article shows how to extract text and images from a PDF document in Java.
- Free Spire.PDF for Java 2.4.4 (free version)
  - Method 1: Download the Free Spire.PDF for Java package from the official website and decompress it. The Spire.Pdf.jar file is in the lib folder under the extraction path. Add this jar to your project: in IntelliJ IDEA, open Project Structure (Shift + Ctrl + Alt + S) and add it as a library; in Eclipse, add it to the project's build path. The result of importing the jar package is as follows:
  - Method 2: Install it from the Maven repository. For the installation steps, see https://www.e-iceblue.com/Tutorials/Licensing/How-to-install-Spire.PDF-for-Java-from-Maven-Repository.html.
The source PDF used for testing is as follows:
**Step 1:** Import the packages needed for text extraction.
```java
import com.spire.pdf.*;
import java.io.FileWriter;
```
**Step 2:** Create a PdfDocument instance and load the source PDF file.
```java
// Create the PDF document object
PdfDocument doc = new PdfDocument();
// Load the PDF file
doc.loadFromFile("data/Sample.pdf");
```
**Step 3:** Create a StringBuilder to collect the text, then loop through every page of the document and append the text returned by each page's extractText() method.
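```java
// Traverse the PDF and collect the text of every page
StringBuilder buffer = new StringBuilder();
// Page indices are 0-based, so start at 0 to include the first page
for (int i = 0; i < doc.getPages().getCount(); i++) {
    PdfPageBase page = doc.getPages().get(i);
    buffer.append(page.extractText());
}
```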
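Step 3 appends each page's text directly after the previous one, so the resulting file does not mark where one page ends and the next begins. If that matters, one option is to write a marker line before each page. The sketch below is a hypothetical variation, not part of the original example: to keep it runnable without Spire.PDF, a `List<String>` stands in for the values returned by `page.extractText()`, and `joinPages` is a made-up helper name.

```java
import java.util.List;

public class PageTextJoiner {

    // Builds one string from per-page text, writing a "--- Page N ---" line before each page.
    // The list is a stand-in for the strings returned by page.extractText() (assumption).
    static String joinPages(List<String> pageTexts) {
        StringBuilder buffer = new StringBuilder();
        // Start at 0 so the first page is included; N in the marker is 1-based for readers
        for (int i = 0; i < pageTexts.size(); i++) {
            buffer.append("--- Page ").append(i + 1).append(" ---")
                  .append(System.lineSeparator());
            buffer.append(pageTexts.get(i));
            buffer.append(System.lineSeparator());
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        List<String> pages = List.of("First page text", "Second page text");
        System.out.print(joinPages(pages));
    }
}
```

With Spire.PDF you would fill the list (or call the loop body directly) with `page.extractText()` for each page.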
**Step 4:** Create a FileWriter and use its write() method to save the contents of the buffer to output/text.txt.
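```java
// Save the text; the output folder must already exist, FileWriter does not create it
String fileName = "output/text.txt";
// try-with-resources flushes and closes the writer, even if write() throws
try (FileWriter writer = new FileWriter(fileName)) {
    writer.write(buffer.toString());
}
```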
Text extraction result:
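Two details of Step 4 are worth knowing. FileWriter does not create missing folders, so its constructor throws an IOException if output/ does not exist. On Java versions before 18 it also writes with the platform's default encoding, which can mangle non-Latin text. A minimal alternative, assuming you want UTF-8 output, uses `java.nio.file.Files`; `saveText` is a hypothetical helper name, and the string passed in `main` stands in for `buffer.toString()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveText {

    // Writes the text to the given file as UTF-8,
    // creating any missing parent directories first (e.g. "output").
    static void saveText(String text, Path file) throws IOException {
        Path parent = file.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for buffer.toString() from Step 3
        saveText("Sample extracted text", Paths.get("output", "text.txt"));
    }
}
```

`Files.createDirectories` does nothing if the folder already exists, so the helper is safe to call repeatedly.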
**Step 1:** Import the packages needed for image extraction.
```java
import com.spire.pdf.*;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
```
**Step 2:** Create a PdfDocument instance and load the source PDF file.
```java
// Create the PDF document object
PdfDocument pdf = new PdfDocument();
// Load the PDF file
pdf.loadFromFile("data/Sample.pdf");
```
**Step 3:** Use a for loop to go through each page of the PDF, get the images on that page with the extractImages() method, and save each image as a PNG file.
```java
// Counter that gives every extracted image a unique file name
int index = 0;
// Loop through the pages
for (int i = 0; i < pdf.getPages().getCount(); i++) {
    // Get the current page
    PdfPageBase page = pdf.getPages().get(i);
    // Extract the images on this page
    for (BufferedImage image : page.extractImages()) {
        // Build the output path, e.g. output/Image_0.png (the output folder must exist)
        File output = new File("output/" + String.format("Image_%d.png", index++));
        // Save the image as a PNG file
        ImageIO.write(image, "PNG", output);
    }
}
```
Image extraction result:
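The image loop numbers files with a single counter that keeps running across pages, so file names never collide. The same logic can be packaged as a helper that also creates the output folder, which ImageIO.write() does not do on its own. This is a sketch under assumptions: `saveAsPng` is a hypothetical name, it assumes `extractImages()` returns a `BufferedImage[]`, the null check is purely defensive, and blank images generated in `main` stand in for real extracted images so the code runs without Spire.PDF.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class SaveImages {

    // Saves each image as PNG in outputDir as Image_<n>.png, starting at startIndex.
    // Returns the next unused index so numbering can continue on the next page.
    static int saveAsPng(BufferedImage[] images, File outputDir, int startIndex) throws IOException {
        int index = startIndex;
        if (images == null) {
            return index; // defensive: treat a missing result as "no images"
        }
        // Create the folder if needed; fail loudly if that is not possible
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IOException("Cannot create directory " + outputDir);
        }
        for (BufferedImage image : images) {
            File output = new File(outputDir, String.format("Image_%d.png", index++));
            ImageIO.write(image, "PNG", output);
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // Blank images stand in for the arrays returned by page.extractImages()
        BufferedImage[] page1 = { new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB) };
        BufferedImage[] page2 = { new BufferedImage(20, 5, BufferedImage.TYPE_INT_RGB) };
        File outputDir = new File("output");
        int next = saveAsPng(page1, outputDir, 0);
        next = saveAsPng(page2, outputDir, next);
        System.out.println(next + " images saved");
    }
}
```

Inside the page loop of Step 3, the inner for-each would then become a single call such as `index = saveAsPng(page.extractImages(), new File("output"), index);`.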