I needed to convert a large XLSX file to CSV, so I used Apache POI to handle it. I will leave a note of what I was careful about. The sample project is [here]
Java:1.8 POI:3.17
There is no support for cell formats or formats. It does not support xls format (2003 format) (officially there was a sample source that supports xls)
Use the API that says Streaming (ala SAX) in the feature overview of the Official Site (https://poi.apache.org/spreadsheet/). It doesn't seem to consume much memory because it is read by SAX (Simple API for XML).
You can convert XLSX to CSV by copying and pasting the following sources and executing the convert method.
XLSX to CSV conversion sample
/**
*Convert XLSX file to CSV file<br>
*Even if there are multiple sheets in the XLSX file, only the first sheet is targeted.
*
* @param fromXlsxPath Path of XLSX file to be converted
* @param toCsvPath Path of CSV file to be converted (overwrite if existing path is specified)
*/
public static void convert(Path fromXlsxPath, Path toCsvPath) {
System.out.println("Start XlsxToCsvUtil#convert");
try (OPCPackage pkg = OPCPackage.open(fromXlsxPath.toAbsolutePath().toString(), PackageAccess.READ);
BufferedWriter bw = Files.newBufferedWriter(toCsvPath, StandardCharsets.UTF_8)) {
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
XSSFReader xssfReader = new XSSFReader(pkg);
StylesTable styleTable = xssfReader.getStylesTable();
try (InputStream is = xssfReader.getSheetsData().next()) {
//Handler generation for parsing
ContentHandler handler = new XSSFSheetXMLHandler(
styleTable, null, strings, new XlsxRowHandler(bw), new DataFormatter(), false);
//XML reader generation for parsing
XMLReader sheetParser = SAXHelper.newXMLReader();
sheetParser.setContentHandler(handler);
System.out.println("Start perspective");
sheetParser.parse(new InputSource(is));
System.out.println("End of perspective");
}
} catch (InvalidOperationException | IOException | SAXException | OpenXML4JException
| ParserConfigurationException e) {
System.out.println("Error XlsxToCsvUtil#convert");
}
System.out.println("Finished XlsxToCsvUtil#convert");
}
/**
*
*/
public static class XlsxRowHandler implements SheetContentsHandler {
private final List<String> row = new ArrayList<>();
private final BufferedWriter bw;
public XlsxRowHandler(BufferedWriter bw) throws IOException {
this.bw = bw;
}
@Override
public void startRow(int rowNum) {
System.out.println(rowNum + "Start reading line");
row.clear();
}
@Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
row.add(formattedValue);
}
@Override
public void endRow(int rowNum) {
try {
bw.write(String.join(",", row.stream().map(c -> "\"" + c + "\"").collect(Collectors.toList())));
bw.newLine();
} catch (IOException e) {
}
}
@Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
}
The following classes are prepared by POI and the samples are almost the same. I don't understand the details.
The important thing is "XlsxRowHandler". This is a class that implements the interface prepared by POI. This class reads the contents of the cell and writes it out with BufferedWriter to output CSV. As you can see by looking at the method name, the method is executed at the following timing.
The SheetContentsHandler # cell method is not called on blank cells. When dealing with XLSX files with blank cells, it is necessary to implement accordingly.
Official site On the here page, "[XLSX2CSV](https://svn.apache.org/repos/asf/poi/trunk/" src / examples / src / org / apache / poi / xssf / eventusermodel / XLSX2CSV.java) "is the official sample.
Recommended Posts