[Kotlin] ZIP-compress files with Japanese names

Introduction

ZipOutputStream, the class used for ZIP compression, uses UTF-8 as its default Charset.

On Japanese Windows, ZIP file names are traditionally handled as MS932, so Japanese file names can come out garbled when an archive created with the default is extracted on Windows. To avoid that, pass a Charset to ZipOutputStream. There are many similar articles for Java but few for Kotlin, so I'm leaving this one here as a note.
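To illustrate where the garbling comes from, here is a minimal sketch (not part of the original code) that encodes a name with one Charset and decodes the bytes with another. The helper name `reinterpret` is made up for this example, and it assumes the JDK in use provides the MS932 charset.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: encode text with one charset, then decode the bytes with another.
fun reinterpret(text: String, writtenAs: Charset, readAs: Charset): String =
    String(text.toByteArray(writtenAs), readAs)

fun main() {
    val name = "日本語.txt"
    val ms932 = Charset.forName("MS932") // assumes the JDK ships this charset

    // Written as UTF-8 but read as MS932: the name does not survive.
    println(reinterpret(name, Charsets.UTF_8, ms932))
    // Written and read as MS932: the name is preserved.
    println(reinterpret(name, ms932, ms932))
}
```

The same mismatch happens with entry names inside a ZIP when the writer and the extracting tool assume different encodings.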

I'm not very familiar with Kotlin (or Java), and I'm not sure I'm using FileInputStream the right way, so I'd appreciate it if you could tell me a better way to write this.

What you can see in this article

How to create ZIP archives that support Japanese file names using Kotlin.

Articles that I used as a reference

- Safely compress and decompress zip with Kotlin
- Java ZIP compression method Directory specification and file specification
- ZipEntry / ZipOutputStream
- Java file copy (change buffer size)
- Let's sort out the differences between Shift_JIS and Windows-31J (MS932)
- Be careful if you find SHIFT-JIS in Java

Code

import java.io.BufferedOutputStream
import java.io.File
import java.io.FileInputStream
import java.io.FileOutputStream
import java.nio.charset.Charset
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

/**
 * Archives the given files and directories into a ZIP file created at the specified path.
 * The default charset is MS932 (the Windows variant of Shift_JIS), so Japanese file names are supported.
 *
 * @param fromFiles List of files to compress. (Example: {C:/sample1.txt, C:/sample2.txt})
 * @param toZipFile Full path of the ZIP file to create. (Example: C:/sample.zip)
 * @param charset Charset used for entry names when creating the ZIP. *1
 */
fun zipFiles(fromFiles: List<File>, toZipFile: File, charset: Charset = Charset.forName("MS932")) {
    try {
        // With use, the resource is closed automatically when the block ends.
        // The first argument of ZipOutputStream is the stream that writes the ZIP file to create.
        // If the second argument of ZipOutputStream is omitted, the Charset is UTF-8.
        ZipOutputStream(BufferedOutputStream(FileOutputStream(toZipFile)), charset).use { zipOutputStream ->
            fromFiles.forEach { file ->
                if (file.isFile) {
                    archiveFile(zipOutputStream, file, file.name)
                } else if (file.isDirectory) {
                    archiveDirectory(zipOutputStream, file, file)
                }
            }
        }
    } catch (e: Exception) {
        // Handle the error here (for example, log it or rethrow); an empty block silently swallows failures.
    }
}

/**
 * Recursively adds every file under the specified directory to the ZIP stream.
 * Entry names are paths relative to the parent of baseFile, so the directory's own name is kept in the archive.
 *
 * @param zipOutputStream Stream to write the files to.
 * @param baseFile The root directory being compressed (it must have a parent, e.g. be given as a full path).
 * @param targetFile The directory currently being traversed.
 */
fun archiveDirectory(zipOutputStream: ZipOutputStream, baseFile: File, targetFile: File) {
    targetFile.listFiles()?.forEach { file ->
        if (file.isDirectory) {
            archiveDirectory(zipOutputStream, baseFile, file)
        } else if (file.isFile) {
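            // Entry names must be relative paths, so take the path relative to baseFile's parent.
            // ZIP entry names use '/' as the separator, so convert the platform separator ('\' on Windows).
            val relativeEntryName = file.toRelativeString(baseFile.parentFile)
                .replace(File.separatorChar, '/')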
            archiveFile(zipOutputStream, file, relativeEntryName)
        } else {
            // A path that does not exist is neither a directory nor a regular file.
            // If you need to handle that case, do it here.
        }
    }
}

/**
 * Writes a single file to the ZIP stream under the given entry name.
 *
 * @param zipOutputStream Stream to write targetFile to.
 * @param targetFile File to compress.
 * @param entryName Name (relative path) of the entry for targetFile inside the ZIP.
 *          Example: if entryName is hoge/fuga.txt and the ZIP is fizz.zip, the file is stored as fizz.zip/hoge/fuga.txt.
 */
fun archiveFile(zipOutputStream: ZipOutputStream, targetFile: File, entryName: String) {
    // *2
    val zipBufferSize = 100 * 1024
    // The entry is created at ${path of the ZIP file given to ZipOutputStream}/${entryName}.
    zipOutputStream.putNextEntry(ZipEntry(entryName))
    FileInputStream(targetFile).use { inputStream ->
        val bytes = ByteArray(zipBufferSize)
        var length: Int
        // Read up to zipBufferSize bytes at a time; read returns -1 at the end of the file.
        while (inputStream.read(bytes).also { length = it } != -1) {
            zipOutputStream.write(bytes, 0, length)
        }
    }
    // use does not close the entry, so close it manually.
    zipOutputStream.closeEntry()
}
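
One way to check the result without opening the archive on Windows is to read the entry names back with the same Charset. The following is a minimal sketch, not part of the original code; the function name `listEntryNames` is made up for this example, and it assumes the MS932 charset is available in the JDK.

```kotlin
import java.io.File
import java.nio.charset.Charset
import java.util.zip.ZipFile

// Hypothetical helper: returns the entry names of a ZIP file, decoded with the given charset.
fun listEntryNames(zipFile: File, charset: Charset = Charset.forName("MS932")): List<String> =
    ZipFile(zipFile, charset).use { zip ->
        // entries() returns a java.util.Enumeration, which Kotlin can expose as a Sequence.
        zip.entries().asSequence().map { it.name }.toList()
    }
```

If the names printed from this list match the original file names, the archive was written with the Charset you expected.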

Note

*1 About the Charset used

I chose MS932 as the Charset here. The reason for using MS932 rather than the similar Shift_JIS is explained in the following articles.

- Let's sort out the differences between Shift_JIS and Windows-31J (MS932)
- Be careful if you find SHIFT-JIS in Java
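
As a rough illustration of the difference, the sketch below asks each charset whether it can encode a Windows-specific character such as the circled digit ①. This is not from the original article; the helper name `canEncode` is made up here, and it assumes both charsets are available in the JDK.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: true if the named charset can encode every character of the text.
fun canEncode(text: String, charsetName: String): Boolean =
    Charset.forName(charsetName).newEncoder().canEncode(text)

fun main() {
    val name = "①.txt" // the circled digit is one of the characters added in the Windows code page
    println("Shift_JIS: ${canEncode(name, "Shift_JIS")}")
    println("MS932: ${canEncode(name, "MS932")}")
}
```

On the JDKs I know of, Java's Shift_JIS reports that it cannot encode this name while MS932 can, so file names containing such characters would be damaged if Shift_JIS were used.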

*2 About the buffer size used with FileInputStream

Reference: Java file copy (change buffer size)

The uploaded files were expected to be about 1 MB to 5 MB each, so I set the buffer size to 100 KB.
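
As one possible simplification of the manual read loop, Kotlin's standard library provides `InputStream.copyTo`, which takes a buffer size. The following is a sketch under that assumption, not a drop-in replacement tested against the original code; the function name `archiveFileWithCopyTo` and the `bufferSize` parameter are made up for this example.

```kotlin
import java.io.File
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical variant of archiveFile that lets copyTo manage the buffer and the read loop.
fun archiveFileWithCopyTo(
    zipOutputStream: ZipOutputStream,
    targetFile: File,
    entryName: String,
    bufferSize: Int = 100 * 1024 // same 100 KB as zipBufferSize
) {
    zipOutputStream.putNextEntry(ZipEntry(entryName))
    // inputStream() opens a FileInputStream; use closes it when the block ends.
    targetFile.inputStream().use { input ->
        input.copyTo(zipOutputStream, bufferSize)
    }
    // The entry still has to be closed manually.
    zipOutputStream.closeEntry()
}
```

The behavior should be the same as the loop version; the buffer size only affects how many bytes are moved per read.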

About net.lingala.zip4j.ZipFile

net.lingala.zip4j.ZipFile has many useful features, but as far as I could tell, its Charset cannot be changed. Therefore, if you want to support Japanese, use java.util.zip.
