This is a memo about a Java program I wrote that converts variable-length data delimited by line feed codes into variable-length data prefixed with a length field (LL), with the line feed codes removed.
I mainly develop Java (Spring Boot) web applications.
The source code and input files can be found on GitHub.
I was in charge of developing a data migration tool for a legacy migration project at work.
- The migration source is IMS (a hierarchical database) segment data.
- The migration destination is a PostgreSQL bytea column.
- Creating the test data with a text editor was difficult, so I wrote a program to do it.
- The output is a variable-length binary data file in which each record starts with a 2-byte length field (LL).
- The contents of the binary data can be anything.
  - Strictly speaking, the data coming from the source IMS is binary data encoded in EBCDIC, but the migration tool does not process its contents as text.
- So the actual contents do not matter for testing the migration tool. As source material I use a UTF-8 text file, which is easy to handle with Java's text input APIs.
- Prepare a variable-length UTF-8 text file as the source material.
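To make the record layout concrete, here is a minimal sketch that builds a single record: a 2-byte LL holding the data length (excluding LL itself), followed by the data bytes. The class name `LlRecord` and method `encode` are hypothetical, and big-endian byte order is an assumption (it is `ByteBuffer`'s default and appears consistent with the LL values shown later in this memo).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the [LL (2 bytes)][data] record layout.
public class LlRecord {
    static final int LLSIZE = 2;

    // Returns LL (big-endian, excluding its own 2 bytes) followed by the data bytes.
    static byte[] encode(byte[] data) {
        if (data.length > 0xFFFF) {
            throw new IllegalArgumentException("record too long for a 2-byte LL: " + data.length);
        }
        return ByteBuffer.allocate(LLSIZE + data.length)
                .putShort((short) data.length)
                .put(data)
                .array();
    }

    public static void main(String[] args) {
        byte[] rec = encode("ABC".getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : rec) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // 00 03 41 42 43
    }
}
```

The three data bytes `41 42 43` ("ABC") are preceded by `00 03`, the length 3 as a 2-byte big-endian value.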
EditBinary.java
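import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EditBinary {
    public static void main(String[] args) throws IOException {
        final int LLSIZE = 2;
        // (1) Read the input as UTF-8 text and write the output as raw bytes.
        //     try-with-resources closes both streams even if an exception occurs.
        try (BufferedReader br = new BufferedReader(
                     new InputStreamReader(new FileInputStream("input.txt"), StandardCharsets.UTF_8));
             BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream("output.txt"))) {
            String line;
            // (2) Read one record at a time; readLine() strips the line feed (LF, CR, or CRLF)
            while ((line = br.readLine()) != null) {
                // (3) LL = record size in bytes, excluding LL itself. Use the UTF-8 byte
                //     count: line.length() counts chars and is too small for multibyte text.
                int recSize = line.getBytes(StandardCharsets.UTF_8).length;
                if (recSize > 0xFFFF) {
                    throw new IOException("record too long for a 2-byte LL: " + recSize);
                }
                byte[] ll = ByteBuffer.allocate(LLSIZE).putShort((short) recSize).array();
                // (4) Hold the record data as UTF-8 bytes
                byte[] data = line.getBytes(StandardCharsets.UTF_8);
                // (5) Combine LL and the record data into one buffer
                ByteBuffer lineBuf = ByteBuffer.allocate(LLSIZE + recSize);
                lineBuf.put(ll);
                lineBuf.put(data);
                // (6) Write LL + record data (no line feed) to the file
                bos.write(lineBuf.array());
            }
            bos.flush();
        }
    }
}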
- (1) The input stream uses BufferedReader and the output stream uses BufferedOutputStream.
  - The input file is processed as text data and the output file as binary data.
- (2) Read one record at a time with BufferedReader.readLine().
- (3) Store the size of the record read by readLine() in the length field LL as byte data.
  - LL does not include its own size (2 bytes).
  - The size must be the record length in bytes. For UTF-8 text containing multibyte characters, this differs from the character count returned by String.length().
  - recSize is an int (4 bytes) and does not fit in LL (2 bytes), so it is cast to short (2 bytes). Lengths up to 65,535 bytes fit if LL is read as an unsigned value; the cast alone would silently truncate anything longer.
- (4) Hold the record data read by readLine() as UTF-8 bytes.
- (5) Allocate a ByteBuffer large enough for LL plus the record data, then put LL and the record data into it.
- (6) Write the combined LL and record data bytes to the file.
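Two details in steps (3) and (4) are easy to get wrong, and this small sketch checks both. First, `String.length()` counts UTF-16 chars, not bytes, so for multibyte text it differs from the UTF-8 byte count that LL must hold. Second, the `(short)` cast keeps only the low 16 bits: values up to 65,535 survive when LL is read back as unsigned (here with `Short.toUnsignedInt`), and larger values are truncated. The class name `LlCheck` and method `roundTrip` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical checks for the LL computation in steps (3) and (4).
public class LlCheck {
    // Writes len into a 2-byte LL via a short cast and reads it back as unsigned.
    static int roundTrip(int len) {
        ByteBuffer buf = ByteBuffer.allocate(2).putShort((short) len);
        buf.flip();
        return Short.toUnsignedInt(buf.getShort());
    }

    public static void main(String[] args) {
        // Three hiragana characters (written as escapes); each is 3 bytes in UTF-8
        String line = "\u3042\u3044\u3046";
        System.out.println(line.length());                                // 3 (chars)
        System.out.println(line.getBytes(StandardCharsets.UTF_8).length); // 9 (bytes)

        System.out.println(roundTrip(300));   // 300
        System.out.println(roundTrip(65535)); // 65535: fits when read as unsigned
        System.out.println(roundTrip(65536)); // 0: truncated by the cast
    }
}
```

If LL were computed from `line.length()` for this line, it would say 3 while 9 data bytes follow, and `ByteBuffer.allocate(3).put(...)` with 9 bytes would throw `BufferOverflowException`.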
Compiling and running it is simple.
- Compile
javac EditBinary.java
- Run
java EditBinary
Let's look at the contents of output.txt in Hex Fiend (a hex editor for macOS). The line feed code at the end of each record has been removed, and an LL has been added at the beginning of each record instead. The LL values are as follows; the number in parentheses is the value in decimal, that is, the record length in bytes.
- First record LL: 0x0064 (100)
- Second record LL: 0x00C8 (200)
- Third record LL: 0x012C (300)
- Fourth record LL: 0x0190 (400)
- Fifth record LL: 0x01F4 (500)
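The same check can also be done without a hex editor by walking the records in code. This is a minimal sketch, assuming the layout produced above (a 2-byte big-endian LL that excludes itself, no delimiters between records) and a file named `output.txt` in the working directory; the class name `DumpLl` and method `readLengths` are hypothetical. For the memo's output it should report the same LL values as the list above.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical verifier: walks [LL][data] records and collects each LL value.
public class DumpLl {
    static List<Integer> readLengths(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        List<Integer> lengths = new ArrayList<>();
        while (true) {
            int hi = dis.read();
            if (hi < 0) {
                break; // end of file exactly between records
            }
            int lo = dis.read();
            if (lo < 0) {
                throw new EOFException("truncated LL");
            }
            int ll = (hi << 8) | lo;     // 2-byte big-endian LL, read as unsigned
            dis.readFully(new byte[ll]); // consume the record data; EOFException if truncated
            lengths.add(ll);
        }
        return lengths;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get("output.txt"))) {
            int n = 1;
            for (int ll : readLengths(in)) {
                System.out.printf("record %d: LL=0x%04X (%d)%n", n++, ll, ll);
            }
        }
    }
}
```

Because the records carry no delimiter, the reader has to trust each LL to find the next record; a wrong LL (for example one computed from the character count) shows up here as a truncated-record error or as garbage lengths.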
With that, I was able to create variable-length binary data with LL.
As an aside: I don't think line feed codes are often used as delimiters in binary data (this article does not use them), but if you do use them, be careful, because their binary representation can differ depending on the environment.
- The line feed code is CRLF on Windows and LF on Unix/Linux.
- The binary (hexadecimal) representation of CRLF is 0x0D 0x0A.
- The binary (hexadecimal) representation of LF is 0x0A.
- The byte count also depends on the character encoding: in UTF-16, for example, LF takes 2 bytes and CRLF takes 4 bytes.
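The byte sequences for line feed codes can be confirmed directly by encoding them. A minimal sketch follows; the class name `NewlineBytes` is hypothetical, and UTF-16BE is chosen only as an example of a 2-byte encoding (Java's plain `UTF-16` charset would also prepend a byte order mark).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the bytes that CRLF and LF occupy in different encodings.
public class NewlineBytes {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE};
        for (Charset cs : charsets) {
            System.out.println(cs + " CRLF: " + hex("\r\n".getBytes(cs)));
            System.out.println(cs + " LF:   " + hex("\n".getBytes(cs)));
        }
        // Line separator of the current environment (CRLF on Windows, LF on Unix/Linux)
        System.out.println("this JVM: " + hex(System.lineSeparator().getBytes(StandardCharsets.UTF_8)));
    }
}
```

In UTF-8, CRLF is `0D 0A` and LF is `0A`; in UTF-16BE they become `00 0D 00 0A` and `00 0A`. The last line depends on the OS the program runs on.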