Summary of what I learned in Spring Batch

This article summarizes what I learned while working with Spring Batch. I set up the Spring Batch environment based on the following article.

- Try to create a simple batch service with Spring Boot + Spring Batch

An error occurs if the Spring Batch tables are missing

When I ran the application in the environment I had built by following various sites, it failed with an error saying that tables such as "BATCH_JOB_EXECUTION" did not exist. Spring Batch needs a dedicated set of metadata tables in order to run. The required tables are described below.

- JobRepository Metadata Schema

However, creating these tables by hand is tedious, so Spring provides ready-made SQL scripts for various platforms. Searching for "spring batch schema-〇〇.sql" (where 〇〇 is the platform name) will turn them up. I used PostgreSQL, so I used "schema-postgresql.sql". I referred to the following.

- [schema-postgresql.sql](https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/resources/org/springframework/batch/core/schema-postgresql.sql)
- [schema-drop-postgresql.sql](https://github.com/spring-projects/spring-batch/blob/master/spring-batch-core/src/main/resources/org/springframework/batch/core/schema-drop-postgresql.sql)

Initialize the table when the application runs

You will probably start the app many times while checking its behavior, and re-creating the data by hand each time is tedious. I therefore initialize the tables on every run. Spring Boot provides a mechanism for initializing tables when the application starts. I referred to the following sites.

- SpringBoot DB initialization method
- Spring Boot + PostgreSQL setting method

A file named "schema-〇〇.sql" placed under "src/main/resources" (the directory created along with the Spring Boot project) appears to be executed automatically at startup. I used PostgreSQL, so I did the following.

Add the following to application.properties.

application.properties


spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://localhost:5432/testdb
#spring.datasource.username=postgres
#spring.datasource.password=postgres
spring.datasource.initialization-mode=always

Spring Boot apparently defines the DataSource bean automatically, so the developer only needs to write the DB settings in application.properties.

Prepare SQL

I prepared the following SQL. In addition to my own table, it contains the Spring Batch schema script confirmed in the first section.

schema-all.sql


DROP TABLE IF EXISTS people;

CREATE TABLE people  (
    person_id SERIAL NOT NULL PRIMARY KEY,
    first_name VARCHAR(20),
    last_name VARCHAR(20)
);

-- Autogenerated: do not edit this file
DROP TABLE  IF EXISTS BATCH_STEP_EXECUTION_CONTEXT;
DROP TABLE  IF EXISTS BATCH_JOB_EXECUTION_CONTEXT;
DROP TABLE  IF EXISTS BATCH_STEP_EXECUTION;
DROP TABLE  IF EXISTS BATCH_JOB_EXECUTION_PARAMS;
DROP TABLE  IF EXISTS BATCH_JOB_EXECUTION;
DROP TABLE  IF EXISTS BATCH_JOB_INSTANCE;

DROP SEQUENCE  IF EXISTS BATCH_STEP_EXECUTION_SEQ ;
DROP SEQUENCE  IF EXISTS BATCH_JOB_EXECUTION_SEQ ;
DROP SEQUENCE  IF EXISTS BATCH_JOB_SEQ ;

CREATE TABLE BATCH_JOB_INSTANCE  (
	JOB_INSTANCE_ID BIGINT  NOT NULL PRIMARY KEY ,
	VERSION BIGINT ,
	JOB_NAME VARCHAR(100) NOT NULL,
	JOB_KEY VARCHAR(32) NOT NULL,
	constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
) ;

CREATE TABLE BATCH_JOB_EXECUTION  (
	JOB_EXECUTION_ID BIGINT  NOT NULL PRIMARY KEY ,
	VERSION BIGINT  ,
	JOB_INSTANCE_ID BIGINT NOT NULL,
	CREATE_TIME TIMESTAMP NOT NULL,
	START_TIME TIMESTAMP DEFAULT NULL ,
	END_TIME TIMESTAMP DEFAULT NULL ,
	STATUS VARCHAR(10) ,
	EXIT_CODE VARCHAR(2500) ,
	EXIT_MESSAGE VARCHAR(2500) ,
	LAST_UPDATED TIMESTAMP,
	JOB_CONFIGURATION_LOCATION VARCHAR(2500) NULL,
	constraint JOB_INST_EXEC_FK foreign key (JOB_INSTANCE_ID)
	references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
) ;

CREATE TABLE BATCH_JOB_EXECUTION_PARAMS  (
	JOB_EXECUTION_ID BIGINT NOT NULL ,
	TYPE_CD VARCHAR(6) NOT NULL ,
	KEY_NAME VARCHAR(100) NOT NULL ,
	STRING_VAL VARCHAR(250) ,
	DATE_VAL TIMESTAMP DEFAULT NULL ,
	LONG_VAL BIGINT ,
	DOUBLE_VAL DOUBLE PRECISION ,
	IDENTIFYING CHAR(1) NOT NULL ,
	constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
	references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;

CREATE TABLE BATCH_STEP_EXECUTION  (
	STEP_EXECUTION_ID BIGINT  NOT NULL PRIMARY KEY ,
	VERSION BIGINT NOT NULL,
	STEP_NAME VARCHAR(100) NOT NULL,
	JOB_EXECUTION_ID BIGINT NOT NULL,
	START_TIME TIMESTAMP NOT NULL ,
	END_TIME TIMESTAMP DEFAULT NULL ,
	STATUS VARCHAR(10) ,
	COMMIT_COUNT BIGINT ,
	READ_COUNT BIGINT ,
	FILTER_COUNT BIGINT ,
	WRITE_COUNT BIGINT ,
	READ_SKIP_COUNT BIGINT ,
	WRITE_SKIP_COUNT BIGINT ,
	PROCESS_SKIP_COUNT BIGINT ,
	ROLLBACK_COUNT BIGINT ,
	EXIT_CODE VARCHAR(2500) ,
	EXIT_MESSAGE VARCHAR(2500) ,
	LAST_UPDATED TIMESTAMP,
	constraint JOB_EXEC_STEP_FK foreign key (JOB_EXECUTION_ID)
	references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;

CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT  (
	STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
	SHORT_CONTEXT VARCHAR(2500) NOT NULL,
	SERIALIZED_CONTEXT TEXT ,
	constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
	references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
) ;

CREATE TABLE BATCH_JOB_EXECUTION_CONTEXT  (
	JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
	SHORT_CONTEXT VARCHAR(2500) NOT NULL,
	SERIALIZED_CONTEXT TEXT ,
	constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
	references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;

CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NO CYCLE;

With this, the tables required for execution are re-created every time the application runs.

Job loops infinitely

In Spring Batch, processing is basically built from Reader, Processor, and Writer classes, but when I defined these myself, the job fell into an infinite loop. It took me a while to work out why; the following site was helpful.

- Spring Batch chunk managed series processing

In Spring Batch, a step keeps looping until the ItemReader returns null. The ItemReader therefore has to be implemented so that it returns null once all input has been consumed. The following site implements it that way.

- Implementation of Spring Batch original ItemReader

The following implementation is based on the above site.

PersonItemReader.java


import java.util.List;

import org.springframework.batch.item.ItemReader;

public class PersonItemReader implements ItemReader<Person> {

	// Loaded lazily on the first read() call
	private List<Person> people = null;
	private int nextIndex;
	private final PersonService service;

	public PersonItemReader(PersonService service) {
		this.service = service;
	}

	@Override
	public Person read() throws Exception {
		if (people == null) {
			people = service.selectAll();
			nextIndex = 0;
		}
		// Return the next person, or null once the list is exhausted;
		// null tells Spring Batch that there is no more input
		Person person = null;
		if (nextIndex < people.size()) {
			person = people.get(nextIndex);
			nextIndex++;
		}
		return person;
	}

}
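
The loop-until-null behavior can be illustrated without Spring at all. The following is a minimal sketch in plain Java, not Spring Batch's actual implementation: `SimpleReader` is a hypothetical stand-in for `ItemReader`, and `drain` is a hypothetical stand-in for the framework's read loop. It shows that the loop ends only because the reader eventually returns null.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the read loop; none of these types are Spring Batch classes.
final class ReadUntilNullDemo {

    /** Hypothetical stand-in for Spring Batch's ItemReader. */
    interface SimpleReader<T> {
        T read(); // returns null when there is no more input
    }

    /** Hands out the elements of a list one by one, then returns null. */
    static final class ListBackedReader<T> implements SimpleReader<T> {
        private final List<T> items;
        private int nextIndex = 0;

        ListBackedReader(List<T> items) {
            this.items = items;
        }

        @Override
        public T read() {
            if (nextIndex < items.size()) {
                return items.get(nextIndex++);
            }
            return null; // signals "input exhausted"
        }
    }

    /** Simplified stand-in for the framework loop: keeps reading until null. */
    static <T> List<T> drain(SimpleReader<T> reader) {
        List<T> consumed = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            consumed.add(item);
        }
        return consumed;
    }

    public static void main(String[] args) {
        SimpleReader<String> reader = new ListBackedReader<String>(List.of("Alice", "Bob"));
        System.out.println(drain(reader)); // prints [Alice, Bob]
    }
}
```

A reader that re-runs its query on every call and always returns a row would never leave this loop, which is the infinite-loop symptom described above.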

The ItemReader defined above is registered as a bean.

BatchConfiguration.java


@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

...(omitted)...

    @Autowired
    PersonService personService;

    @Bean
    public PersonItemReader reader() {
    	return new PersonItemReader(personService);
    }

...(omitted)...

}

General implementation method of Reader / Processor / Writer

In Spring Batch it is common to use a Reader, a Processor, and a Writer, and each has a conventional way of being implemented, something like a design pattern. Below is how I first implemented them. First is the Reader class.

PersonItemReader.java


import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class PersonItemReader implements ItemReader<List<Person>> {

	private final PersonService service;
	private final PersonCheckService checkService;

	public PersonItemReader(PersonService service, PersonCheckService checkService) {
		this.service = service;
		this.checkService = checkService;
	}

	@Override
	public List<Person> read() throws Exception {
		List<Person> people = service.selectAll();
		// Collect only the people that pass the check. ret stays null when
		// nobody passes, and returning null ends the step.
		// Assumption: check() eventually rejects every person (for example
		// after the Writer has updated them); otherwise read() never returns null.
		List<Person> ret = null;
		for (Person person : people) {
			if (checkService.check(person)) {
				if (ret == null) {
					ret = new ArrayList<Person>();
				}
				ret.add(person);
			}
		}
		return ret;
	}

}
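
Because this reader returns a whole list on every call, the step ends only when `read()` eventually returns null. In the code above that depends on `checkService.check` rejecting rows once they have been processed, which is an assumption about how that service behaves. An alternative, sketched below in plain Java with a hypothetical `OneShotListReader` that is not part of Spring Batch, is to hand the list out once and return null afterwards.

```java
import java.util.List;
import java.util.function.Supplier;

// Plain-Java sketch; OneShotListReader is a hypothetical helper, not a Spring Batch class.
final class OneShotListReader<T> {

    private final Supplier<List<T>> loader; // e.g. a call such as service::selectAll
    private boolean consumed = false;

    OneShotListReader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    /** Returns the loaded list on the first call and null on every later call. */
    List<T> read() {
        if (consumed) {
            return null; // tells the caller there is nothing left to read
        }
        consumed = true;
        List<T> loaded = loader.get();
        // An empty result is also reported as "no input" so the loop ends immediately.
        return loaded.isEmpty() ? null : loaded;
    }

    public static void main(String[] args) {
        OneShotListReader<String> reader =
                new OneShotListReader<String>(() -> List.of("Alice", "Bob"));
        System.out.println(reader.read()); // prints [Alice, Bob]
        System.out.println(reader.read()); // prints null
    }
}
```

The same flag-based idea could be applied inside a real `ItemReader<List<Person>>`, at the cost of the reader no longer being reusable across executions unless the flag is reset.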

The records fetched from the DB are passed to the Processor as a List. Next is the Processor class.

PersonItemProcessor.java


import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<List<Person>, List<Person>> {

	@Override
	public List<Person> process(final List<Person> people) throws Exception {
		// Build a new list in which every name is upper-cased
		List<Person> transformedPeople = new ArrayList<Person>();
		for (Person person : people) {
			final String firstName = person.getFirstName().toUpperCase();
			final String lastName = person.getLastName().toUpperCase();
			final Person transformedPerson = new Person(firstName, lastName);
			transformedPeople.add(transformedPerson);
		}
		return transformedPeople;
	}

}

The List passed from Reader is processed and a new List is returned. Next is the Writer class.

PersonItemWriter.java


import java.util.List;

import org.springframework.batch.item.ItemWriter;

public class PersonItemWriter implements ItemWriter<Object> {

	private final PersonService service;

	public PersonItemWriter(PersonService service) {
		this.service = service;
	}

	@Override
	@SuppressWarnings("unchecked")
	public void write(List<? extends Object> items) throws Exception {
		// Each element of items is one List<Person> returned by the Processor.
		// Only the first element is handled, which assumes a chunk size of 1.
		List<Person> people = (List<Person>) items.get(0);
		for (Person person : people) {
			service.updatePerson(person);
		}
	}

}

In the Writer class, the List passed from the Processor is written to the DB. What is noteworthy, however, is the following line in the Writer class.

List<Person> people = (List<Person>) items.get(0);

To obtain the List<Person>, the code calls get(0) on the parameter List. In other words, a List<List<Person>> is passed as the parameter. This felt a little odd to me. At first I assumed that a List<Person> would arrive as the parameter, and when I implemented it that way the behavior was strange. Wondering why, I searched and found someone with the same question.

- Spring Batch - Using an ItemWriter with List of Lists

The following is written on the above site.

Typically, the design pattern is:

Reader -> reads something, returns ReadItem
Processor -> ingests ReadItem, returns ProcessedItem
Writer -> ingests List<ProcessedItem>

If your processor is returning List<Object>, then you need your Writer to expect List<List<Object>>.

In other words, the typical design is that the Reader and the Processor each return a single item and the Writer handles a List of items. Because this is batch processing, I had assumed it was normal to pass the multiple records fetched from the DB to the Processor as a List. However, the Writer receives the objects returned by the Processor collected into a List, one element per item. So if the Processor returns a single object, the items.get(0) handling above becomes unnecessary. The Reader implementation that returns a single object, introduced in "Job loops infinitely", also appears to be the general approach.
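
The shape of what the Writer receives can be made concrete with a small model. This is a plain-Java sketch of chunk-oriented processing under simplifying assumptions (no transactions, no skip or retry logic); `runChunks` and its parameters are hypothetical names, not Spring Batch APIs. Items are read one at a time, each is processed individually, and the writer is called with a list of up to `chunkSize` processed items.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Simplified model of chunk-oriented processing; not Spring Batch's real implementation.
final class ChunkShapeDemo {

    /**
     * Reads items one by one, processes each, and collects every list that
     * would be handed to the writer (one list per chunk).
     */
    static <I, O> List<List<O>> runChunks(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> writerCalls = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next())); // one item in, one item out
            if (chunk.size() == chunkSize) {
                writerCalls.add(chunk); // the writer sees a List<O>
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writerCalls.add(chunk); // last, possibly smaller, chunk
        }
        return writerCalls;
    }

    public static void main(String[] args) {
        List<String> names = List.of("alice", "bob", "carol");

        // Single-item processor: each writer call receives a List<String>.
        Function<String, String> toUpper = name -> name.toUpperCase(Locale.ROOT);
        System.out.println(runChunks(names.iterator(), toUpper, 2));
        // prints [[ALICE, BOB], [CAROL]]

        // List-returning processor: each writer call receives a List<List<String>>.
        List<List<String>> wholeList = List.of(names);
        Function<List<String>, List<String>> identity = people -> people;
        System.out.println(runChunks(wholeList.iterator(), identity, 1));
        // prints [[[alice, bob, carol]]]
    }
}
```

The second call shows the extra level of nesting that made `items.get(0)` necessary: when the processor's output type is itself a list, the writer's parameter becomes a list of lists.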

Test jobs and steps

Spring Batch provides a mechanism for testing jobs and steps: JobLauncherTestUtils. I referred to the following site.

- Unit Testing

First, define a bean for JobLauncherTestUtils.

BatchConfigurationForTest.java


import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
@EnableBatchProcessing
public class BatchConfigurationForTest {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    JobRepository jobRepository;

    @Bean
    public DataSource dataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.postgresql.Driver");
        dataSource.setUrl("jdbc:postgresql://localhost:5432/ec");
//        dataSource.setUsername(username);
//        dataSource.setPassword(password);
        return dataSource;
    }

    @Bean
    PersonService personService() {
    	return new PersonService();
    }

    @Bean
    public PersonItemReader reader() {
    	return new PersonItemReader(personService());
    }

    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
    }

    @Bean
    public NoWorkFoundStepExecutionListener noWorkFoundStepExecutionListener() {
    	return new NoWorkFoundStepExecutionListener();
    }

    @Bean
    public Job importUserJob(Step step1) {
        return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .flow(step1)
            .end()
            .build();
    }

    @Bean
    public Step step1(NoWorkFoundStepExecutionListener listener, JdbcBatchItemWriter<Person> writer) {
        return stepBuilderFactory.get("step1")
            .<Person, Person> chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .listener(listener)
            .build();
    }

    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() {
    	JobLauncherTestUtils utils = new JobLauncherTestUtils();
    	utils.setJob(importUserJob(step1(noWorkFoundStepExecutionListener(), writer(dataSource()))));
    	utils.setJobLauncher(jobLauncher);
    	utils.setJobRepository(jobRepository);
    	return utils;
    }
}

JobLauncherTestUtils is defined as a bean at the bottom. The above is a new Configuration class defined for testing; its content does not differ much from the regular Configuration class. In Spring Batch you can attach a "listener" to a step to run some processing after the step finishes, and since the listener can be tested too, it is also defined as a bean (the listener class is described on the reference site above). Next are the test classes. First is the class that tests the job.

JobTest.java


import static org.hamcrest.CoreMatchers.*;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes=BatchConfigurationForTest.class)
public class JobTest {

	@Autowired
	private JobLauncherTestUtils jobLauncherTestUtils;

	@Test
	public void testJob() throws Exception {
		JobExecution jobExecution = jobLauncherTestUtils.launchJob();
		// assertThat takes the actual value first, then the matcher
		Assert.assertThat(jobExecution.getExitStatus().getExitCode(), is("COMPLETED"));
	}

}

In the test class, the JobLauncherTestUtils bean is autowired. launchJob() runs the job; which job to run was specified when the bean was defined.

Next is the class that tests the steps.

StepTest.java


import static org.hamcrest.CoreMatchers.*;
import static org.junit.Assert.*;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.batch.test.MetaDataInstanceFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes=BatchConfigurationForTest.class)
public class StepTest {

	@Autowired
	private JobLauncherTestUtils jobLauncherTestUtils;

	@Autowired
	NoWorkFoundStepExecutionListener tested;

	@Test
	public void testStep() {
		// Run only the step named "step1"
		JobExecution jobExecution = jobLauncherTestUtils.launchStep("step1");
		Assert.assertThat(jobExecution.getExitStatus().getExitCode(), is("COMPLETED"));
	}

	@Test
	public void testAfterStep() {
		StepExecution stepExecution = MetaDataInstanceFactory.createStepExecution();

		// Simulate a step that completed without reading anything
		stepExecution.setExitStatus(ExitStatus.COMPLETED);
		stepExecution.setReadCount(0);

		ExitStatus exitStatus = tested.afterStep(stepExecution);
		assertThat(exitStatus.getExitCode(), is(ExitStatus.FAILED.getExitCode()));
	}
}

When testing a step, pass the name of the step to run as the argument of launchStep(). testAfterStep() tests the bean-defined listener. setReadCount() sets the number of items read by the Reader class. The NoWorkFoundStepExecutionListener described on the reference site is implemented to return ExitStatus.FAILED when getReadCount() == 0.

This concludes my summary of Spring Batch.
