Avoid database access from loops as much as possible

As the title says.

Like this


//Stop this
List<Person> people = new ArrayList<>();
for (int id : ids) {
    Person person = repository.findOne(p);
    people.add(person);
}

//Let's do this
List<Person> people = repository.findAll(ids);

It's a matter of course, but I feel that it's a code that is common to beginners. In my narrow observation range, I already have SQL to get one case, so I should use this, or because the processing logic is simpler to get one case at a time, etc. ** [Move] * When it is made with *, it is not so conscious and it may lead to performance problems later.

Let's measure the speed with a simple sample code. The sample code is Spring Boot 2.0 + Spring Data JPA.

Loop 10,000 times, access the database from the loop to get 10,000 records, and use the IN clause to prepare and call the process to get 10,000 records at once. Since it was troublesome to make a screen, I made it RestController and output it in JSON appropriately.

Verification code

Controller


package com.example.web;

import com.example.domain.service.BenchmarkService;
import com.example.domain.service.BenchmarkServiceImpl;
import lombok.RequiredArgsConstructor;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.HashMap;
import java.util.Map;

@RestController
@RequestMapping("/benchmark")
@RequiredArgsConstructor
public class SandboxRestController {

    private final BenchmarkService benchmarkService;

    @RequestMapping("/sql/loop")
    public ResponseEntity<String> benchmarkDatabaseAccess() {
        //10 at a time using IN clause,Get 000 records
        long beforeOneTime = System.currentTimeMillis();
        benchmarkService.oneTimesDatabaseAccess();
        long afterOneTime = System.currentTimeMillis();

        // 10,Get records one by one in 000 loops
        long beforeLoopTime = System.currentTimeMillis();
        benchmarkService.tenThousandTimesDatabaseAccess();
        long afterLoopTime = System.currentTimeMillis();

        Map<String, Long> result = new HashMap<>();
        result.put("oneTimeFirst", afterOneTime - beforeOneTime);
        result.put("loopTimeFirst", afterLoopTime - beforeLoopTime);

        //Do the same process again
        beforeOneTime = System.currentTimeMillis();
        benchmarkService.oneTimesDatabaseAccess();
        afterOneTime = System.currentTimeMillis();

        beforeLoopTime = System.currentTimeMillis();
        benchmarkService.tenThousandTimesDatabaseAccess();
        afterLoopTime = System.currentTimeMillis();

        result.put("oneTimeSecond", afterOneTime - beforeOneTime);
        result.put("loopTimeSecond", afterLoopTime - beforeLoopTime);
        return ResponseEntity.status(200).body(result.toString());
    }
}

Service


@Service
@RequiredArgsConstructor
public class BenchmarkServiceImpl implements BenchmarkService {

    private final BenchmarkRepository benchmarkRepository;

    public void tenThousandTimesDatabaseAccess() {
        for (int i = 1; i <= 10000; i++) {
            benchmarkRepository.findById(i);
        }
    }

    public void oneTimesDatabaseAccess() {
        List<Integer> ids = IntStream.range(1, 10000).boxed().collect(toList());
        benchmarkRepository.findByIdIn(ids);
    }
}

For the database, use the table prepared easily by Spring Data JPA.

Entity


package com.example.domain.model;

import lombok.Data;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.validation.constraints.NotNull;

@Entity
@Data
@Table(name = "benchmark")
public class BenchmarkEntity {

    public BenchmarkEntity() {};

    public BenchmarkEntity(int id, int num) {
        this.id = id;
        this.num = num;
    }

    @Id
    @GeneratedValue
    private Integer id;

    @NotNull
    private int num;
}

Repository


package com.example.domain.repository;

import com.example.domain.model.BenchmarkEntity;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

import java.util.List;

@Repository
public interface BenchmarkRepository extends JpaRepository<BenchmarkEntity, Integer> {

    // SELECT * FROM benchmark WHERE id = #{id}
    BenchmarkEntity findById(int id);

    // SELECT * FROM benchmark WHERE id IN (#{...ids})
    List<BenchmarkEntity> findByIdIn(List<Integer> ids);
}

result

When I operated it on-memory with h2database,

10,000 loops first 711ms, second 438ms Bulk acquisition is 393ms for the first time and 24ms for the second time

It was that kind of feeling. Even on-memory makes such a difference, so the difference will increase as the lead time per read increases, for example, accessing a database in another environment.

It is easier to process if you acquire one by one as an implementation, and in the case of batch acquisition, the subsequent processing may be a little complicated, or it may take time to run a loop in that processing. In most cases, it should be faster to get them all at once (unless you are issuing extremely poorly performing SQL).

I'm not saying that we should completely eliminate the SQL called from the loop, but let's be careful about the processing that is expected to increase the number of loops under heavy load due to the requirements.

Reference: https://github.com/tnemotox/sandbox

Recommended Posts

Avoid database access from loops as much as possible
Avoid Java Calendar as much as possible
3. Create a database to access from the web module