Item 48: Use caution when making streams parallel


Parallel processing is still difficult

As Java has evolved, concurrent programs have become easier to write, but writing parallel code that is both correct and fast is still hard. Parallel stream pipelines are no exception. Let's look at the Mersenne primes example from Item 45.

package tryAny.effectiveJava;

import static java.math.BigInteger.*;

import java.math.BigInteger;
import java.util.stream.Stream;

public class MersennePrimes {
    public static void main(String[] args) {
        // Print the first 20 Mersenne primes (2^p - 1 for prime p),
        // each prefixed with its bit length.
        primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
                .filter(mersenne -> mersenne.isProbablePrime(50))
                .limit(20)
                // .forEach(System.out::println);
                .forEach(mp -> System.out.println(mp.bitLength() + ":" + mp));
    }

    static Stream<BigInteger> primes() {
        return Stream.iterate(TWO, BigInteger::nextProbablePrime);
    }
}

Adding a call to parallel() to this pipeline does not speed it up. The program prints nothing, never finishes, and keeps CPU usage high.

What is happening here? Simply put, the streams library does not know how to parallelize this pipeline, and its heuristics fail. **Parallelizing a pipeline is unlikely to improve performance if its source is Stream.iterate or if it uses the intermediate operation limit.** The program above does both. With limit, the default strategy assumes it is harmless to compute a few extra elements and discard them, but here each additional Mersenne prime costs roughly twice as much to find as the one before it. The lesson: **do not parallelize stream pipelines indiscriminately.**

In general, **parallelization pays off best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances, arrays, int ranges, and long ranges.** What these sources have in common is that they are easy to divide into subranges. They also share another important property: **locality of reference when processed sequentially.**
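As a minimal sketch of the point above, the following runs the same computation sequentially and in parallel over two of the easily split sources mentioned: a long range and an ArrayList. The class and method names (`SplittableSources`, `sumOfSquares`, `countEven`) are made up for illustration, and the workloads are far too small to show a real speedup; the sketch only shows that switching modes does not change the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class SplittableSources {
    // Sum of squares over a long range. A range can be split into
    // subranges of any size without touching the elements.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(x -> x * x).sum();
    }

    // Count the even values in a list. An array-backed list such as
    // ArrayList also splits cheaply and evenly.
    static long countEven(List<Integer> values, boolean parallel) {
        return (parallel ? values.parallelStream() : values.stream())
                .filter(v -> v % 2 == 0)
                .count();
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(
                IntStream.range(0, 1_000).boxed().collect(Collectors.toList()));
        System.out.println(sumOfSquares(1_000, false) + " " + sumOfSquares(1_000, true));
        System.out.println(countEven(values, false) + " " + countEven(values, true));
    }
}
```

A source such as Stream.iterate offers no comparable way to jump to the middle of the sequence, which is one reason it parallelizes poorly.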

Terminal operations also affect parallel performance. If the terminal operation does a large share of the pipeline's total work and is inherently sequential, parallelizing the pipeline has little effect. The terminal operations that parallelize best are reductions such as min, max, count, and sum. [Short-circuiting](https://ja.wikipedia.org/wiki/%E7%9F%AD%E7%B5%A1%E8%A9%95%E4%BE%A1) operations such as anyMatch, allMatch, and noneMatch also parallelize well. Mutable reductions performed by the stream's collect method are less likely to benefit, because the overhead of combining collections is costly.
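The following sketch shows the kinds of terminal operations described above on a parallel stream over an int array: two reductions (sum and max) and one short-circuiting operation (anyMatch). The class and method names are hypothetical, and the example makes no claim about actual speedup; it only shows the shape of such pipelines.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class TerminalOperations {
    // Reduction: each worker sums its own subrange, and the partial
    // sums are combined with a single cheap addition.
    static long sum(int[] values) {
        return Arrays.stream(values).parallel().asLongStream().sum();
    }

    // Reduction: the maximum of the partial maxima.
    // Throws NoSuchElementException if the array is empty.
    static int max(int[] values) {
        return Arrays.stream(values).parallel().max().getAsInt();
    }

    // Short-circuiting: the pipeline can stop as soon as any worker
    // finds a match.
    static boolean anyMultipleOf(int[] values, int divisor) {
        return Arrays.stream(values).parallel().anyMatch(v -> v % divisor == 0);
    }

    public static void main(String[] args) {
        int[] values = IntStream.rangeClosed(1, 1_000).toArray();
        System.out.println(sum(values));
        System.out.println(max(values));
        System.out.println(anyMultipleOf(values, 997));
    }
}
```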

Safety failure

**Parallelizing a stream can do more than hurt performance (including liveness failures); it can also produce incorrect results and unpredictable behavior, known as safety failures.** Safety failures arise when a pipeline uses functions that do not comply with the strict specifications of the stream API. For example, the accumulator and combiner functions passed to a reduction must be [associative](https://docs.oracle.com/javase/jp/8/docs/api/java/util/stream/package-summary.html#Associativity), non-interfering, and stateless. If you violate these requirements, a sequential pipeline will probably still work, but a parallel pipeline can fail disastrously.
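To make the associativity requirement concrete, here is a small sketch with hypothetical names (`ReductionSafety`, `isAssociativeFor`, `squares`). It reduces with Integer::sum, which is associative and stateless, checks one triple of values to show that subtraction is not associative, and builds a list with collect instead of mutating a shared list from inside a lambda. The single-triple check is only an illustration and not a proof of associativity.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ReductionSafety {
    // Integer::sum is associative and stateless, so the parallel
    // reduction gives the same answer as the sequential one.
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    // Tests (a op b) op c == a op (b op c) for one triple only.
    static boolean isAssociativeFor(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(op.applyAsInt(a, b), c)
                == op.applyAsInt(a, op.applyAsInt(b, c));
    }

    // Let collect build the result instead of adding to a shared,
    // unsynchronized list from the lambda (a stateful, interfering
    // function). Encounter order is preserved for this ordered source.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n).parallel()
                .map(x -> x * x)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(100));
        System.out.println(isAssociativeFor(Integer::sum, 1, 2, 3));
        System.out.println(isAssociativeFor((a, b) -> a - b, 1, 2, 3));
        System.out.println(squares(4));
    }
}
```

Passing a non-associative function such as subtraction to reduce on a parallel stream can give a result that depends on how the source happens to be split, which is why no fixed output is shown for that case.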

Is it effective to parallelize?

Even under very good conditions, parallelization is pointless unless the speedup offsets its cost. As a rough estimate, (number of elements in the stream) * (number of lines of code executed per element) should be greater than 100000. (The [linked source](http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html) appears to give 10000.)
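The rough estimate above is simple arithmetic, sketched here as a helper. The names `ParallelHeuristic`, `THRESHOLD`, and `mayBeWorthParallelizing` are invented for illustration, and the threshold is the 100000 figure quoted above; treat the result as a first filter only, never as a substitute for measurement.

```java
class ParallelHeuristic {
    // Threshold from the rough estimate in the text (an assumption,
    // not a value defined by any library).
    static final long THRESHOLD = 100_000L;

    // elements: number of elements in the stream.
    // linesPerElement: approximate lines of code executed per element.
    static boolean mayBeWorthParallelizing(long elements, long linesPerElement) {
        return elements * linesPerElement > THRESHOLD;
    }

    public static void main(String[] args) {
        // 1,000 elements * 10 lines = 10,000: below the threshold.
        System.out.println(mayBeWorthParallelizing(1_000, 10));
        // 10,000,000 elements * 1 line = 10,000,000: above the threshold.
        System.out.println(mayBeWorthParallelizing(10_000_000, 1));
    }
}
```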

Keep in mind that parallelizing a stream is purely a performance optimization. As with any optimization, measure performance before and after the change to confirm that it is worth making. Ideally, run these tests in a production environment.

Parallelization is very useful when used under the right circumstances

**Under the right circumstances, parallelization can yield a speedup roughly proportional to the number of processors.** Fields such as machine learning and data processing are particularly well suited to these speedups.

Let's look at an example that parallelizes efficiently: the [prime-counting function](https://ja.wikipedia.org/wiki/%E7%B4%A0%E6%95%B0%E8%A8%88%E6%95%B0%E9%96%A2%E6%95%B0).

package tryAny.effectiveJava;

import java.math.BigInteger;
import java.util.stream.LongStream;

import org.apache.commons.lang3.time.StopWatch;

public class ParallelTest1 {
    // Prime-counting stream pipeline - benefits from parallelization
    static long pi(long n) {
        return LongStream.rangeClosed(2, n).mapToObj(BigInteger::valueOf).filter(i -> i.isProbablePrime(50)).count();
    }

    public static void main(String[] args) {
        StopWatch sw = new StopWatch();
        sw.start();
        System.out.println(pi(10000000));
        sw.stop();
        System.out.println(sw.getTime());
    }
}

This code took about 42 seconds to run. Now parallelize it by adding a call to parallel().

package tryAny.effectiveJava;

import java.math.BigInteger;
import java.util.stream.LongStream;

import org.apache.commons.lang3.time.StopWatch;

public class ParallelTest1 {
    // Prime-counting stream pipeline - benefits from parallelization
    static long pi(long n) {
        return LongStream.rangeClosed(2, n).parallel().mapToObj(BigInteger::valueOf).filter(i -> i.isProbablePrime(50))
                .count();
    }

    public static void main(String[] args) {
        StopWatch sw = new StopWatch();
        sw.start();
        System.out.println(pi(10000000));
        sw.stop();
        System.out.println(sw.getTime());
    }
}

This version finished in about 23 seconds on a 2-core machine.

When parallelizing a stream of random values

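If you want to parallelize a stream of random values, start from a SplittableRandom instance rather than ThreadLocalRandom (or the largely obsolete Random). SplittableRandom is designed for exactly this use, whereas ThreadLocalRandom is intended for use by a single thread.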
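A minimal sketch of this advice follows. It uses the `ints(streamSize, origin, bound)` method of java.util.SplittableRandom to create a bounded stream of random ints and processes it in parallel. The class name `ParallelRandom`, the method `countBelow`, and the seed value are arbitrary choices for illustration.

```java
import java.util.SplittableRandom;

class ParallelRandom {
    // Generates n random ints in [0, bound) and counts, in parallel,
    // how many of them are below cutoff.
    static long countBelow(long seed, long n, int bound, int cutoff) {
        return new SplittableRandom(seed)
                .ints(n, 0, bound)
                .parallel()
                .filter(v -> v < cutoff)
                .count();
    }

    public static void main(String[] args) {
        // Roughly half of the values are expected to fall below 50.
        System.out.println(countBelow(42L, 1_000_000, 100, 50));
    }
}
```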
