Trying out a few things I wanted to try with Stream

I managed to join the Advent Calendar, but I didn't have much to write about, so I'll write about Stream, a topic I can cover. The code I tried this time is also on GitHub.

Readability

Stream was added in Java 8. As many of you may already know, if you benchmark it, a Stream tends to run slower than an equivalent procedural loop when there are only a few elements. Its advantage is that the method names make it clear what the code is trying to do.

Simple benchmark execution environment

I didn't want to bring in a benchmarking tool or repeat the measurement a million times, so I kept it simple. System.nanoTime() would also work, but I chose System.currentTimeMillis() because I only need a rough idea of the difference.

Main.java


public class Main {
    public static void main(String[] args) {
        System.out.println("Execution time: " + benchMark() + " [sec]");
    }

    // Measures a single run of the task, in seconds
    private static double benchMark() {
        long start = System.currentTimeMillis();
        // HogeSomeTask is a placeholder for the class holding the code under test
        HogeSomeTask task = new HogeSomeTask();
        task.something_do();
        long end = System.currentTimeMillis();

        return (end - start) / 1000.0; // milliseconds -> seconds
    }
}
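The results below are averages over 10 runs, and the text above mentions nanoTime as an alternative. Here is a minimal sketch of a harness that does both, assuming you simply want to repeat a task and report the mean time. `AverageTimer` and `averageSeconds` are hypothetical names, not from the original repository. There is no JIT warm-up, so like the original it only gives rough numbers. A tool such as JMH would be needed for rigorous measurements.

```java
// Hypothetical helper (not from the original repository): repeats a task
// and reports the mean elapsed time in seconds, measured with System.nanoTime().
class AverageTimer {

    // Runs `task` `runs` times and returns the average elapsed time in seconds.
    static double averageSeconds(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // intended for measuring elapsed time
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000_000.0; // nanoseconds -> seconds
    }

    public static void main(String[] args) {
        // Placeholder workload; replace it with the code you want to measure.
        double avg = averageSeconds(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
        }, 10);
        System.out.println("Average execution time: " + avg + " [sec]");
    }
}
```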

OS: Ubuntu 16.04
CPU: Intel Core i7-2700K @ 5.9GHz (please forgive the old hardware)
JDK: OpenJDK 9

Sample code

The following benchmarks use a process that prints only the strings shorter than 5 characters. I also tried the opposite, printing the strings of 5 or more characters, but the results were almost the same, so I've omitted that code.

Procedural.java


    public void use_for() {
        List<String> list = Arrays.asList("Java", "Ruby", "Csharp", "Scala", "Haskell");
        for (String lang : list) {
            if (lang.length() < 5) {
                System.out.println(lang);
            }
        }
    }

Averaged over 10 runs, the execution time was 0.001 [sec]. Pretty fast.

Using Stream

Stream.java


    public void use_stream() {
        List<String> list = Arrays.asList("Java", "Ruby", "Csharp", "Scala", "Haskell");
        list.stream().filter(lang -> lang.length() < 5).forEach(System.out::println);
    }

Averaged over 10 runs, the execution time was 0.025 [sec], a little slower than the procedural version.
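To check that the two versions really do the same work, here is a small sketch that returns the matches as a List instead of printing them. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// The same "shorter than 5 characters" filter written two ways.
class LoopVsStream {

    // Procedural version: explicit loop and if.
    static List<String> withLoop(List<String> langs) {
        List<String> result = new ArrayList<>();
        for (String lang : langs) {
            if (lang.length() < 5) {
                result.add(lang);
            }
        }
        return result;
    }

    // Stream version: filter as an intermediate operation, collect as the terminal one.
    static List<String> withStream(List<String> langs) {
        return langs.stream()
                .filter(lang -> lang.length() < 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> langs = Arrays.asList("Java", "Ruby", "Csharp", "Scala", "Haskell");
        System.out.println(withLoop(langs));   // [Java, Ruby]
        System.out.println(withStream(langs)); // [Java, Ruby]
    }
}
```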

With Java 9, Stream also gets new features

This may be old news by now, but let's take a quick look at the new methods. takeWhile and dropWhile let you write code in a Scala-like style.

takeWhile: processes elements from the start of the stream as long as the condition holds, and stops at the first element that does not match. Unlike filter, the remaining elements are never examined, so the pipeline does less intermediate processing.

takeWhileExample.java


   List<String> list = Arrays.asList("Java","Ruby","Csharp","Scala","Haskell");
   list.stream().takeWhile(lang -> lang.length() < 5).forEach(System.out::println); // prints Java, Ruby
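With the sample list, takeWhile prints the same Java and Ruby as filter only because the short names happen to come first. This sketch uses a reordered, made-up list to show where the two differ. The class name is hypothetical, and takeWhile requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TakeWhileVsFilter {

    // filter tests every element and keeps all matches.
    static List<String> shortWithFilter(List<String> langs) {
        return langs.stream().filter(s -> s.length() < 5).collect(Collectors.toList());
    }

    // takeWhile keeps matches only up to the first element that fails, then stops.
    static List<String> shortWithTakeWhile(List<String> langs) {
        return langs.stream().takeWhile(s -> s.length() < 5).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Go" comes after a longer name, so takeWhile never reaches it.
        List<String> langs = Arrays.asList("Java", "Csharp", "Go");
        System.out.println(shortWithFilter(langs));    // [Java, Go]
        System.out.println(shortWithTakeWhile(langs)); // [Java]
    }
}
```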

dropWhile: discards elements from the start of the stream as long as the condition holds, and outputs everything from the first element that does not match onward. Like takeWhile, it only tests elements until the condition first fails.

dropWhileExample.java


  List<String> list = Arrays.asList("Java","Ruby","Csharp","Scala","Haskell");
        list.stream().dropWhile(lang -> lang.length() < 5).forEach(System.out::println);
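dropWhile is not the same as filtering for the opposite condition. It only removes a leading run. This sketch collects the results so they are easy to compare. The second list is made up and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DropWhileDemo {

    // Drops elements from the front while they are shorter than 5 characters,
    // then keeps everything from the first longer element onward.
    static List<String> dropShortPrefix(List<String> langs) {
        return langs.stream()
                .dropWhile(s -> s.length() < 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(dropShortPrefix(
                Arrays.asList("Java", "Ruby", "Csharp", "Scala", "Haskell"))); // [Csharp, Scala, Haskell]
        // "Go" is short, but it comes after "Csharp", so dropWhile keeps it.
        System.out.println(dropShortPrefix(Arrays.asList("Java", "Csharp", "Go"))); // [Csharp, Go]
    }
}
```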

ofNullable: Stream.ofNullable returns a Stream containing the value if it is not null, and an empty Stream if it is null. In addition, Optional now has a stream() method, so you can go straight from an Optional to a Stream, as shown below.

optional.java


Optional.ofNullable(null).stream().forEach(System.out::println);
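The snippet above goes through Optional. The following sketch shows the Java 9 Stream.ofNullable itself next to Optional.stream, using a typical flatMap pattern to skip empty Optionals. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OfNullableDemo {

    // Stream.ofNullable: one-element stream for a non-null value, empty stream for null.
    static long countOf(String value) {
        return Stream.ofNullable(value).count();
    }

    // Optional.stream: an empty Optional becomes an empty stream,
    // so flatMap silently drops it.
    static String joinPresent(Optional<String> a, Optional<String> b) {
        return Stream.of(a, b)
                .flatMap(Optional::stream)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(countOf("Java")); // 1
        System.out.println(countOf(null));   // 0
        System.out.println(joinPresent(Optional.of("Java"), Optional.empty())); // Java
    }
}
```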

What about speed?

I tried each of these, benchmarked them, and summarized the results in the tables below. ofNullable is mainly about handling null safely, while this time I'm measuring data-processing performance (and I ran out of time before the Advent Calendar deadline), so I've left it out.

1. Processing of the sample code above (print strings shorter than 5 characters)

Method                        Average run time (10 runs)
Procedural                    0.001 [sec]
Stream                        0.025 [sec]
parallelStream                0.026 [sec]
takeWhile                     0.026 [sec]
takeWhile (parallelStream)    0.032 [sec]

The new Java 9 methods show no big difference from stream and parallelStream, which have been around since Java 8. The intermediate operations seem to be what makes them slow.

2. The reverse of 1 (print strings of 5 or more characters)

Method                        Average run time (10 runs)
Procedural                    0.001 [sec]
Stream                        0.023 [sec]
parallelStream                0.031 [sec]
dropWhile                     0.024 [sec]
dropWhile (parallelStream)    0.028 [sec]

Same trend as above.

Why Stream loses to procedural code

Roughly speaking, procedural code is compiled by the JDK more or less as written. A Stream pipeline, by contrast, runs its intermediate operations as extra layers of processing, so when the data is simple the Stream version ends up slower.

Supplement

I wanted to pin down why it is slower, so I used IntelliJ to step through the code behind the Stream methods. The work is split into many small method calls, and mechanisms such as lazy evaluation add overhead.
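The overhead itself is hard to show in a few lines, but the lazy, element-at-a-time execution can be traced. This small sketch uses made-up data and a hypothetical class name, and logs when each stage runs. Nothing happens while the pipeline is being built. Once forEach starts, each element passes through peek, then filter, then forEach before the next element begins. Each of those steps is a separate lambda call, the kind of small-method-call chain a plain for loop avoids. The sketch illustrates the structure only and does not measure the cost, which depends heavily on the JIT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyTrace {

    // Records the order in which each stage sees each element.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline = Stream.of("Java", "Csharp", "Ruby")
                .peek(s -> log.add("peek " + s))
                .filter(s -> {
                    log.add("filter " + s);
                    return s.length() < 5;
                });
        log.add("pipeline built"); // nothing has run yet: intermediate operations are lazy
        pipeline.forEach(s -> log.add("forEach " + s)); // the terminal operation pulls elements one by one
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```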

Is Stream ever faster?

When writing code, you can simply choose Stream for its readability, and even when it is slower the difference is not that large. In terms of performance, though, Stream seems likely to pay off when there are many elements, so I tried that.

Conditions

As test data, generate 1 million random strings, each 20 characters long and made up of uppercase letters and digits.

BigData.java


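import java.util.List;
import java.util.Random;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.toList;
import static java.util.stream.IntStream.range;

Random r = new Random(2111);
List<String> data = range(0, 1_000_000)
    .mapToObj(i ->
        r.ints().limit(20)
            .map(n -> Math.floorMod(n, 36))   // 0..35 (Math.abs(n) % 36 goes negative for Integer.MIN_VALUE)
            .map(code -> (code < 10) ? '0' + code : 'A' + code - 10) // 0-9 -> '0'-'9', 10-35 -> 'A'-'Z'
            .mapToObj(ch -> String.valueOf((char) ch))
            .collect(joining()))              // join the 20 characters; Stream.toString() would not do this
    .collect(toList());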

From each of these strings, extract only the digits, keep the strings whose digit sum is less than 30, and add up the resulting numbers.
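Here is the rule worked through for a single string, which all the samples below implement. This sketch uses made-up inputs and hypothetical helper names. For example, "A1B2C3" becomes "123", whose digit sum is 6, so it contributes 123. "Z9Y9X9W9" becomes "9999", whose digit sum is 36, so it is excluded.

```java
class DigitFilter {

    // Keeps only the characters '0'-'9'.
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // Adds up the digit values of a digit-only string.
    static int digitSum(String digits) {
        return digits.chars().map(ch -> ch - '0').sum();
    }

    // Value one string contributes to the total: the digit-only string parsed as a number,
    // or 0 if it has no digits or its digit sum is 30 or more.
    static long contribution(String s) {
        String digits = digitsOnly(s);
        if (digits.isEmpty() || digitSum(digits) >= 30) {
            return 0;
        }
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(contribution("A1B2C3"));   // 123
        System.out.println(contribution("Z9Y9X9W9")); // 0
        System.out.println(contribution("ABCDEF"));   // 0
    }
}
```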

Procedural sample

Procedural.java


  public static long use_for(List<String> data){
        long result = 0;
        for(String d : data){
            String numOnly = d.replaceAll("[^0-9]", "");
            if(numOnly.isEmpty()) continue;
            int total = 0;
            for(char ch : numOnly.toCharArray()){
                total += ch - '0';
            }
            if(total >= 30) continue;
            long value = Long.parseLong(numOnly);
            result += value;
        }
        return result;
    }

Stream sample

Stream.java


 public static long streamSum(List<String>data){
        return data.stream()
                .map(d -> d.replaceAll("[^0-9]", ""))
                .filter(d -> !d.isEmpty())
                .filter(d -> d.chars().map(ch -> ch - '0').sum() < 30)
                .mapToLong(d -> Long.parseLong(d)).sum();
 }
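The results table includes a parallelStream row, but its code isn't shown. Assuming the parallel version simply swaps stream() for parallelStream(), it would look like this sketch (the class name and sample data are hypothetical). filter does not depend on encounter order and sum is associative, so the parallel result matches the sequential one.

```java
import java.util.Arrays;
import java.util.List;

class ParallelSum {

    // Same pipeline as streamSum, but split across the common ForkJoinPool.
    static long parallelStreamSum(List<String> data) {
        return data.parallelStream()
                .map(d -> d.replaceAll("[^0-9]", ""))                  // keep digits only
                .filter(d -> !d.isEmpty())                             // skip strings with no digits
                .filter(d -> d.chars().map(ch -> ch - '0').sum() < 30) // digit sum below 30
                .mapToLong(Long::parseLong)
                .sum();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("A1B2C3", "Z9Y9X9W9", "ABCDEF", "42XY");
        System.out.println(parallelStreamSum(sample)); // 165 (123 + 42)
    }
}
```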

Trying takeWhile / dropWhile too

I was curious how they compare with plain stream and parallelStream, so I tried them as well.

takeWhileSample.java


  public static long takeWhileSum(List<String> data){
        return data.stream()
                .map(d -> d.replaceAll("[^0-9]", ""))//Remove non-digits
                .takeWhile(d -> !d.isEmpty())//Stops at the first string with no digits
                .takeWhile(d -> d.chars().map(ch -> ch - '0').sum() < 30)//Stops at the first digit sum of 30 or more
                .mapToLong(d -> Long.parseLong(d)).sum();
  }
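takeWhile is not a drop-in replacement for filter, because it stops at the first element that fails. This sketch runs the filter pipeline and the takeWhile pipeline on a made-up three-element list. The class name is hypothetical, and the sums differ.

```java
import java.util.Arrays;
import java.util.List;

class TakeWhileStopsEarly {

    // Same pipeline as streamSum: filter checks every element.
    static long filterSum(List<String> data) {
        return data.stream()
                .map(d -> d.replaceAll("[^0-9]", ""))
                .filter(d -> !d.isEmpty())
                .filter(d -> d.chars().map(ch -> ch - '0').sum() < 30)
                .mapToLong(Long::parseLong)
                .sum();
    }

    // Same pipeline as takeWhileSum: stops at the first element that fails a condition.
    static long takeWhileSum(List<String> data) {
        return data.stream()
                .map(d -> d.replaceAll("[^0-9]", ""))
                .takeWhile(d -> !d.isEmpty())
                .takeWhile(d -> d.chars().map(ch -> ch - '0').sum() < 30)
                .mapToLong(Long::parseLong)
                .sum();
    }

    public static void main(String[] args) {
        // The second string has no digits, so takeWhile stops there and never sees "42XY".
        List<String> sample = Arrays.asList("A1B2C3", "ABCDEF", "42XY");
        System.out.println(filterSum(sample));    // 165 (123 + 42)
        System.out.println(takeWhileSum(sample)); // 123
    }
}
```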

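dropWhileSample.java


    public static long dropWhileSum(List<String> data){
        return data.stream()
                .map(d -> d.replaceAll("[^0-9]", ""))
                .dropWhile(d -> d.isEmpty())//Skips only the leading run of strings with no digits
                .dropWhile(d -> d.chars().map(ch -> ch - '0').sum() > 30)//Skips only the leading run with a digit sum over 30
                .mapToLong(d -> Long.parseLong(d)).sum();//A later empty string would make parseLong throw NumberFormatException
   }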
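dropWhile only skips a leading run. Any digit-free string that appears later is passed on, and Long.parseLong throws NumberFormatException for an empty string. Whether that happens in the benchmark depends on the data. This sketch uses a made-up three-element list and a hypothetical class name to show it. Checking for empty strings with filter, as streamSum does, avoids the problem.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DropWhileLeaksThrough {

    // Digit-only strings after dropping the leading run of empty ones,
    // mirroring the first two steps of dropWhileSum.
    static List<String> afterDrop(List<String> data) {
        return data.stream()
                .map(d -> d.replaceAll("[^0-9]", ""))
                .dropWhile(String::isEmpty)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("ABCDEF", "A1B2C3", "XYZ");
        List<String> kept = afterDrop(sample);
        System.out.println(kept); // [123, ]  the trailing "" was not dropped
        try {
            System.out.println(kept.stream().mapToLong(Long::parseLong).sum());
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException for the empty string");
        }
    }
}
```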

Benchmark results

Method                        Average run time (10 runs)
Procedural                    2.132 [sec]
parallelStream                1.321 [sec]
Stream                        2.107 [sec]
takeWhile                     0.457 [sec]
takeWhile (parallelStream)    1.325 [sec]
dropWhile                     2.175 [sec]
dropWhile (parallelStream)    1.377 [sec]

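I feel like I finally saw what Stream is really good for. takeWhile is by far the fastest. dropWhile was about the same as the procedural version, but running it on a parallelStream made it noticeably faster. Keep in mind, though, that takeWhile stops at the first element that fails its condition. Unlike the other versions, it does not process the whole list and can return a different sum, so its time is not a like-for-like comparison.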

Conclusion

That's all. It was a good opportunity to try out things I'd been curious about.

