Item 46: Prefer side-effect-free functions in streams

The paradigm shift that streams bring

Streams are not just an API; they are a paradigm rooted in functional programming, and to benefit from them we must adopt the paradigm along with the API. The most important part of the paradigm is to structure a computation as a sequence of transformations, where the result of each stage is as close as possible to a pure function of the result of the previous stage. A pure function is one whose result depends only on its input: it does not depend on any mutable state, and it does not update any state. To achieve this, the function objects passed to intermediate and terminal stream operations must be free of side effects.

Consider the following program, which builds a frequency table of the words in a text file.

// Uses the streams API but not the paradigm--Don't do this!
Map<String, Long> freq = new HashMap<>();
try (Stream<String> words = new Scanner(file).tokens()) {
    words.forEach(word -> {
        freq.merge(word.toLowerCase(), 1L, Long::sum);
    });
}

This code uses streams, lambda expressions, and method references, and it produces the correct result, but it gains nothing from the streams API. The problem is that all the work is done in forEach, using a lambda that mutates external state (the freq map). In general, a forEach that does anything more than present the result of a stream computation is mutating state, which is a sign of bad code.

Here is how it should be written.

// Proper use of streams to initialize a frequency table
Map<String, Long> freq;
try (Stream<String> words = new Scanner(file).tokens()) {
    freq = words
        .collect(groupingBy(String::toLowerCase, counting()));
}

**forEach should be used only to report the result of a stream computation, not to perform the computation itself.**
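As a rough sketch of that division of labor, the example below computes the frequency table with `collect` and uses `forEach` only to print the finished result. It reads words from an in-memory string instead of a file so that it runs on its own; the class name `WordFrequency` and the method `frequencies` are made up for illustration.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.Map;
import java.util.Scanner;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original program.
class WordFrequency {
    // Side-effect-free computation: the result depends only on the input text.
    static Map<String, Long> frequencies(String text) {
        try (Stream<String> words = new Scanner(text).tokens()) {
            return words.collect(groupingBy(String::toLowerCase, counting()));
        }
    }

    public static void main(String[] args) {
        Map<String, Long> freq = frequencies("The cat saw the other cat");
        // forEach appears only here, to present the already computed result.
        freq.keySet().stream()
            .sorted()
            .forEach(word -> System.out.println(word + ": " + freq.get(word)));
    }
}
```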

Using collectors

The improved code above uses a collector, a concept you have to learn in order to use streams. The Collectors API has 39 methods, some of which have as many as five type parameters, which looks intimidating. However, you can get most of the benefit of this API without going into its full complexity. To start with, ignore the Collector interface and think of a collector as an opaque object that encapsulates a reduction strategy (combining the elements of a stream into a single object).

The collectors `toList()`, `toSet()`, and `toCollection(collectionFactory)` gather the elements of a stream into a collection. They return a list, a set, and a collection type of your choosing, respectively. Using these, we can extract a top-ten list from the frequency table.

// Pipeline to get a top-ten list of words from a frequency table
List<String> topTen = freq.keySet().stream()
    .sorted(comparing(freq::get).reversed())
    .limit(10)
    .collect(toList());

The code above assumes that the members of Collectors are statically imported, which is the customary practice because it makes stream pipelines more readable.
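For reference, here is a self-contained sketch of the same pipeline with the static imports spelled out, plus a small use of `toCollection`. The class name `TopWords` and its method names are illustrative assumptions, and the frequency table is passed in as a parameter instead of being read from a file.

```java
import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toCollection;
import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical demo class for the collection-producing collectors.
class TopWords {
    // Returns up to n words from freq, most frequent first.
    static List<String> top(Map<String, Long> freq, int n) {
        return freq.keySet().stream()
            .sorted(comparing(freq::get).reversed())
            .limit(n)
            .collect(toList());
    }

    // toCollection takes a factory for the collection to fill, here a TreeSet.
    static TreeSet<String> sortedDistinct(List<String> words) {
        return words.stream().collect(toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(top(Map.of("a", 3L, "b", 5L, "c", 1L), 2));
        System.out.println(sortedDistinct(List.of("b", "a", "b")));
    }
}
```

When several words share the same count, their relative order in the result depends on the iteration order of the map's key set.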

toMap method

Apart from those three, most of the remaining 36 methods exist to collect streams into maps. The simplest is `toMap(keyMapper, valueMapper)`, which takes two functions: one maps a stream element to a key, and the other maps it to a value. Here is an example.

// Using a toMap collector to make a map from string to enum
private static final Map<String, Operation> stringToEnum =
    Stream.of(values()).collect(
        toMap(Object::toString, e -> e));

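The code above throws an IllegalStateException if several stream elements map to the same key. One way to handle such collisions is to pass toMap a merge function (a `BinaryOperator<V>`, where V is the value type of the map) as a third argument; values that share a key are combined using that function. The following example creates a map from each artist to that artist's best-selling album, given a stream of Album objects. Note that maxBy here is the static factory method of BinaryOperator, which converts a Comparator into a binary operator that returns the greater of its two arguments.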

// Collector to generate a map from key to chosen element for key
Map<Artist, Album> topHits = albums.collect(
    toMap(Album::artist, a -> a, maxBy(comparing(Album::sales))));
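To see both behaviors side by side, here is a minimal runnable sketch. The `Album` record (with the artist reduced to a plain String) and the class `TopHits` are hypothetical stand-ins, since the original types are not shown.

```java
import static java.util.Comparator.comparing;
import static java.util.function.BinaryOperator.maxBy;
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;

// Hypothetical demo class; Album is an assumed shape, not the original type.
class TopHits {
    record Album(String artist, String title, int sales) {}

    // Three-argument toMap: on a key collision, keep the album with higher sales.
    static Map<String, Album> topHits(List<Album> albums) {
        return albums.stream().collect(
            toMap(Album::artist, a -> a, maxBy(comparing(Album::sales))));
    }

    // Two-argument toMap: a duplicate key causes an IllegalStateException.
    static Map<String, Album> byArtistStrict(List<Album> albums) {
        return albums.stream().collect(toMap(Album::artist, a -> a));
    }

    public static void main(String[] args) {
        List<Album> albums = List.of(
            new Album("A", "First", 100),
            new Album("A", "Second", 250),
            new Album("B", "Debut", 80));
        System.out.println(topHits(albums).get("A").title());
        try {
            byArtistStrict(albums);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```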

Another use of the three-argument form of toMap is to impose a last-write-wins policy when keys collide. The collector for that looks like this.

// Collector to impose last-write-wins policy
toMap(keyMapper, valueMapper, (oldVal, newVal) -> newVal)

There is also a four-argument version of toMap. Its fourth argument is a map factory, which specifies the Map implementation to be returned, such as an EnumMap or a TreeMap.
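The three-argument and four-argument forms can be sketched as follows. The `Setting` record and the class `ToMapVariants` are invented for the example; only the `toMap` overloads themselves come from the Collectors API.

```java
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class for the toMap overloads.
class ToMapVariants {
    // Assumed key/value pair type, for illustration only.
    record Setting(String key, String value) {}

    // Three arguments: when a key repeats, the later value replaces the earlier one.
    static Map<String, String> lastWriteWins(List<Setting> settings) {
        return settings.stream().collect(
            toMap(Setting::key, Setting::value, (oldVal, newVal) -> newVal));
    }

    // Four arguments: same policy, but the map factory makes the result a TreeMap.
    static TreeMap<String, String> lastWriteWinsSorted(List<Setting> settings) {
        return settings.stream().collect(
            toMap(Setting::key, Setting::value, (oldVal, newVal) -> newVal, TreeMap::new));
    }

    public static void main(String[] args) {
        List<Setting> settings = List.of(
            new Setting("b", "1"), new Setting("a", "2"), new Setting("b", "3"));
        System.out.println(lastWriteWins(settings));
        System.out.println(lastWriteWinsSorted(settings));
    }
}
```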

groupingBy method

In addition to toMap, the Collectors API provides the `groupingBy` method. It returns a collector that produces a map grouping the elements into categories based on a classifier function. The classifier function takes an element and returns the category (the map key) into which it falls. With a classifier alone, each map value is a list of all the elements in that category. This is the collector used in the anagram program shown in Item 45.

words.collect(groupingBy(word -> alphabetize(word)))
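A self-contained version of this grouping might look like the sketch below. The `alphabetize` helper shown here is an assumed implementation (it sorts the letters of a word), and the class name `Anagrams` is illustrative.

```java
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical demo class for single-argument groupingBy.
class Anagrams {
    // Assumed helper: sorts the letters so that all anagrams share one key.
    static String alphabetize(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Classifier only: each map value is a List of the words in that category.
    static Map<String, List<String>> group(List<String> words) {
        return words.stream().collect(groupingBy(word -> alphabetize(word)));
    }

    public static void main(String[] args) {
        System.out.println(group(List.of("staple", "petals", "cat", "act", "dog")));
    }
}
```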

To make groupingBy return a collector that produces a map whose values are something other than lists, specify a downstream collector in addition to the classifier function. In the simplest case, passing toSet() as the downstream collector yields a map whose values are sets instead of lists. Another simple use of the two-argument form is to pass counting() as the downstream collector, which maps each category to the number of elements in it. The frequency table presented at the beginning of this item is an example.

Map<String, Long> freq = words
        .collect(groupingBy(String::toLowerCase, counting()));
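The two downstream collectors mentioned above can be tried out with a small sketch like this one. The class `Downstream` and its method names are assumptions made for the example, and the input is a plain list instead of a file.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toSet;

import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class for two-argument groupingBy.
class Downstream {
    // toSet() downstream: each value is a Set, so duplicates within a group vanish.
    static Map<Integer, Set<String>> byLength(List<String> words) {
        return words.stream().collect(groupingBy(String::length, toSet()));
    }

    // counting() downstream: each value is the number of elements in the group.
    static Map<String, Long> frequencies(List<String> words) {
        return words.stream().collect(groupingBy(String::toLowerCase, counting()));
    }

    public static void main(String[] args) {
        System.out.println(byLength(List.of("a", "b", "cc", "a")));
        System.out.println(frequencies(List.of("Hi", "hi", "yo")));
    }
}
```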

The three-argument version of groupingBy lets you specify the type of the Map to be produced, in addition to the downstream collector. Note that the argument order may be surprising: the map factory is the second argument, and the downstream collector is the third.
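As an illustration of that argument order, the sketch below builds a TreeMap whose values are TreeSets. The class `SortedIndex` and the choice of the first letter as the classifier are assumptions for the example; it also assumes that no word is empty.

```java
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toCollection;

import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class for three-argument groupingBy.
class SortedIndex {
    // Argument order: classifier, then map factory, then downstream collector.
    // Assumes every word has at least one character.
    static TreeMap<String, TreeSet<String>> index(List<String> words) {
        return words.stream().collect(
            groupingBy(word -> word.substring(0, 1),
                       TreeMap::new,
                       toCollection(TreeSet::new)));
    }

    public static void main(String[] args) {
        System.out.println(index(List.of("pear", "apple", "plum", "avocado", "apple")));
    }
}
```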

Other methods

The collector returned by counting is intended only for use as a downstream collector. The same functionality is available directly on Stream through the count method, so there is never a reason to write `collect(counting())`. There are 15 more Collectors methods with this property. They include the nine methods whose names begin with summing, averaging, and summarizing, as well as reducing, filtering, mapping, flatMapping, and collectingAndThen, all of which duplicate functionality that Stream already offers.
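A short sketch of the distinction: at the top level the Stream method is called directly, while the summing and averaging collectors earn their keep as downstream collectors of groupingBy. The `Sale` record and the class `Totals` are hypothetical.

```java
import static java.util.stream.Collectors.averagingInt;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

import java.util.List;
import java.util.Map;

// Hypothetical demo class for collectors meant to be used downstream.
class Totals {
    // Assumed data type, for illustration only.
    record Sale(String region, int amount) {}

    // Top level: use Stream.count() directly instead of collect(counting()).
    static long howMany(List<Sale> sales) {
        return sales.stream().count();
    }

    // Downstream: total amount per region.
    static Map<String, Integer> totalByRegion(List<Sale> sales) {
        return sales.stream().collect(groupingBy(Sale::region, summingInt(Sale::amount)));
    }

    // Downstream: average amount per region.
    static Map<String, Double> averageByRegion(List<Sale> sales) {
        return sales.stream().collect(groupingBy(Sale::region, averagingInt(Sale::amount)));
    }
}
```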

There are three Collectors methods that I haven't mentioned yet. Although they live in Collectors, they have little to do with collections. The first two are the `minBy` and `maxBy` methods. They take a Comparator as an argument and return the minimum or maximum element of the stream as determined by that comparator.
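Here is a minimal sketch of the two collectors, assuming a list of words compared by length. The class name `Extremes` is made up for the example. Both collectors produce an Optional, which is empty when the stream has no elements.

```java
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.maxBy;
import static java.util.stream.Collectors.minBy;

import java.util.List;
import java.util.Optional;

// Hypothetical demo class for Collectors.minBy and Collectors.maxBy.
class Extremes {
    // Longest word according to the comparator, or empty for an empty list.
    static Optional<String> longest(List<String> words) {
        return words.stream().collect(maxBy(comparingInt(String::length)));
    }

    // Shortest word according to the comparator, or empty for an empty list.
    static Optional<String> shortest(List<String> words) {
        return words.stream().collect(minBy(comparingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> words = List.of("came", "saw", "conquered");
        System.out.println(longest(words).orElse("(none)"));
        System.out.println(shortest(words).orElse("(none)"));
    }
}
```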

The last Collectors method is `joining`, which operates only on streams of `CharSequence` instances (such as Strings). With no arguments, it returns a collector that simply concatenates the elements. With one argument, it takes a delimiter and returns a collector that inserts the delimiter between adjacent elements. With three arguments, it takes a prefix and a suffix in addition to the delimiter. If the delimiter is a comma followed by a space, the prefix is [, and the suffix is ], the resulting string looks like this:

[came, saw, conquered]
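The three overloads can be compared with a small sketch such as the following. The class name `Joining` and its method names are illustrative; the delimiter used is a comma followed by a space.

```java
import static java.util.stream.Collectors.joining;

import java.util.List;

// Hypothetical demo class for the three joining overloads.
class Joining {
    // No arguments: plain concatenation.
    static String plain(List<String> words) {
        return words.stream().collect(joining());
    }

    // One argument: the delimiter goes between adjacent elements.
    static String delimited(List<String> words) {
        return words.stream().collect(joining(", "));
    }

    // Three arguments: delimiter, prefix, suffix.
    static String bracketed(List<String> words) {
        return words.stream().collect(joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> words = List.of("came", "saw", "conquered");
        System.out.println(plain(words));
        System.out.println(delimited(words));
        System.out.println(bracketed(words));
    }
}
```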
