Book review: Mastering lambdas

Mastering lambdas is the second book I’ve read about lambdas. The first one, which I’ll use for comparison, was « Java 8 in action » (which is about Java 8 in general, not only lambdas).

This one is written in the same precise style as the very good (and relatively little-known) « Java generics and collections », of which Naftalin is a co-author. Some of the content (exception management, performance considerations, …) can’t be found anywhere else. On the other hand J8IA, while not as well written, covers one important technique that « Mastering lambdas » doesn’t (using CompletableFutures with Streams).
Overall it’s a great book. Because it’s so concise, it’s not too thick and can be read in a few days.
This review is based on the printed version, not the Kindle edition (which I didn’t try).

CHAPTER 1: Taking Java to the Next Level

The book begins with the now-usual explanation of how, in a multi-core context, going from external to internal iteration lets the Java runtime exploit new « degrees of freedom », particularly to enable parallelism.

A structured explanation of the main programming model changes

From anonymous inner classes to lambdas: everything that can logically be inferred by the compiler is elided, step by step
From collections to streams: the old model of chaining transformations by creating a new Collection at every step is bad for performance (it creates a lot of intermediate objects) and pollutes the code with boilerplate. The solution is Unix-like pipes and filters, which are lazy and compose into pipelines better than classes do.
From sequential to parallel: parallelism is explicit, but unobtrusive, using just parallelStream().
Laziness: the intermediate operations don’t pull any data from the pipeline’s source. Work is only done when it can’t be delayed anymore, which is at the pipeline’s terminal operation.
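The laziness point is easy to verify: counting how many times a map lambda runs shows that nothing happens until a terminal operation is called. A minimal sketch (class and method names are mine, not the book’s):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazinessDemo {
    // Returns how many times the map lambda executed.
    public static int mapCalls(List<String> input, boolean runTerminalOp) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = input.stream()
                .map(s -> { calls.incrementAndGet(); return s.length(); });
        if (runTerminalOp) {
            pipeline.forEach(n -> { }); // only the terminal operation pulls data
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(mapCalls(Arrays.asList("a", "bb"), false)); // prints 0
        System.out.println(mapCalls(Arrays.asList("a", "bb"), true));  // prints 2
    }
}
```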

CHAPTER 2: The Basics of Java Lambda Expressions

The grammar of lambdas

This chapter goes into more detail on the grammar of lambdas. The compiler’s inference engine is well explained, which makes lambdas feel less magical than in the usual presentations. The part on the different kinds of method references is much better than anything I had read so far (I didn’t get the difference between instance-bound and instance-unbound references before).

From basic stuff to more advanced topics

It goes progressively:
Expression lambdas and statement lambdas
Differences with anonymous classes (e.g. a lambda doesn’t have to be a new instance every time)
Syntax, scope, and capture rules (unlike JavaScript, the value, not the variable, is captured)
java.util.function: the starter kit of functional interfaces
Type inference rules
The kinds of method references: static, instance-bound, instance-unbound (for the latter, the invocation target is the first lambda argument)
Detailed rules: checked exception handling, overload resolution
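The three kinds of method references can be sketched in a few lines (example mine, not the book’s); note how, for the unbound kind, the receiver becomes the first argument of the functional interface’s method:

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

public class MethodRefKinds {
    public static String demo() {
        // Static method reference: Integer.parseInt(String)
        Function<String, Integer> parse = Integer::parseInt;

        // Instance-bound: the receiver ("abc") is fixed when the reference is created
        String target = "abc";
        Supplier<Integer> boundLength = target::length;

        // Instance-unbound: the receiver becomes the FIRST argument of the lambda
        BiFunction<String, String, Boolean> unboundStartsWith = String::startsWith;

        return parse.apply("42") + "/" + boundLength.get()
                + "/" + unboundStartsWith.apply("abc", "ab");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42/3/true
    }
}
```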

CHAPTER 3: Introduction to Streams and Pipelines

Goals of pipelining operations

A pipeline fuses multiple logical operations into one single pass on the data, like with unix pipes. Pipelines are composed of:

  1. a source
  2. intermediate operations (composed from the API’s Stream->Stream methods)
  3. a terminal operation

The most common of each are listed in this chapter.
An important point is that composing stream operations into a pipeline is only a logical transformation, which doesn’t pull any data from the pipeline’s source yet.
Another advantage of streams, related to lazy processing, is short-circuiting: for some operations, once a result is known, the rest can be skipped (e.g. to check that a property holds for all elements, we can skip the rest of the stream as soon as we find one element for which it’s false).
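Short-circuiting is observable by counting how many elements the predicate actually sees; a small sketch of my own (names are not the book’s):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShortCircuitDemo {
    // Returns how many elements allMatch inspected before giving its answer.
    public static int inspectedCount(List<Integer> input) {
        AtomicInteger inspected = new AtomicInteger();
        input.stream().allMatch(n -> {
            inspected.incrementAndGet();
            return n > 0;   // first non-positive element answers "false" for the whole stream
        });
        return inspected.get();
    }

    public static void main(String[] args) {
        // -1 at position 3 makes the answer false; the remaining elements are skipped
        System.out.println(inspectedCount(Arrays.asList(1, 2, -1, 3, 4))); // prints 3
    }
}
```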

One unclear point in this chapter

« As we saw in §3.1, calling the terminal operation of a pipeline results in the execution of a fusion of its intermediate operations. As a result, the usual debugging technique of stepping through operations is not available for streams. » Yet I can certainly place a breakpoint in a lambda (tested with Eclipse and IntelliJ):

Stream.of("toto", "tata", "titi").map(s -> {
    System.out.println("STEP 1 Stream: " + s); // I can stop at this breakpoint
    return s;
}).map(s -> {
    System.out.println("STEP 2 Stream: " + s); // I can stop at this breakpoint
    return s;
}).forEach(s -> {
    System.out.println("STEP 3 Stream: " + s); // I can stop at this breakpoint
});

That’s one advantage over .NET lambdas, where I can’t stop at the same breakpoints (tested with C# on .NET 4.5 in Visual Studio). I think the author means that there is no breakpoint corresponding to the completion of a single stage of the pipeline, because the execution order is not the same as with external iteration. The previous code displays this:

STEP 1 Stream: toto
STEP 2 Stream: toto
STEP 3 Stream: toto
STEP 1 Stream: tata
STEP 2 Stream: tata
STEP 3 Stream: tata
STEP 1 Stream: titi
STEP 2 Stream: titi
STEP 3 Stream: titi

Whereas the corresponding external iteration approach:

List<String> input = Arrays.asList("toto", "tata", "titi");
List<String> step1 = new ArrayList<>(input);
for (String s : step1) System.out.println("STEP 1 Iterator: " + s);
List<String> step2 = new ArrayList<>(step1);
for (String s : step2) System.out.println("STEP 2 Iterator: " + s);
List<String> step3 = new ArrayList<>(step2);
for (String s : step3) System.out.println("STEP 3 Iterator: " + s);

Displays that:

STEP 1 Iterator: toto
STEP 1 Iterator: tata
STEP 1 Iterator: titi
STEP 2 Iterator: toto
STEP 2 Iterator: tata
STEP 2 Iterator: titi
STEP 3 Iterator: toto
STEP 3 Iterator: tata
STEP 3 Iterator: titi

So element-by-element pipeline execution prevents us from mentally slicing execution into one pass per step while debugging, but it is still possible to follow a single element through the successive steps of the pipeline. This could have been formulated more clearly, unless I’m missing the point (there is one limitation though: you can’t see the variables outside the lambda).

Pipelines and non-interference

Finally, the notion of non-interference is introduced: for a pipeline to be parallel-ready, it must have no side effects, and in particular must not modify its source.
The next two chapters explore the pipeline’s « extremities »: its end in chapter 4, about collection and reduction, and its beginning in chapter 5, about stream sources.

CHAPTER 4: Ending Streams: Collection and Reduction

Collectors: the pipeline terminal operation

A Collector is a pipeline terminal operation. When stream.collect(collector) is called, it accumulates elements into a mutable collection. It may also do a final post-processing operation on that collection, for instance producing an aggregate result.

The 3 kinds of Collectors

There are 3 kinds of collectors, starting with the basic out-of-the-box ones and moving on to the API’s increasingly general extension points:

The predefined collectors

They accumulate a stream into a standard result: Collectors.toSet()/groupingBy()/partitioningBy(), …

Composed collectors

The predefined partitioningBy() returns a Map<Boolean, List<T>>. But the API can’t possibly provide every kind of end result out of the box! So instead it offers more general overloads, which compose a base/upstream collector (e.g. groupingBy) with a user-specified downstream collector (Collectors.toList()/counting()/maxBy()/…). The API has similar overloads for other collector factory methods (for instance Collectors.mapping).
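A typical composition uses the two-argument groupingBy overload with counting() as the downstream collector (my own minimal example):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ComposedCollectors {
    // Group words by length, then count each group instead of listing it.
    public static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        // two words of length 2, one word of length 4
        System.out.println(countByLength(Arrays.asList("to", "ta", "titi")));
    }
}
```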

Custom collectors

They are the most general construction. They are explained just below:

Creating custom Collectors

Custom collectors are instantiated by passing all the collector’s parts to a factory method, like so: Collector.of(supplier, accumulator, combiner, finisher, characteristics):
the supplier creates the mutable container
the accumulator accumulates values into it
the combiner is used in parallel mode to merge partial results
the finisher applies a final transformation to the mutable container
the characteristics are used by the framework (or custom implementations) to decide on possible optimizations.
This is more difficult than what precedes, and I had to read it twice, but I now have a better mental model of this large API. The part on collectors concludes by explaining the contract demanded of custom Collector implementations (for instance thread-safety requirements).
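As a sketch of the factory method and its parts, here is a hand-rolled equivalent of Collectors.joining() (my own example, not the book’s):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class CustomCollectorDemo {
    // A hand-rolled equivalent of Collectors.joining(), built from the parts.
    public static final Collector<CharSequence, StringBuilder, String> JOINING =
            Collector.of(
                    StringBuilder::new,        // supplier: the mutable container
                    StringBuilder::append,     // accumulator: add one element to it
                    StringBuilder::append,     // combiner: merge partial results (parallel mode)
                    StringBuilder::toString);  // finisher: final transformation
                    // no characteristics passed: the finisher must run, order matters

    public static String join(List<String> parts) {
        return parts.stream().collect(JOINING);
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"))); // prints abc
    }
}
```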

Collector’s little brother: Reduction

The second (and shorter) half of this chapter is about the other major kind of terminal operation, reduction.
Reduction summarizes a stream into an immutable value. Unlike collection, a new temporary immutable value is produced at each step. Reduction is a special case of collection, as it can be implemented by a collector that itself uses a downstream collector (see Composed collectors).
Since the stream may be empty by the time reduction is reached, the API provides both an overload of reduce() that takes an identity value (the reduction’s neutral element, like 0 for +), and another that takes no identity value but returns an Optional<T> result.
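The two overloads side by side (my own minimal example):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class ReduceDemo {
    // With an identity value, an empty stream simply yields the identity.
    public static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    // Without an identity, the result is wrapped in an Optional instead.
    public static Optional<Integer> max(List<Integer> xs) {
        return xs.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));   // prints 6
        System.out.println(sum(Collections.emptyList()));  // prints 0
        System.out.println(max(Collections.emptyList()));  // prints Optional.empty
    }
}
```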

In closing

Collection will probably be used more often than reduction. Both are essential parts of the Stream API to understand (and if you understand the more general collection, you understand reduction as well).
This is especially important for parallel streams, because if done wrong, the terminal operation can become a bottleneck that prevents any parallel speedup.

CHAPTER 5: Starting Streams: Sources and Spliterators

After the pipeline’s end, this chapter examines the pipeline’s beginning (this order is justified by the fact that creating custom stream sources is more complex).

JDK out-of-the-box stream starters

The easiest way is to use the new API methods that directly provide streams from sources:
The JDK’s collections have been augmented with the Collection.stream() method, which returns a stream whose source is that collection
The Stream interface has static factory methods: Stream.of(...), Stream.empty(), Stream.iterate/generate(lambda)
The IO API has been augmented too: BufferedReader.lines() (lazy, unlike the eager readAllLines())
One restriction is that file reading is sequential in most (but not all, as we’ll see) cases, in which case it can’t be parallelized. The Java 7 NIO.2 API has also been augmented, with Files.walk/find/list returning streams.
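A small sketch of the factory-method style, using Stream.iterate to build a lazy infinite source that limit() then bounds (example mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamStarters {
    // Stream.iterate generates an infinite lazy stream 1, 2, 3, …; limit() bounds it.
    public static List<Integer> firstSquares(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .map(i -> i * i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstSquares(4));              // prints [1, 4, 9, 16]
        System.out.println(Stream.of("a", "b").count());  // prints 2
    }
}
```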

Custom streams

To implement custom streams, one must implement the new core abstraction, Spliterator. Spliterators are explained much more clearly here than in the other sources I have read so far. They are strongly linked to the Java 7 fork/join framework for recursive decomposition (divide-and-conquer), so this part begins with a reminder: in a recursive task, at each step the computation can either do the remaining work sequentially, or fork (then join) parallel subtasks.

Spliterators

The name Spliterator evokes this choice: « split » evokes the parallel divide-and-conquer part, while « iterator » evokes the « bottom » sequential iteration.
The 2 essential methods to implement are:
tryAdvance(Consumer) implements the « bottom » sequential iteration
trySplit() returns null if further splitting is not worth it, otherwise it returns another Spliterator AND « shortens » the current one so that the two don’t overlap
Once your custom Spliterator is implemented, use StreamSupport.stream(spliterator, parallel) to instantiate a custom Stream (an example is given at the end of the chapter).
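Since the book’s own example isn’t reproduced here, a toy Spliterator over an integer range can sketch the two essential methods (all names are mine):

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// A toy Spliterator over [from, to): tryAdvance does the sequential iteration,
// trySplit hands the first half to a new Spliterator and shortens this one.
public class RangeSpliterator implements Spliterator<Integer> {
    private int from;
    private final int to;

    public RangeSpliterator(int from, int to) { this.from = from; this.to = to; }

    @Override public boolean tryAdvance(Consumer<? super Integer> action) {
        if (from >= to) return false;   // nothing left to iterate
        action.accept(from++);
        return true;
    }

    @Override public Spliterator<Integer> trySplit() {
        int remaining = to - from;
        if (remaining < 2) return null;                     // not worth splitting further
        int mid = from + remaining / 2;
        Spliterator<Integer> prefix = new RangeSpliterator(from, mid);
        from = mid;                                         // « shorten » this spliterator
        return prefix;                                      // the two no longer overlap
    }

    @Override public long estimateSize() { return to - from; }
    @Override public int characteristics() { return ORDERED | SIZED | SUBSIZED | DISTINCT; }

    public static int sum(int from, int to, boolean parallel) {
        Stream<Integer> s = StreamSupport.stream(new RangeSpliterator(from, to), parallel);
        return s.mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(0, 100, false)); // prints 4950
        System.out.println(sum(0, 100, true));  // prints 4950, computed via trySplit
    }
}
```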

A touchy subject: exception propagation

Another difficult topic that is often avoided by authors is exception propagation during stream processing; here an example using IOException/UncheckedIOException (the latter introduced in Java 8) illustrates the two alternatives.
The easiest alternative is to « do nothing »: with lazy evaluation, the exception will be thrown during the terminal operation, when it triggers execution of the exception-generating lambda, and that exception will completely stop the terminal operation.
This is OK in most cases, as most errors are not recoverable.
On the other hand, some errors ARE recoverable (e.g. if one of the files is not readable, this must not stop processing of the other files). In this case recoverable errors must be signalled by checked exceptions, thrown from eagerly evaluated operations (in other words: laziness can’t be used here).
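The « do nothing » alternative can be sketched like this; readResource is a hypothetical helper standing in for real IO, and the whole class is my own illustration, not the book’s code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExceptionDemo {
    // A hypothetical reader that may fail with a checked IOException.
    static String readResource(String name) throws IOException {
        if (name.startsWith("bad")) throw new IOException("unreadable: " + name);
        return name.toUpperCase();
    }

    // The « do nothing » alternative: wrap the checked exception in the JDK 8
    // UncheckedIOException; it surfaces at the terminal operation and stops it.
    public static List<String> readAll(List<String> names) {
        return names.stream()
                .map(n -> {
                    try { return readResource(n); }
                    catch (IOException e) { throw new UncheckedIOException(e); }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("a", "b"))); // prints [A, B]
        try {
            readAll(Arrays.asList("a", "bad"));
        } catch (UncheckedIOException e) {
            System.out.println("terminal operation aborted: " + e.getCause().getMessage());
        }
    }
}
```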

Example: IO parallelization

This chapter ends with an original example of a parallel stream that does non-blocking IO. It is built from a custom Spliterator that splits a NIO MappedByteBuffer (a native « direct » buffer that maps a file into memory, here in read-only mode).
I liked that it’s a rare case of a book using NIO in believable circumstances, and it really helps in understanding how spliterators work; the main point is that to achieve parallel gains in the stream-starting stage of stream processing, the stream source must be splittable without contention.
This is the case for the example’s MappedByteBuffer (because it’s loaded once and for all in read-only mode).
This is a necessary but not sufficient condition though: the problem to solve must be intrinsically non-sequential, which is also the case for the example’s « line grep » functionality. As the author says: « Cooperation with Java NIO enables the Stream API to process input from large files with a high parallel speedup. »

In closing

To quote Naftalin again, « The main strength of the Stream API is that the solutions are straightforward, and respecifying the problem to simulate the different options required only small corresponding changes in the code. »

CHAPTER 6: Stream performance

Measure, don’t guess

The first step in improving performance is measuring it, which is often done with a micro-benchmark; after listing some of the most common micro-benchmark pitfalls (lack of warmup, dead-code elimination, …), two frameworks that alleviate these problems are introduced (Java Microbenchmark Harness and Google Caliper).

Factors making parallelism worth it or not

After the benchmark harness is ready and performance requirements have been gathered, and if the code has been made parallel-ready by using streams correctly, it is time to choose between sequential and actual parallel processing; the latter may or may not yield a speedup, depending on several factors:

The execution context
  • Can this parallel stream use all the machine’s cores, or even all the web server’s cores?

This is not necessarily the case, and if concurrent requests or other running programs compete for cores, the speedup will be much smaller. One possible lever is setting the common fork/join pool size at VM startup time.

Task sizes
  • The stream’s workload is defined as W = Q (time per element) * N (number of elements).

The bigger W is, the likelier it is that the parallel speedup exceeds the parallelization overhead.

  • Stream extremities (Spliterator/Collector) performance: at high Q, parallelizing the stream’s intermediate operations pays off, while at low Q, splitting of the source and concurrent accumulation by collectors are likely to become bottlenecks
  • « More complex intermediate operations will usually involve some I/O or synchronization, resulting in fewer gains from going parallel »
The stream characteristics

Apart from the runtime context and workload, the stream’s intrinsic characteristics also have a big impact:
SORTED
SIZED
DISTINCT
ORDERED (most important)
These characteristics are a product of both the stream’s source and modifications by intermediate operations. Some optimizations will be possible or not based on them.

The kinds of operations performed

Then some other factors are considered:

  • Stateful VS stateless operations
  • Boxing/Unboxing overhead
  • Spliterator and Collector performance in different cases

In closing

It concludes by saying that stream creation and collection are not where big parallel speedups are achieved, but they can be bottlenecks that need tuning. To achieve parallel speedup, the key factors are:

  • doing most of the work in intermediate operations
  • a sufficiently large data set (at least 10_000/100_000 elements) and per-element processing cost (at least 40/200 µs per element)

Optimization or not, « investment in writing parallel-ready code will not be wasted, whatever the outcome of your performance analysis. Your code will be better and, when advances in technology change the cost-benefit balance of going parallel, it will—as the name suggests—be ready ».

A few omissions

So a very good chapter, but with some omissions:
The biggest one is that every call to parallelStream() uses the same singleton ForkJoinPool: ForkJoinPool.commonPool().
This is a big trap in my opinion (one that Heinz Kabutz likes talking about), so don’t ever do blocking IO or any other blocking calls in a parallel stream. The author does insist that parallel streams are for CPU-intensive tasks, though.
Exception: a more obscure point is that if you call parallelStream() while already running inside a ForkJoinPool, that pool is used instead of ForkJoinPool.commonPool().
Well, it seems to be the case, but it’s unspecified as far as I know.
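That observed (but, as noted, unspecified) behavior is the basis of a well-known workaround: submit the whole pipeline to a dedicated ForkJoinPool. A hedged sketch of my own; don’t rely on this in code that must survive JDK changes:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolTrick {
    // Runs the parallel pipeline inside a dedicated pool, relying on the
    // observed-but-unspecified rule that parallel stream tasks run in the
    // calling thread's fork/join pool instead of ForkJoinPool.commonPool().
    public static int sumInPool(int n, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> IntStream.rangeClosed(1, n).parallel().sum()).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInPool(100, 2)); // prints 5050
    }
}
```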
CompletableFuture is not mentioned.
While it is not related to streams, J8IA does talk about it and makes a very important point: « Conversely, when dealing with concurrency instead of parallelism, or when your main goal is to perform several loosely related tasks on the same CPUs, keeping their cores as busy as possible to maximize the throughput of your application, … ». CompletableFuture is useful for maximum throughput, while parallelStream() is useful for maximum speedup. Which matters more, most of the time? Probably throughput, and J8IA shows a possible way in that direction, using Streams and CompletableFutures together.

CHAPTER 7: API Evolution with Default Methods

Reasons for their introduction

Default methods are linked to lambdas: they were added to interfaces so that adding methods such as Iterable.forEach() would not be a breaking change.

A few implementation patterns

After presenting their syntax, this chapter lists their common points/differences with abstract classes, and presents some interesting implementation patterns.

More technical points

More technical points include:
While lambdas can only implement single-abstract-method (functional) interfaces, concrete default methods are legal within a functional interface, and are useful for instance as auxiliary methods to compose lambdas (e.g. Predicate.and()).
Inheritance and overload resolution rules
The reason why they can’t be declared final (which can be annoying) is just that the inheritance rule « classes always win over interfaces » prevents it.
Static interface methods
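The Predicate.and() case can be sketched in a few lines (example mine): a concrete default method on a functional interface, composing two lambdas into one.

```java
import java.util.function.Predicate;

public class DefaultMethodCompose {
    // Predicate.and() is a concrete default method on a functional interface,
    // used here as an auxiliary method to compose two lambdas.
    public static boolean isShortAndUpper(String s) {
        Predicate<String> isShort = str -> str.length() <= 5;
        Predicate<String> isUpper = str -> str.equals(str.toUpperCase());
        return isShort.and(isUpper).test(s);
    }

    public static void main(String[] args) {
        System.out.println(isShortAndUpper("ABC"));    // prints true
        System.out.println(isShortAndUpper("abcdef")); // prints false
    }
}
```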

CHAPTER 8: Conclusion

Java 8 brings concepts of functional programming, immutability, and lazy evaluation to the Java language. Their integration into a mature language like Java is well done, and they enable more concise, readable, and performant code.

Overall: Recommended

A very polished book. My only regret is that it doesn’t talk about mixing Streams and CompletableFuture for better throughput.
