Creating Asynchronous Solutions Using Threads
You previously saw how to use the Streams API to create concurrent applications that split their work across multiple streams. To process a stream in parallel, the Stream API splits work between several threads (or threads of execution). So, parallel streams run thanks to the magic of their being converted into Java threads. All of Java's concurrency features are built on top of threads.
What's a thread?
Remember those cores we spoke about earlier? Well, now we’re ready to go a bit deeper into that explanation. Java executes multiple statements on different cores at the same time. These statements belong to threads. There can be multiple threads per core, and each thread has many statements. Java statements are sequentially executed on a thread, one after another, a little like threading beads on a bracelet.
Although multiple cores can work on different threads at the same time, each core typically executes a single statement from one of its threads at a time.
How do cores decide what to execute?
Cores don't know about the threads on them. A core performs the instructions given to it by Java's built-in thread scheduler (which we'll be using in the section below). The scheduler often flips between threads, but again, a core itself usually executes one statement at a time.
The statements on different threads are executed independently of one another. In other words, they’re processed concurrently! Neat, huh? 🤓 These threads are all inside a single JVM sharing its memory and other resources - which means that threads can get tangled up pretty easily!
Why do I need to know about threads? Don't they just run automatically with that thread scheduler?
Creating and deciding when to run our threads gives you more control over what your application does. That may not sound intuitive, because as developers, you're used to writing sequential or synchronous code where things follow one another.
Concurrent code runs out of order and stops and starts in a hard to predict way. This style of execution is asynchronous. It means that things don't happen immediately. For example, if you're cooking a meal, you might put a lasagna in the oven, and while that's cooking, you might switch between stirring sauce on the stove and chopping vegetables for a salad.
Threads are the same. If you are executing a statement in one thread, another may interrupt and run its own statements. It's like starting to make some pie and noticing that the potatoes are burning. You'd have to switch over to your potato task and only come back to your pie once the more urgent potato problem is solved. If you don't control what happens when in your program, you might have to wait for many other threads before you get to continue with the next sequential statement in your first thread. By creating threads directly, you can control how your application runs.
Running Threads on an Application With Thread Scheduler
The JVM thread scheduler is responsible for running threads and making sure that every one of them gets a fair shot at execution. Your JRE makes sure that there is some fairness and that all threads make good use of your CPU's cores.
Fortunately, the thread scheduler Just Works ™ is built into Java implementations, so there's nothing more to do, but create threads it can run for you!
So, how do I create a thread?
Believe it or not, you're already an expert at creating single-threaded Java applications. No joke! Your main() method is executed by the JVM, at the start of your Java applications within a special thread called the main thread. See! It just works.
Just like the main method of your Java application, other threads also need a single method which can be called to start their processing. The easiest way to do this is to use a lambda. Fire up JShell and try out the following, which first creates a thread using a lambda and then executes it concurrently.
jshell> Thread t = new Thread(()->System.out.println("Hello Openclassrooms Student!"));
t ==> Thread[Thread-1,5,main]
Line 1: Create a lambda which prints "Hello Openclassrooms Student," and then pass it to the constructor of the thread class.
Line 2: You can see the thread is returned and stored in the variable t.
Now run that thread asynchronously:
jshell> t.start()
Hello Openclassrooms Student!
Line 1: To allow the thread scheduler to run your thread, call start() on it.
Line 2: Shortly afterwards, you'll see the greeting from the lambda.
It won't be obvious when you run it, but calling t.start()
doesn't actually run your lambda immediately. All it does it is mark your lambda as ready to be run by the thread scheduler. The scheduler runs it asynchronously. That is, it runs it when it is ready to do so.
This example was relatively simple. However, threads can run longer chains of execution consisting of classes calling other classes and many, many methods.
Java doesn't dictate how different JVM thread schedulers should behave. Typically, these take your thread through a lifecycle from creation through termination. As the thread does different things, the scheduler changes its ThreadState, which you can query by calling t.getState().
Before trying this out on our own thread, let's find out what these states are! During its life, a thread can exist in any of the following states, which the thread scheduler is responsible for putting it in.
State | Description |
The thread has not had its start() method called. | |
The thread is currently being run by the JVM. | |
The thread has been put to sleep by your code. That is, by calling Thread.sleep(milliseconds). | |
The thread has been intentionally suspended by code trying to have more control over a predictable execution. | |
The thread has been released from a WAITING state and is ready to go back into a RUNNABLE state when the scheduler feels it is safe to do so. | |
The thread has completed executing. Essentially, it's finished! |
You can look at a thread's current state by calling t.getState().
Checking the State of Your Thread With Thread Scheduler
Let's fire up JShell again and repeat the previous example, looking at the thread's state before and after. First, create a thread like before:
jshell> Thread t = new Thread(()->System.out.println("Hello Openclassrooms Student!"));
t ==> Thread[Thread-4,5,main]
Next, call getState()
on this new thread:
jshell> t.getState()
$11 ==> NEW
As you can see at Line 2, the thread state is set to NEW, which means it hasn't yet been given to the thread scheduler. Do that by calling start()
on it:
jshell> t.start()
Hello Openclassrooms Student!
Just as you saw before, the scheduler has now run the lambda. But what has happened to the thread state? Find out by calling getState() again!
shell> t.getState()
$13 ==> TERMINATED
Can you see how the thread moves from a NEW to a TERMINATED state between creation and completion? That's the thread scheduler at work behind the scenes.
It's time to learn how to create a thread that is more complicated than a lambda.
Using Runnables to Create Threads
When creating Java threads, there are two interfaces you can choose between: Runnable and Callable.
Didn't we already create a thread earlier? What's this Runnable and Callable stuff?
When we passed a lambda to the thread's constructor previously, the JVM assumed that it was an implementation for the Runnable interface. Runnable is a Java type that has one void method called run()
. You use java.lang.Runnable when you need some work done, but don't need a returned result.
Why would we needs something that doesn't return a result?
Breaking your complex task into sub-tasks can sometimes mean that you need code to go off and complete a job by itself, such as sending an email or generating a file. Such code doesn't need to return anything.
There are three common ways of creating threads with a Runnable, which you'll see in concurrent Java applications:
Using a lambda - as you've already seen.
Implementing the Runnable interface, which means defining a run() method.
Subclassing thread since it is a Runnable.
Let's look at all three and where to use them.
Why was our lambda treated as a Runnable?
Lambdas automatically do the right thing when used in place of any class or type with a single public method. The inline lambda was inferred as being the same as creating a real class similar to the following:
class RandomRunnable implements Runnable {
@Override
public void run() {
System.out.println("Hello Openclassrooms Student!");
}
}
Line 1: Create a new class which implements the Runnable interface.
Line 2 to 5: Override the run() method and put the behavior in there.
Creating a Thread From a Runnable
Let's create a thread from our Runnable by following these steps:
Step 1: Startup JShell.
Step 2: Copy and paste in the class declaration created for RandomRunnable.
Step 3: Create a thread by creating an instance or the Runnable and passing it to thread's constructor. Enter the following into JShell:
Thread t = new Thread(new RandomRunnable());
Step 4: Call
start()
on the new thread results in your saying "Hey, thread scheduler! Please run this when ready!"
t.start();
And the thread scheduler is ready to greet you again. 😄
In addition to using a lambda or implementing Runnable, you can also create a new Runnable by extending the thread class. It removes the need for creating both a Runnable and a separate thread instance, as we did above. Thread already implements Runnable, so all you need to do is:
Extend thread.
Override the
run()
method with your concurrently running code.
Let's see how to do away with having to deal with separate Runnable and thread objects by extending the thread:
Step 1: Start up JShell.
Step 2: Create a RandomThread by extending the thread:
class RandomThread extends Thread {
@Override
public void run() {
System.out.println("Hello Openclassrooms Student!");
}
}
Step 3: Create a new instance of the thread:
RandomThread t = new RandomThread()
Step 4: Start the thread!
t.start()
Again, you'll see "Hello Openclassrooms student!"
Which is better: implementing the Runnable interface or extending my thread?
There are a few answers to this question. I recommend implementing the Runnable interface. While extending thread appears simpler, by implementing Runnable, you remove the tight coupling which comes from inheritance, because all you need to do is implement a class with a run() method! It allows your design to avoid inheriting anything unnecessary - such as the rest of the thread's implementation, which doesn't concern you. Implementing Runnable also allows you to test it in confidence and keep your design simple.
As you'll see later in the course, there are many other ways of creating your threads, which can give you even more control. 😎
When should I use inline lambdas?
If all your run() method does is print a statement, add a number, or something similarly predictable and low-risk, it makes sense to use a lambda inline. Otherwise, by defining a lambda inline, you make it very hard to test that behavior on its own. That's not necessarily a bad thing if the lambdas behavior is proven with broader tests like other unit or integration tests. However, watch out for code bloat in your Lambdas. Defining more than three lines per lambda block is usually a bit of a sign, or smell, that you need to reorganize your code.
Using Callables and FutureTask
The examples we've seen so far use Runnable, which requires one void run() method. Did you see how it's a void method? That means that it doesn't have a return type, so that it won't return anything!
But what happens if we want to get a result back? For instance, in calculating an average of several temperature values, we might want to use that value? Runnable swallows it whole! 😧 Luckily, Java's concurrency framework has created the generic Callable Interface for this purpose. It has one method,call(), which returns a value, unlike Runnables. In other words, we use java.util.concurrent.Callable when we need to get some work done asynchronously and fetch the result of that work when we need it.
How can I create a thread out of a Callable?
Unfortunately, a thread isn't designed to return a result. You need something Runnable, which can hold onto a result and give it back in the future when it's ready. The simplest thing to do here is to use a FutureTask. As the class name suggests, it runs the Callable task in the future. To do this, you:
Create a Callable by implementing it or using a lambda.
Create a new instance of a FutureTask by passing your Callable to its constructor.
Create a thread from FutureTask, the same as with a Runnable.
Submit our thread to the ThreadScheduler by calling start().
Call get() on your FutureTask, which will block, or wait, until a result is calculated.
I'll show you how to do this, once again, in JShell. See if you can follow along.
Step 1: Start JShell.
Step 2: Create a callable using a lambda. This one finds the average of two doubles:
jshell> Callable<Double> callableTemperatureAverage = () -> (220.3 + 290.3)/2.0
Assign a lambda to a new Callable of type double named callableTemperatureAverage
. The lambda stands in for Callable's call()
method, the same way it did for Runnable's run()
method previously.
Step 3: Next, let's create a FutureTask from the Callable.
jshell> FutureTask<Double> futureTask = new FutureTask<>(callableTemperatureAverage)
Passing a Callable object to FutureTask's constructor now gives a FutureTask of type double. This is itself a Runnable which simply wraps around the Callable.
Step 4: It's time to create a thread from our FutureTask:
jshell> Thread t = new Thread(futureTask)
Pass the Runnable FutureTask to thread's constructor to create a new thread.
Step 5: Let's check our thread state by calling getState()
:
jshell> t.getState()
$30 ==> NEW
With Line 1, call getState()
on the new thread, t
. With Line 2, you can see that the thread is in a NEW state before running it.
Step 6: Submit the thread to the thread scheduler and run it.
jshell> t.start()
Submit the thread for scheduling so it can be asynchronously run.
Step 7: Check if it's run by calling getState()
.
jshell> t.getState()
$32 ==> TERMINATED
With Line 1, call getState()
to see the current state of the thread. With Line 2, see that the t is in a TERMINATED state. This tells you that the thread has finished running.
Step 8: Let's get the result back. This time, you won't see any output, as we've not printed anything. To get back the result of the Callable, call the get()
method on the FutureTask:
jshell> futureTask.get()
$33 ==> 255.3
With Line 1, the get()
call returns the result straight away. If the results weren't ready yet, the code would have waited at this line and not run any further, until you got a value back. With Line 2, you can see that the result of 255.3 is returned. You could then use this in some other computation if you had to. 🤓
Modifying Our Planet Analyzer to Use FutureTasks: Practice!
Now, let's apply this new knowledge to our project! Follow along as we modify our PlanetAnalyzre to create four Callables, which are used to calculate an average split across two equally sized files of temperature readings. The four Callables will:
Calculate the sum of temperature values per file.
Calculate the sample size of temperature values per file; removing comments and noise.
Let's look at the methods which these two Callables will use:
As you can see, each method does just one thing; counting or summing rows. We'll recombine the two sums and two sample sizes, to calculate the average temperature. Let's do this together:
Let's break it down:
Step 1: We created Callables, which worked out those sums and sample sizes.
// 1. Create Callables for summing
Callable<Double> sumOfFileOne = ()-> fileAnalyzer.sumFile(fileOnePath);
Callable<Double> sumOfFileTwo = ()-> fileAnalyzer.sumFile(fileTwoPath);
// 2. Create Callables for counting
Callable<Double> countOfFileOne = ()-> fileAnalyzer.countDoubleRows(fileOnePath);
Callable<Double> countOfFileTwo = ()-> fileAnalyzer.countDoubleRows(fileTwoPath);
These use lambdas to call into our fileAnalyzer.
Step 2: We converted these into FutureTasks, which we can wait on to asynchronously give us back a result:
// 3. Create FutureTasks for summing
FutureTask<Double> sumFileOneFuture = new FutureTask<>(sumOfFileOne);
FutureTask<Double> sumFileTwoFuture = new FutureTask<>(sumOfFileTwo);
// 4. Create FutureTasks for counting
FutureTask<Double> countFileOneFuture = new FutureTask<>(countOfFileOne);
FutureTask<Double> countFileTwoFuture = new FutureTask<>(countOfFileTwo);
FutureTask is a peek into how most Java developers work with threads today. You'll see more Futures at work in the next chapter.
Step 3: We then use FutureTask to create four threads; one for each future.
// 5. Create Threads
Thread t1Sum = new Thread(sumFileOneFuture);
Thread t2Sum = new Thread(sumFileTwoFuture);
Thread t1Count = new Thread(countFileOneFuture);
Thread t2Count = new Thread(countFileTwoFuture);
Step 4: We call start()
on each thread, to submit it to the scheduler.
// 6. Run threads
t1Sum.start();
t2Sum.start();
t1Count.start();
t2Count.start();
Step 5: Finally, call get()
on each FutureTask to block until we get back a double to use in our calculation.
// 7. Wait for FutureTask
Double valueOfFileOneSum = sumFileOneFuture.get();
Double valueOfFileTwoSum = sumFileTwoFuture.get();
Double valueOfFileOneCount = countFileOneFuture.get();
Double valueOfFileTwoCount = countFileTwoFuture.get();
Running the Benchmark!
Checkout the following branch:
git checkout p1-c3-raw-threads-and-future-tasksAs before, you can run the benchmarks with the runBenchmarks task. You'll find the new implementation listed in the output as BenchmarkRunner.benchmarkRawThreadsWithFutureTasks:
./gradlew runBenchmarks
For me, these were the second-fastest nearly as fast as ParallelStreams. How did they perform for you?
Benchmark Mode Cnt Score Error Units
BenchmarkRunner.benchmarkMultiProcess thrpt 20 1.514 ± 0.174 ops/s
BenchmarkRunner.benchmarkParallelStream thrpt 20 47.493 ± 5.298 ops/s
BenchmarkRunner.benchmarkRawThreadsWithFutureTasks thrpt 20 30.258 ± 1.528 ops/s
BenchmarkRunner.benchmarkSingleProcess thrpt 20 13.015 ± 1.499 ops/s
As you've seen, using bare threads puts the responsibility of handling how threads behave in your hands. The more complex your application, the more useful this power can be as you get to decide which threads you need to create and which threads you want to wait on for a result.
Let's Recap!
Threads are objects that get executed concurrently inside a single JVM sharing its memory and other resources. They are executed asynchronously, outside of your control, and depend on the JRE's Thread Scheduler to even run.
Threads can have one of several ThreadStates depending on what the thread scheduler has decided to do with it:
NEW- the thread has been created but not submitted to the scheduler.
RUNNABLE- the thread is currently being run by the JVM.
TIME WAIT- the thread is sleeping for a fixed interval.
WAITING- the thread has been suspending while it waits for a condition to reactivate it.
BLOCKED - the thread is in the process of going from WAITING to RUNNABLE.
TERMINATED- the thread has finished its work and been released by the scheduler.
Creating your own threads gives you low-level control over how and when your threads run. A thread can be created by either extending the thread class OR providing a Runnable to its constructor.
You can create a Runnable by implementing the Runnable interface. It has one void method
run()
, which can't return any results.If you want a result back from your lambda, you can create an instance of a Callable and wrap it inside a Runnable such as a FutureTask.
A FutureTask is a Java concurrency class that implements both Runnable and the Future interface. We use it to run a Callable inside a thread. This has one
.get()
method, which you can call to get the result back from the Callable run by the FutureTask.
In the next chapter, we'll look at the power of thread pools and futures to manage the work in our application!