Before we dive in, let’s do a quick recap of the methodology for optimizing code:
Consider the approaches you can use to optimize your code.
Deconstruct your program's task so that it can work on multiple cores.
Identify if there is a benefit in optimizing your code.
Implement it!
How do I measure the benefit of making my code more efficient?
It's hard to predict whether your design is a good one. Don't spend too much time implementing a complex algorithm if you're not sure that your original application has performance issues.
There are always educated guesses at play, and it's OK to get them wrong. You're guessing about your execution time and its relation to some dataset. Ask yourself whether the risk of getting it wrong, and the additional complexity of optimized code, is something your team wants to own. Talk to your teammates and figure this one out.
You can read about Gustafson's and Amdahl's laws, or try things out and measure them. See the Java Debugging course to learn how you can use VisualVM to find where your code is slow before trying to make it go faster.
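To see what those two laws predict before you write any concurrent code, here is a minimal sketch that evaluates both formulas. It assumes parallelFraction is the share of a fixed-size run that can be spread across cores (Amdahl's law) and serialFraction is the share of the parallel run that stays serial as the problem grows (Gustafson's law). The class and method names are hypothetical.

```java
/** Hypothetical helper for estimating speedup before writing any concurrent code. */
final class SpeedupEstimates {

    private SpeedupEstimates() {
    }

    /**
     * Amdahl's law: speedup of a fixed-size problem when a fraction (0..1)
     * of its run time is spread across the given number of cores.
     */
    static double amdahl(double parallelFraction, int cores) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }

    /**
     * Gustafson's law: scaled speedup when the problem grows with the number
     * of cores and a fraction of the parallel run time is still serial.
     * Equivalent to serialFraction + (1 - serialFraction) * cores.
     */
    static double gustafson(double serialFraction, int cores) {
        return cores - serialFraction * (cores - 1);
    }
}
```

For example, amdahl(0.5, 2) is about 1.33: if only half the work can run in parallel, a second core cannot even halve the run time. amdahl(0.9, 4) is about 3.08.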
Let's dive into optimizing your code!
Parallelizing by Running Several Programs at Once
One of the easiest ways to make sure that your Java program uses several cores is to run several instances of it as independent processes. That is, you start up several JVMs, each running your program. If you run as many instances as you have cores, you're likely to make better use of your hardware.
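If you want to match the number of instances to your machine, the JVM can tell you how many processors it can use. This is a minimal sketch with a hypothetical helper name. availableProcessors() reports the processors available to the JVM, which usually means logical cores and may be fewer inside a container.

```java
/** Hypothetical helper that decides how many copies of a program to launch. */
final class InstancePlanner {

    private InstancePlanner() {
    }

    /** One instance per processor the JVM can use, but never more instances than input files. */
    static int instancesToLaunch(int inputFiles) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.min(cores, inputFiles);
    }
}
```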
Wait! Won't I just end up with the same problem solved by each JVM, producing identical results?
Good point! That wouldn't be very useful, would it? One of the secrets to running your code in parallel is learning to deconstruct both the data you're working with and the task you're doing, so that different parts of the solution can be worked on at the same time.
Let's do that together, starting with breaking down a problem:
Imagine that the developers at a space station want you to optimize some code that analyzes a file of all the planets observed by the Kepler project, sorted by temperature, and produces two outputs:
The average temperature of the planets at 288 kelvins (K) or warmer.
The average temperature of all the planets cooler than 288 K.
OK, now let's go through the steps!
Step 1: Consider the approaches you can use to optimize your code.
There are three options; let's check out each one.
The Serial Approach
Leave it as is. There is already a program that keeps running totals for each of these groups and calculates the averages. It takes as long as it needs to inspect every single row in your input file, plus the added overhead, or extra processing time, of computing both averages one after the other.
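Here's a minimal sketch of that single pass. It assumes the temperatures have already been parsed into a double array, because the real program reads a CSV file whose format isn't shown here. Planets at exactly 288 K are counted as warmer, matching the problem statement. The class and method names are hypothetical.

```java
/** Hypothetical serial version: one pass over the data, two running totals. */
final class SerialTemperatureAverages {

    /** Roughly Earth's average surface temperature, used as the dividing line. */
    static final double THRESHOLD_KELVIN = 288.0;

    private SerialTemperatureAverages() {
    }

    /** Returns {average below 288 K, average at or above 288 K}; NaN for an empty group. */
    static double[] averages(double[] temperaturesKelvin) {
        double coolerTotal = 0.0;
        double warmerTotal = 0.0;
        long coolerCount = 0;
        long warmerCount = 0;
        // Inspect every row exactly once, one after the other
        for (double t : temperaturesKelvin) {
            if (t < THRESHOLD_KELVIN) {
                coolerTotal += t;
                coolerCount++;
            } else {
                warmerTotal += t;
                warmerCount++;
            }
        }
        return new double[] {coolerTotal / coolerCount, warmerTotal / warmerCount};
    }
}
```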
The Parallel Approach
You could write a single program that takes a file and calculates the average of the temperatures in that file. Because your data is already sorted by temperature, you could split the values into two files and run two instances of your program at the same time. In this case, you've deconstructed the program's data into two sets so that each can be processed in parallel.
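Because the data is sorted, you can find the boundary between the two groups with a binary search instead of a full scan. This is a minimal sketch that assumes the sorted temperatures are in memory as an ascending double array. The parallel approach described above would then write each part to its own file for a separate process. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper that splits ascending temperatures at a threshold such as 288 K. */
final class SortedSplitter {

    private SortedSplitter() {
    }

    /** Index of the first temperature >= threshold (the array length if there is none). */
    static int firstIndexAtOrAbove(double[] ascending, double threshold) {
        int low = 0;
        int high = ascending.length;
        // Lower-bound binary search: O(log n) steps instead of reading every row
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (ascending[mid] < threshold) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    /** Returns the cooler part and the warmer part as two independent arrays. */
    static double[][] split(double[] ascending, double threshold) {
        int boundary = firstIndexAtOrAbove(ascending, threshold);
        return new double[][] {
            Arrays.copyOfRange(ascending, 0, boundary),
            Arrays.copyOfRange(ascending, boundary, ascending.length)
        };
    }
}
```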
The Concurrent Approach
You could write a program that deconstructs the problem into tasks which read the file, keep a running average for each temperature group, and report the results.
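A minimal sketch of the concurrent approach might look like the following. It assumes the temperatures are already in memory rather than in a file. It uses an ExecutorService thread pool, which is one way to run tasks concurrently but not necessarily the one later chapters use. Each task keeps its own local totals, so the two tasks share no mutable state. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoublePredicate;

/** Hypothetical concurrent version: one task per temperature group. */
final class ConcurrentTemperatureAverages {

    private ConcurrentTemperatureAverages() {
    }

    /** Averages the temperatures that belong to the group; NaN if none do. */
    private static double averageOf(double[] temperatures, DoublePredicate inGroup) {
        double total = 0.0;
        long count = 0;
        for (double t : temperatures) {
            if (inGroup.test(t)) {
                total += t;
                count++;
            }
        }
        return total / count;
    }

    /** Returns {average below 288 K, average at or above 288 K}, computed by two tasks. */
    static double[] averages(double[] temperatures) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Double> cooler = pool.submit(() -> averageOf(temperatures, t -> t < 288.0));
            Future<Double> warmer = pool.submit(() -> averageOf(temperatures, t -> t >= 288.0));
            // get() blocks until each task has reported its result
            return new double[] {cooler.get(), warmer.get()};
        } finally {
            pool.shutdown();
        }
    }
}
```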
Step 2: Deconstruct your program's task so that it can work on multiple cores.
Because these tasks could all run at the same time in the same JVM process, you need to be careful when several of them call the same code at once. Simultaneous calls can corrupt your results. For instance, imagine two calls to this method in quick succession:
private double averageKelvinTemperature = 0.0;
private long sampleSize = 0;

public void updateRunningAverage(double temperatureOfPlanet) {
    // Calculate a new total from the current average and sample size
    double newTotal = averageKelvinTemperature * sampleSize + temperatureOfPlanet;
    // Increment sample size
    sampleSize++;
    // Calculate new running average
    averageKelvinTemperature = newTotal / sampleSize;
}
In this example, every call with a new temperatureOfPlanet updates the field averageKelvinTemperature to a new running average in the method's last statement.
What if you kicked off two executions of that method at the same time, as part of a concurrent task? Both could read the same value of averageKelvinTemperature when calculating newTotal.
One of the two calls would then set a new averageKelvinTemperature, but the other would almost immediately overwrite it. Whichever call writes last wins, and the sample from the other call is lost! Two callers should never execute the statements from the calculation of newTotal through the assignment of averageKelvinTemperature at the same time. A sequence like this is called a critical section.
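One common way to protect that critical section is to let only one thread at a time run the whole update. The minimal sketch below uses Java's synchronized keyword for this. It's one option among several, and the class name is hypothetical. It also stores the running total rather than the average, so each update is a simple addition and the average is computed only when it's read.

```java
/** Hypothetical thread-safe running average: the whole update is one critical section. */
final class SafeRunningAverage {

    private double totalKelvin = 0.0;
    private long sampleSize = 0;

    /** Only one thread at a time may run this read-modify-write sequence. */
    synchronized void updateRunningAverage(double temperatureOfPlanet) {
        totalKelvin += temperatureOfPlanet;
        sampleSize++;
    }

    /** Reads both fields under the same lock, so it never sees a half-finished update. */
    synchronized double averageKelvinTemperature() {
        return totalKelvin / sampleSize;
    }

    synchronized long sampleSize() {
        return sampleSize;
    }
}
```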
Step 3: Identify if there is a benefit in optimizing your code.
When optimizing your code, you always make educated guesses about your execution time and its relation to some dataset. It's OK to get them wrong. Always ask yourself:
Does my application really have performance issues?
Is spending the time to optimize worth the performance I gain?
Optimizing code means adding complexity; will your team own that decision if you get it wrong?
It's hard to predict if your design is a good one, and it depends on your team's business goals. Just don't spend too much time implementing a complex algorithm if you're not sure whether or not your original application has performance issues. Talk to your teammates to figure this one out.
Let's say that the parallel approach gives you a three-times speedup over the serial approach but only takes a day to implement, because you're reusing the serial code. The concurrent approach might give you a 12-times speedup but require a six-week rewrite. Your team decides to go with the parallel approach.
Step 4: Implement it!
Using Java's ProcessBuilder, let's launch two programs with different arguments. You can use this Java class to run any other program on your operating system; in this case, the programs are other JVMs. We'll compare running one instance of the program with running two. I'll show you how to do this.
Implementing Your Parallelized Solution: Practice!
First, let's clone the project and see it running as a single process:
As you saw, you can run a single-process version of the application using the Gradle task runSingleProcess.
It ran the main class PlanetTemperatureAnalyzer, which reads a file and calculates two average temperatures in kelvins:
The average of all sampled temperatures below 288 K, which is roughly Earth's average surface temperature.
The average of all sampled temperatures above 288 K.
The single process version calculates its averages across a single file with 110,000 planets from the Kepler mission.
Clone the repository and look at the code for averaging. It's just normal sequential code in PlanetTemperatureAnalyzer. Try running it yourself.
Now let's look at the parallelized version, which works on two halves of that dataset of 110,000 records. It calls the same single-process version you just saw, but with two different files. You end up with several JVMs, each analyzing a different file at the same time:
As you saw, the main method does little more than call the single process version multiple times. Let's break it down!
PlanetTemperatureAnalyzerParallel is the main class, which is run by:
gradlew runMultiProcess
The main() method uses Java's ProcessBuilder API to define and launch another program on this computer as a child process.
What are parent and child processes?
The processes launched by ProcessBuilder are JVMs running the single process PlanetTemperatureAnalyzer, which you just saw. These are child processes.
Obviously, children need parents. The process responsible for spawning a child process is the JVM running PlanetTemperatureAnalyzerParallel, and it's called the parent process.
Using ProcessBuilder to Create New Child Processes
Let's see how ProcessBuilder is set up in the main method:
public static void main(String[] args) throws Exception {

    // Create a Process builder and make it look like this process
    ProcessBuilder builder = new ProcessBuilder();

    // Make sure we can see the errors and output of our process
    builder.inheritIO();

    // Share all environment variables
    builder.environment().putAll(System.getenv());
    ...
}
Line 4: Create a new instance of ProcessBuilder, which lets you configure how the child process will run.
Line 7: First, call builder.inheritIO() so that anything the child process prints to the screen shows up alongside the parent's output. This connects the child's standard input, output, and error streams to those of the parent process that starts it.
Line 10: Then clone the environment to make the child JVM similar to its parent. Do this by calling builder.environment() and copying all the environment variables from the parent process into the map it returns. An environment variable is a special variable that is set before your program runs and can affect how it starts up and runs. Strictly speaking, the map returned by environment() already starts out as a copy of the parent's environment, so the putAll() call mainly makes that intent explicit.
Creating the Java Commands for Your Child Processes
After configuring the ProcessBuilder, the program loops through the args array of the main() method. These arguments were passed in when the program started and contain the two CSV file names supplied by the Gradle task.
Do the following for each of the CSV files:
// Create a command to start our application on this computer
builder.command("java", "-cp", classPathString, PlanetTemperatureAnalyzer.class.getName(), csvFile);
...
Process process = builder.start();
Line 2: builder.command() receives the program to launch and its arguments as separate strings, the same words you would type in a macOS or Linux shell or the Windows cmd prompt. This is the equivalent of running the following command for each CSV file:
java -cp /dirs/on/the/class/path com.openclassrooms.concurrency.planetbrain.app.PlanetTemperatureAnalyzer /path/to/csv/file
Line 4: Calling builder.start() launches the child process and returns a Process instance that represents it.
As you saw, there wasn't much involved in doing this.
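The listing leaves out how classPathString is built. One plausible approach, shown here as a sketch under that assumption rather than the repository's actual code, reads the java.home and java.class.path system properties so that the child runs the same Java installation and sees the same classes as the parent. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher for a child JVM that runs one class with one argument. */
final class ChildJvmLauncher {

    private ChildJvmLauncher() {
    }

    /** Builds: <java.home>/bin/java -cp <parent's class path> <mainClassName> <argument> */
    static List<String> javaCommand(String mainClassName, String argument) {
        String javaExecutable = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaExecutable);
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));
        command.add(mainClassName);
        command.add(argument);
        return command;
    }

    /** Starts the child with the parent's console and environment, without waiting for it. */
    static Process start(String mainClassName, String argument) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(javaCommand(mainClassName, argument));
        builder.inheritIO();
        return builder.start();
    }
}
```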
Waiting for Our Child Processes to Finish
Have a look at the main() method in PlanetTemperatureAnalyzerParallel. Notice that it stores all the returned Process instances in a list, which it later loops through:
// Wait for processes to complete
processes.forEach(process -> {
    try {
        LOGGER.info("Waiting for process to complete. PID[" + process.pid() + "]");
        process.waitFor();
        LOGGER.info("Process completed. PID[" + process.pid() + "]");
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag before giving up
        throw new RuntimeException(e);
    }
});
Line 2: Loop through each of the created child processes.
Line 5: During the loop, call process.waitFor() to block until that child process finishes.
Line 7: process.waitFor() can throw the checked exception InterruptedException. Because this code runs inside a lambda passed to forEach(), and that lambda can't throw checked exceptions, you must catch it here. In an ordinary for loop, you could instead declare it in your method's throws clause to propagate it.
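The loop above waits for each child but never checks whether it succeeded. As a minimal sketch with hypothetical names, the helper below starts every command first, then waits for each one in turn and collects its exit code, where 0 conventionally means success. Using a plain for loop instead of forEach() lets InterruptedException propagate through the method's throws clause.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: run several child processes at once and collect their exit codes. */
final class ParallelRunner {

    private ParallelRunner() {
    }

    static List<Integer> runAll(List<List<String>> commands) throws IOException, InterruptedException {
        List<Process> processes = new ArrayList<>();
        // Start every child before waiting on any of them, so they all run at the same time
        for (List<String> command : commands) {
            processes.add(new ProcessBuilder(command).inheritIO().start());
        }
        List<Integer> exitCodes = new ArrayList<>();
        // waitFor() blocks until that child ends and returns its exit code
        for (Process process : processes) {
            System.out.println("Waiting for process to complete. PID[" + process.pid() + "]");
            exitCodes.add(process.waitFor());
        }
        return exitCodes;
    }
}
```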
Running the Benchmarks!
Clone the repository and look at the code. Have a go at running the tests yourself!
git clone https://github.com/OpenClassrooms-Student-Center/ScaleUpYourCodeWithJavaConcurrencyA.git
cd ScaleUpYourCodeWithJavaConcurrencyA
git checkout p1-c1-multiprocess
Or use your IDE to clone the repository and check out the branch p1-c1-multiprocess. You can now explore the examples from the screencast and try running the tests with:
gradle test
But...did it make our program faster?
There wasn't much difference, was there? Even when you estimate scale-up carefully, you won't always get it right or notice a difference until you test with more data. Now, here's an additional challenge:
Try running the program for yourself and make the files larger if you can. See if there's a point where the scale-up becomes more noticeable. This is free play, so have fun!
Isn't there a way to automatically test if there is a performance improvement?
Yes! There is a special category of testing known as performance testing or benchmarking. These tests execute your code repeatedly or under heavy load to measure whether it behaves acceptably when it has to process lots of data.
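Before reaching for a framework, it helps to see the basic idea: run the code a number of times to warm it up, then time many more runs and average them. The crude sketch below does only that, and its names are hypothetical. It lacks JMH's safeguards, such as running in separate forked JVMs and reporting error margins, so treat its numbers as rough.

```java
import java.util.function.Supplier;

/** Hypothetical hand-rolled timer; a rough illustration of what JMH automates properly. */
final class NaiveBenchmark {

    /** Makes it harder for the JIT compiler to discard the results of the measured code. */
    private static volatile Object sink;

    private NaiveBenchmark() {
    }

    /** Returns the average wall-clock nanoseconds per call, measured after some warm-up calls. */
    static <T> double averageNanos(Supplier<T> task, int warmupCalls, int measuredCalls) {
        for (int i = 0; i < warmupCalls; i++) {
            sink = task.get(); // give the JIT compiler a chance to optimize the code first
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            sink = task.get();
        }
        return (System.nanoTime() - start) / (double) measuredCalls;
    }
}
```

Dividing one billion by the result gives a throughput in operations per second, the unit JMH reports in its thrpt mode.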
Benchmarking Our Solution
Benchmarks are a great way to execute and measure code. There are several popular frameworks you can use to test the performance of your Java application. We're going to use JMH, the Java Microbenchmark Harness. It's developed as part of the OpenJDK project but isn't bundled with the JDK as a library, so we need to add it to our Java 11 project as a dependency.
JMH makes this easy by providing many annotations that you can place within a benchmark test.
Try It Out for Yourself!
Let's see how to create one to compare single process and ProcessBuilder versions:
Check out the branch p1-c1-microbenchmark.
Have a look at src/test/java/com/openclassrooms/concurrency/planetbrain/BenchmarkRunner.java.
To measure the performance of our code, I've used JMH, the Java Microbenchmark Harness. It runs code repeatedly and calculates the average speed of operations. Let's see how to benchmark with JMH and how we added its dependencies to our project:
As you saw, you add two dependencies to build.gradle:
testAnnotationProcessor 'org.openjdk.jmh:jmh-generator-annprocess:1.21'
testImplementation 'org.openjdk.jmh:jmh-core:1.21'
Line 1: jmh-generator-annprocess is an annotation processor that reads @Benchmark annotations at compile time and generates the harness code, which calls each annotated method repeatedly in measurement loops. This is how JMH measures the speed of that code.
Line 2: jmh-core contains the JMH classes your benchmarks need, such as the annotations and the benchmark runner.
Let's look at the code for the benchmarks and run them:
Let's break this down. At the top of the class, you can see the use of several annotations:
public class BenchmarkRunner {
@Benchmark
public void benchmarkParallelStream() {
PlanetAnalyzerUsingParallelStreams.analyzeFile(ALL_PLANETS_FILE);
}
}
The @Benchmark annotation marks a method that JMH should run and sample many times, according to the rules specified for the benchmark. Here, JMH will call the static analyzeFile() method over and over.
Now run the Gradle task runBenchmarks:
./gradlew runBenchmarks
You should see something like the following:
# Run complete. Total time: 00:16:55
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                               Mode  Cnt   Score   Error  Units
BenchmarkRunner.benchmarkMultiProcess  thrpt   25   2.206 ± 0.131  ops/s
BenchmarkRunner.benchmarkSingleProcess thrpt   25  22.474 ± 0.536  ops/s
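As you can see in the last two lines of the output, there is a score for benchmarkMultiProcess and one for benchmarkSingleProcess. In throughput mode (thrpt), a higher score means more operations per second. Across 25 measured iterations, the multi-process version managed about 2.2 operations per second against about 22.5 for the single process, which makes it roughly ten times slower. The complete benchmark run took almost 17 minutes.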
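Converting the scores into a ratio and into time per operation makes the gap easier to picture (error margins ignored):

```latex
\frac{22.474\ \text{ops/s}}{2.206\ \text{ops/s}} \approx 10.2
\qquad
\frac{1}{2.206\ \text{ops/s}} \approx 0.453\ \text{s per run}
\qquad
\frac{1}{22.474\ \text{ops/s}} \approx 0.0445\ \text{s per run}
```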
Much of the reason is that starting a new JVM process, as ProcessBuilder does here, is itself slow, especially when you do it repeatedly! It's a good reminder to always measure.
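You can check that startup cost on your own machine. The sketch below, with hypothetical names, times how long it takes to launch a child JVM that only prints its version and exits, using the same java executable as the parent. Results vary widely between machines, so no expected figure is given here.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical probe for the fixed cost of launching a child JVM. */
final class JvmStartupProbe {

    private JvmStartupProbe() {
    }

    /** Average milliseconds to start "java -version" and wait for it to exit. */
    static double averageLaunchMillis(int launches) throws IOException, InterruptedException {
        String javaExecutable = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        ProcessBuilder builder = new ProcessBuilder(javaExecutable, "-version");
        // Discard the child's output; only the elapsed time matters here
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        long start = System.nanoTime();
        for (int i = 0; i < launches; i++) {
            int exitCode = builder.start().waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Child JVM failed with exit code " + exitCode);
            }
        }
        return (System.nanoTime() - start) / 1_000_000.0 / launches;
    }
}
```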
If multi-process programs can be slow, what are my other options?
As you'll see in the following chapters, Java has many other techniques for writing concurrent and parallel code without having to start a new process. You'll even learn how to work safely in code with critical sections like the one above.
Let's Recap!
Break your data and your task down so that different parts of a problem can be solved at the same time.
Run those parts in parallel as separate JVM processes using ProcessBuilder.
Use JMH, the Java Microbenchmark Harness, to check how much speed you've actually gained.
In the next chapter, you'll learn how to use the Stream API to parallelize operations on a stream within a single JVM process.