Modifying an ArrayList
Now that you've seen how to write a program with a thread-safe map, what do you do if you need a thread-safe ArrayList?
Eh...do we really need that thread-safety in this case?
OK, let's see what happens when you try to modify an ArrayList that you're reading from:
Step 1: Open up JShell:
jshell
Step 2: Now create an ArrayList called words
:
jshell> ArrayList<String> words = new ArrayList<>()
words ==> []
Step 3: Put some words into that list. We'll use Java's Arrays.asList()
for this, which returns an immutable list.
jshell> words.addAll(Arrays.asList("Space", "the", "final", "frontier"))
$34 ==> true
Given that we want to change the list, I'm going to pass it to words.addAll()
, which will copy all the elements into our list of words.
Step 4: Now let's loop through that list and try to add a value to it at the same time:
jshell> for (String word : words) {
...> words.add("A new word");
...> }
While looping through the array, try to modify it. What you should find is that you get a ConcurrentModificationException:
jshell> for (String word : words) {
...> words.add("A new word");
...> }
| Exception java.util.ConcurrentModificationException
| at ArrayList$Itr.checkForComodification (ArrayList.java:1042)
| at ArrayList$Itr.next (ArrayList.java:996)
| at (#35:1)
Java won't let you modify a list if you are already looping through it. Whether it's one thread modifying a list while looping or many, the second some code tries to modify the list, Java throws a ConcurrentModificationException
. It protects you from yourself. 🙃
Consider the following code. It isn't very exciting in that it calculates an average for the values in a list. To do this, it loops through it and adds the values together. It also counts them:
Double calculateAverage(List<Double> samples) {
Double sampleCount = 0.0;
Double sum = 0.0;
for (Double sample: samples) {
sum += sample;
sampleCount++;
}
return sum/sampleCount;
}
Imagine what would happen if you were in the middle of Lines 4 to 7, and another thread removed values from the list or modified them! 😱 If you read a list with those changes, your result wouldn't mean very much. You might have included a number that was no longer in the list or reflect a number that changed while you were reading it. Even if the list changed, your calling code would want to make a meaningful value it could use, which reflected all the values in the list at the time you called the method.
Java will, therefore, protect you by throwing a ConcurrentModificationException when you're looping through a list. Looping through a list involves using an iterator, which is the same as calling iterator() on a list directly. Since this iterator protects you from yourself, it is known as a fail-fast iterator.
Using Fail-Safe Iterators
Both CopyOnWriteArrayList
and ConcurrentHashMap provide fail-safe iterators. They are fail-safe because they just won't let you mess things up even if you perform lots of concurrent mutations. That is, unlike the iterator provided by a regular ArrayList, the iterator returned by these classes allow you to modify a list all you want, while as many threads as you want are reading from it. Even better, the code reading from the iterator won't have to deal with inconsistencies.
What kind of magic is this?
Fail-safe iterators are called weakly consistent, meaning that they reflect the last consistent state of a collection when you started reading from it.
For instance, imagine the following behavior for the following shared CopyOnWriteArrayList.
Create a CopyOnWriteArrayList called
cows
. It contains Betty and Bessy. Now create two threads that can use it. Iterating on a CopyOnWriteList creates a local copy for each iterator.Thread 2 starts off seeing both values on the shared list.
Thread 1 also sees the same values on the shared and unmodified list.
Thread 1 starts to loop through the array and is guaranteed to only get back the current items in cows (i.e., Betty and Bessy).
Thread 2 adds Bertha to the end of the cows. This is reflected in the shared list of cows. Thread 1 can see this by calling cows.get(2), but you haven't told Thread 1 that there's another element. So its iterator won't take this into account, and it'll just iterate through the original list of two items.
Even if Thread 1 adds Milly to the list of cows, it won't be reflected. The iterator continues to use a copy of the list, as of the time at which you started looping. So, even if Thread 1 adds Milly to the list, it is reflected by a value for cow returned by its current iterator.
Try out CopyOnWriteArrayLists in JShell!
We're going to create a CopyOnWriteArrayList called source
, and put some data in it. We'll then start looping through that source
data to copy it into another list called target
. While copying that data, we'll also add lots of new data to the source
- at the same time. What do you think will happen?
Step 1: Start up JShell:
jshell
Step 2: Create a CopyOnWriteArrayList of type integer:
jshell> CopyOnWriteArrayList<Integer> source = new CopyOnWriteArrayList<>()
source ==> []
Step 3: Add the number 7 in there. Lucky for some:
jshell> source.add(7)
$78 ==> true
If you type source, you should see that it only has one value.
Step 4: Let's create a second list which we'll copy all of this data into. That is the one value, 7.
jshell> CopyOnWriteArrayList<Integer> target = new CopyOnWriteArrayList<>()
target ==> []
Step 5: Create a thread that loops over source and copies each value into data. For each value in source, it also adds another 100 numbers to target - changing the source list.
jshell> CompletableFuture.runAsync(()-> source.forEach((n)->{ IntStream.range(100,200).forEach(m->source.add(m)); target.add(n); })
...> )
$82 ==> java.util.concurrent.CompletableFuture@79be0360[Not completed]
Pass a lambda to CompletableFuture.runAsync(). Let's break down the lambda's behavior:
source.forEach( (n) -> {...})
creates an iterator from the list source and starts looping through its data. In this case, the only data in there at this time is the value 7.IntStream.range(100, 200).forEach( m-> source.add(m) );
adds every number from 100 to 199 into the source list. This runs once for each value considered in our outer.forEach()
loop. So that's once.target.add(n);
runs for each value n provided by the iterator of source. Since there is only one value in source, this at least copies the number 7. But what about all those other values we added in Step 2?
Did you notice how there isn't a ConcurrentModificationException?
Step 6: Let's look at source:
jshell> source
source ==> [7, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199]
It's got all one hundred numbers, as well as the original 7.
Step 7: What do you think will be in target? Let's check!
jshell> target
target ==> [7]
Target only contains the number 7. The reason for this is that the iterator we used was created from a list that only had the value 7 in it.
Once you start looping over the iterator of CopyOnWriteArrayList, you're guaranteed a consistent state reflecting the point at which you started looping. It results in safer and more predictable software and fewer bugs.
Even better, since you're not synchronizing to achieve this, it means less blocking, which means more efficient code! But only if your design calls for a shared mutable array list!
Updating Our Planet File Analyzer: Practice!
How might you use a CopyOnWriteArrayList in the planet file analyzer application?
Your first thought might be to put all the sample temperatures into a shared array, as each thread reads from the files. It would be easy, as you could then do something like:
CopyOnWriteArrayList<Double> samples = new CopyOnWriteArrayList()
...
// Threads can add temperature values to samples
...
// Finally we calculate an average in the parent thread
for(Double sample : samples) {
sumOfSample += sample;
}
Double average = sumOfSamples/samples.size();
Each thread writes values to a list so that you can add them up sequentially later.
It's always tempting to force your code out of shape to fit the tool you want to use. While this lets you efficiently use the CopyOnWriteArrayList from all your threads, you should take a closer look at Lines 6 to 8 as it has to visit every item in the samples list, sequentially, before you can complete your calculations. Further, every temperature in the file has to get stored there. It pushes up your memory requirements, so try to avoid storing too much. You could optimize this further, but unless you're demonstrating as we are here, ask yourself if this is the best tool for the job.
As the total volume of data grows, it becomes slower and slower. You have less code benefiting from concurrency and possibly worsened, as there is a thread-safe data structure that has its own performance costs.
Instead, keep that CopyOnWriteArrayList small and add the sum of planetary temperatures calculated for each file in there. So if you have 72 files, you should only have 72 entries in your array.
Let me show you such an implementation:
As you saw, each file's contribution is to add an AveragingComponent to the shared CopyOnWriteArrayList. The main streaming loop is:
// For every temperature in our file
Optional<AveragingComponents> components = getDoublesFromFile(file)
...
// Add the result to our SHARED samples array or use a 0.0
samples.add(
components.orElse(new AveragingComponents(0.0, 0.0)));
Line 2: Put a total sum of temperatures and a count of samples into components.
Line 5: Add this single object to the shared CopyOnWriteArrayList for each file analyzed.
In the main thread, add all of these together:
for (AveragingComponents sample : temperatureSamples) {
totalTemperature += sample.getSampleSum();
sampleCount += sample.getSampleSize();
}
return totalTemperature/sampleCount;
While this seems similar to the previous example, remember that since there is now one value per file, we've kept most of the work in the concurrent thread which no longer represents looping through every temperature in the file. Instead, add the totals for each file.
It works for the example code where you have less than 70 files; however, if you have hundreds or even millions, you'd probably pick a different design. Or perhaps not even use a CopyOnWriteArrayList in the first place!
Always pick the right tool for the job. You've now seen many, and there are even more. Java gives you a whole toolbelt. But based on your understanding of this data structure, you must pick the right concurrency constructs for the problem you're investigating. And don't forget to keep measuring and proving your assumptions about which design is best!
Let's Recap!
ArrayLists provide a fail-fast iterator that prevents one or more threads from looping through and changing a list at the same time.
While some part of your code has started reading from the iterator of an ArrayList, any subsequent changes to that ArrayList result in a ConcurrentModificationException.
CopyOnWriteArrayLists provide thread safety by providing a fail-safe iterator.
Threads looping through and reading the values of a CopyOnWriteArrayList are guaranteed to see the list as it was when they created their iterator.
Threads that change a CopyOnWriteArrayList make synchronized changes to the shared array, but this does not impact those threads reading from it.
In the next chapter, you'll find out about a powerful and modern tool in Java concurrency: reactive streams!