Introducing the Java Concurrency APIs

I recently had the privilege of helping one of my clients conduct technical interviews for a Senior Java Developer position. I enjoy interviewing candidates in part because it gives me a chance to learn something. While conducting the interviews I did learn a few things, so I’m happy 🙂 However, one unfortunate thing I learned is that not many developers are familiar with the Java Concurrency APIs. Only one developer out of the roughly 20 we interviewed had used them, and only two or three had even read up on them. Now, I certainly don’t expect everyone to know every API out there. But in the case of this particular position, we needed someone who knew how to write multi-threaded applications correctly. There are many reasons to get up to speed with these APIs, especially if your applications operate in a multi-threaded environment as my client’s do.

So – I want to share some data that may motivate you to take a further look into the Concurrency APIs if you haven’t done so already. But first let’s take a 30,000-foot view of the APIs. (If you’re really impatient, skip down to the results, then come back here to find out what they mean.)

Overview of the Concurrency APIs

The java.util.concurrent package was introduced via JSR-166 and is present as a standard part of the Java libraries as of Java 5. For those unfortunate souls still on JDK 1.3 or 1.4, there are at least two mainstream back-port implementations of the concurrent package available, so the Concurrency APIs are not only for those whose product requires Java 5 or above.

The Concurrency APIs offer many different interfaces and classes to help with your multi-threaded application, so we’ll focus on one specific subset found in the java.util.concurrent.locks package: the Lock interface. An object of type Lock is an object representation of a lock in the JVM, similar to the synchronization primitive offered by the synchronized keyword. Rather than being tied to a keyword, a Lock exposes the lock through a class with methods for acquiring it, releasing it, and so forth. The core Lock API contains a handful of methods, but the two most important are:

public interface Lock {

    public void lock();

    public void unlock();

}

These are the main methods used to perform — you guessed it! — locking and unlocking. OK, admittedly, the Lock interface alone isn’t convincing enough to make you run out and switch all your tried-and-tested code over to the Lock API. But let’s look at another interface in the java.util.concurrent.locks package: ReadWriteLock. Here is the interface in its entirety:

public interface ReadWriteLock {

    public Lock readLock();

    public Lock writeLock();

}

Wow – pretty short! But as you’ve probably guessed, the ReadWriteLock API is a step toward implementing the read-write locking pattern in your code. The JavaDoc for the interface describes what this pattern can achieve:

A ReadWriteLock maintains a pair of associated locks, one for read-only operations and one for writing. The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive.

So if the usage pattern for a particular piece of synchronized data is read-mostly, more than one reader thread can be in the critical section at a time. When a write does occur, all other readers (and writers) block until the current writer is finished; but unless a write is being performed, the code can achieve a higher degree of parallelism.
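
To make those semantics concrete, here is a quick sketch of my own (illustrative only, not part of the benchmark discussed below). While one thread holds the read lock, a second thread can also acquire the read lock, but a would-be writer cannot:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockDemo {

    public static void main(String[] args) throws InterruptedException {
        final ReadWriteLock lock = new ReentrantReadWriteLock();

        // The main thread holds the read lock for the duration of the demo.
        lock.readLock().lock();
        try {
            Thread other = new Thread(new Runnable() {
                public void run() {
                    // A second reader gets in immediately: read locks are shared.
                    boolean gotRead = lock.readLock().tryLock();
                    System.out.println("second reader got read lock: " + gotRead); // true
                    if (gotRead) {
                        lock.readLock().unlock();
                    }
                    // A writer is shut out while any reader holds the lock.
                    boolean gotWrite = lock.writeLock().tryLock();
                    System.out.println("writer got write lock: " + gotWrite); // false
                    if (gotWrite) {
                        lock.writeLock().unlock();
                    }
                }
            });
            other.start();
            other.join();
        } finally {
            lock.readLock().unlock();
        }
    }
}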

Usage of the Concurrency APIs

To see how straightforward it is to use a Lock as a replacement for the synchronized keyword, let’s use an example. My canonical example is, of course, a cache:

public interface Cache<K, V> {

    public void put(K key, V value);

    public V get(K key);

}

Let’s look at the pre-Concurrency API implementation that uses “classic” synchronization:

public class ClassicallySynchronizedCache<K, V> implements Cache<K, V> {

    private Map<K, V> cache = new HashMap<K, V>();

    public void put(K key, V value) {
        synchronized (cache) {
            cache.put(key, value);
        }
    }

    public V get(K key) {
        synchronized (cache) {
            return cache.get(key);
        }
    }

}

Pretty straightforward. Our cache allows for insertions and retrievals and uses synchronization to ensure that we don’t have any concurrency issues internal to our java.util.HashMap. Now let’s see how the code can be revised using the Lock API:

public class LockCache<K, V> implements Cache<K, V> {

    private Map<K, V> cache = new HashMap<K, V>();

    private Lock lock = new ReentrantLock();

    public void put(K key, V value) {
        // Acquire the lock before entering try so unlock() only runs if lock() succeeded.
        lock.lock();
        try {
            cache.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public V get(K key) {
        lock.lock();
        try {
            return cache.get(key);
        } finally {
            lock.unlock();
        }
    }

}

As was mentioned, there are no longer blocks of code demarcated by the synchronized keyword. Instead, we lock and unlock a Lock object around our critical section. Note the idiom: acquire the lock just before the try block and release it in the finally block, so the lock is always released even if the critical section throws an exception.
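
As an aside (not something the benchmark below relies on), a Lock also lets you do things the synchronized keyword simply can’t, such as trying for the lock with a timeout instead of blocking indefinitely. A minimal sketch, with names of my own choosing:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockExample {

    private final Lock lock = new ReentrantLock();

    // Runs the task only if the lock can be acquired within 50 milliseconds.
    public boolean runIfAvailable(Runnable task) throws InterruptedException {
        if (lock.tryLock(50, TimeUnit.MILLISECONDS)) {
            try {
                task.run();
                return true;
            } finally {
                lock.unlock();
            }
        }
        return false; // could not get the lock in time; the caller can back off and retry
    }
}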

Now let’s implement our Cache using the ReadWriteLock interface:

public class ReadWriteLockCache<K, V> implements Cache<K, V> {

    private Map<K, V> cache = new HashMap<K, V>();

    private ReadWriteLock lock = new ReentrantReadWriteLock();

    public void put(K key, V value) {
        // Writes take the exclusive write lock.
        lock.writeLock().lock();
        try {
            cache.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public V get(K key) {
        // Reads take the shared read lock, so multiple readers can proceed together.
        lock.readLock().lock();
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

}

The natural progression of the code is to use separate locks for the reads/gets and the writes/puts. Both locks come from the same ReadWriteLock implementation, which coordinates them so they behave as described above.

Results

But – as they say – the proof is in the pudding. Does all of this actually have any benefit for parallelization? Let’s take a comparative look at these three implementations running under different scenarios.

These scenarios use different usage patterns to show how the different locking schemes affect parallelization. The results are generated using a benchmark tool I wrote to test out different locking schemes; it exercises each of the three styles of locking above in a parameterized fashion so as to approximate a specific usage pattern. The test machine is a Core 2 Duo E6700 with 2 GB of RAM running Fedora Linux with Java 1.6.0_01. All listed times are expressed in milliseconds.
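
The actual tool isn’t reproduced here, but a rough sketch of the kind of harness I mean might look like the following (class and method names are my own, and the lock-hold durations from each scenario would be simulated inside the Cache implementation under test, not by the harness itself):

import java.util.concurrent.CountDownLatch;

public class LockingBenchmark {

    // Returns elapsed wall-clock time in milliseconds for one parameterized run.
    public static long run(final Cache<String, String> cache,
                           int readerThreads, final long readsPerThread,
                           int writerThreads, final long millisBetweenWrites,
                           final int writesPerThread) throws InterruptedException {
        cache.put("key", "value");
        final CountDownLatch done = new CountDownLatch(readerThreads + writerThreads);
        long start = System.currentTimeMillis();

        for (int i = 0; i < readerThreads; i++) {
            new Thread(new Runnable() {
                public void run() {
                    for (long r = 0; r < readsPerThread; r++) {
                        cache.get("key"); // read lock held for the duration of get()
                    }
                    done.countDown();
                }
            }).start();
        }

        for (int i = 0; i < writerThreads; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int w = 0; w < writesPerThread; w++) {
                            cache.put("key", "value"); // write lock held for the duration of put()
                            Thread.sleep(millisBetweenWrites); // idle time between writes
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.countDown();
                }
            }).start();
        }

        done.await();
        return System.currentTimeMillis() - start;
    }
}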

Scenario 1: Read-mostly, Fast Reads, Infrequent Writes

The first locking scenario is characterized by the following (see the sketch after this list for how these parameters might plug into the hypothetical harness above):

  • Reading threads outnumber writing threads 4:1
  • Read threads each perform 10,000,000 reads from the cache
  • Read lock duration short (reading shared data takes near-zero time)
  • Write lock duration medium (writing shared data occurs every second and takes 10 milliseconds)
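
With that harness, scenario 1 would amount to something like the following (the numbers are illustrative, and the 10-millisecond write cost is assumed to be simulated inside the cache’s put method):

public class Scenario1 {

    public static void main(String[] args) throws InterruptedException {
        // 4 readers to 1 writer, 10,000,000 fast reads per reader,
        // one write roughly every second.
        long elapsed = LockingBenchmark.run(new ReadWriteLockCache<String, String>(),
                4, 10000000L,   // reader threads, reads per reader
                1, 1000L, 60);  // writer threads, millis between writes, writes per writer
        System.out.println("ReadWriteLockCache: " + elapsed + " ms");
    }
}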

Here we see that simply switching to a Lock-based locking scheme improves performance by 5x over the old-style synchronized approach, even though the degree of parallelization is the same (only one reader or writer in the critical section at a time). Interestingly, while the ReadWriteLock does offer better parallelization (more than one reader in the critical section at a time), the overhead of its internal locking machinery is enough to degrade performance: in this scenario raw locking speed wins out over parallelization.

Scenario 2: Read-mostly, Slow Reads, Infrequent Writes

The second locking scenario is characterized by:

  • Reading threads outnumber writing threads 4:1
  • Read threads each perform 100 reads from the cache
  • Read lock duration long (reading shared data takes one millisecond)
  • Write lock duration medium (writing shared data occurs every second and takes 10 milliseconds)

Here we see that simply switching to a Lock-based locking scheme does not improve performance over the old-style synchronized approach. However, look at how much more parallelization we achieve by taking advantage of the ReadWriteLock mechanism: performance increases by nearly 40x! We achieve better parallelization by allowing more than one reader in the critical section at a time, and since the reads are comparatively slow, the overhead of the internal locking machinery does not adversely affect performance. The other two implementations, on the other hand, spend most of their time blocked.

Scenario 3: Read-mostly, Fast Reads, Frequent Writes

The third locking scenario is characterized by:

  • Reading threads outnumber writing threads 4:1
  • Read threads each perform 100,000,000 reads from the cache
  • Read lock duration short (reading shared data takes near-zero time)
  • Write lock duration medium (writing shared data occurs every 10 milliseconds and takes 10 milliseconds)

Again, simply switching to a Lock-based locking scheme improves performance by over 4x compared to the old-style synchronized approach, with the same degree of parallelization. Interestingly, we again notice the effect of the ReadWriteLock’s internal locking overhead on performance, though it still comes in at over 2x the performance of the synchronized approach.

Scenario 4: Equal Read and Write Threads, Fast Reads, Infrequent Writes

The last locking scenario is characterized by:

  • Reading threads and writing threads 1:1
  • Read threads each perform 10,000,000 reads from the cache
  • Read lock duration short (reading shared data takes near-zero time)
  • Write lock duration medium (writing shared data occurs every second and takes 10 milliseconds)

Yet again, simply switching to a Lock-based locking scheme improves performance by over 4x compared to the old-style synchronized approach, with the same degree of parallelization. We again notice the effect of the ReadWriteLock’s internal locking overhead, but here the increased parallelization yields a nearly 3x performance improvement over the synchronized approach.

Conclusion

So what can we conclude from this? That we should immediately switch to using the Lock and/or ReadWriteLock APIs? No.

The point is:

No one locking mechanism will consistently perform better than another. Review your usage patterns, the locking mechanism they use, and test performance. Use the right tool for the job.

As usual, there are several factors involved with a given usage pattern:

  • Number of concurrent reader threads attempting to access the shared data structure
  • Time needed to read a value from the shared data structure
  • Number of concurrent writer threads attempting to modify the shared data structure
  • Time needed to write a value to the shared data structure
  • Time between write accesses

Of course, this list isn’t exhaustive, but it gives us an idea of what factors come into play.

Remember too that the boxes that run our applications are gaining the ability to do more in parallel via multiple CPUs and multi-core CPUs. A noticeable trend is for CPU designers to rely more on the number of cores than on the raw performance of each core; that is, the number of cores may increase faster than the speed of each core. It behooves developers to ensure they’re using an appropriate amount of parallelization.

So – take a look at the Concurrency APIs if you haven’t already. There are many more gems in them that make the time worth your while.