September 4, 2008

Configuring MySQL to Run in EC2's EBS under a Fedora AMI

Posted in Configuration at 10:45 PM by Kirk True

There’s a lot of buzz around Amazon’s recent announcement/availability of their EBS add-on for EC2. The main reason is that it works around a deficiency of EC2: you lose all your locally stored data if someone or something crashes your EC2 instance.

Certainly, one of the main uses of EBS will be to persistently host a database server. There have been a few tutorials on how to set up MySQL to use an EBS share. One that I especially appreciated was Eric Hammond’s tutorial. Unfortunately, it didn’t work out of the box for me when running a Fedora 8-based AMI rather than the Ubuntu-based AMI the tutorial uses.

Since I had to tweak a few things to get it to work, I thought I’d share the solution in case you run into the same problem. It’s also good to document it for when my memory fails me ;)

The following is an addendum to the above-mentioned tutorial that I used to get MySQL working on an EBS share using Fedora 8. For brevity, I refer you to the tutorial for all steps apart from the section titled “Configuring MySQL to use the EBS volume”. Specifically, I used the AMI “ami-2b5fba42”, a Fedora 8 image that doesn’t include MySQL. I installed MySQL (5.x) on the system using yum.

In fact, this probably applies more broadly than EC2: it really describes the tweaks one would need to make to get MySQL running from a non-standard location on Fedora.

  1. First, stop the MySQL instance if it’s running: # /etc/init.d/mysqld stop
  2. Next, create the paths for MySQL: # mkdir -p /vol/lib /vol/log
  3. Then move the existing data and log directories: # mv /var/lib/mysql /vol/lib/ && mv /var/log/mysql /vol/log/
  4. Open your MySQL configuration file (mine was at /etc/my.cnf) and replace all instances of "/var/" with "/vol/", since /vol is used as the new MySQL home in the rest of the tutorial. (/vol is mounted on the EBS-backed device.)
  5. Open your MySQL startup script (mine was at /etc/init.d/mysqld) and replace all instances of "/var/" with "/vol/", as per the rest of the tutorial.
  6. In the MySQL startup script, find the line that calls /usr/bin/mysqladmin and add the socket file option (-S) so that it points to our non-standard socket file ($socketfile): /usr/bin/mysqladmin -S$socketfile
  7. Finally, start the MySQL instance: # /etc/init.d/mysqld start

Note that from this point on you have to call most, if not all, MySQL command-line utilities with the socket file argument (-S). The socket file path is specified in your MySQL configuration. If you forget, the command line will give you an error that should remind you. For example...

# mysql -uroot

...yields...

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)

The biggest change needed was to update the references in /etc/init.d/mysqld to point to /vol and add the explicit socket file argument.

August 22, 2008

Setting up Tomcat Development in Eclipse

Posted in Java at 4:47 PM by Kirk True

In this post I’ll outline how to get the Tomcat source code up and running in Eclipse. This is the opposite of what most people want to do, which is to debug web applications running on Tomcat from within Eclipse. This post is about downloading the source code, building, and debugging Tomcat itself.

We’ll be using Linux as the development environment, but any sensible environment should work with a few tweaks. I’m also using the latest version of the 1.6 JDK.

Getting the Source

The first thing you’ll need to do is to download the sources for Tomcat. There are a couple of ways to do this, but I’m going to use the up-to-date SVN repository for the trunk. So, open a shell and enter the following:

mkdir /tmp/tomcat
cd /tmp/tomcat/
svn co http://svn.apache.org/repos/asf/tomcat/trunk
cd trunk/
echo "base.path=/tmp/tomcat/trunk/downloads" > build.properties
ant download
ant
ant -f extras.xml

Creating the Eclipse Project

Now we turn our attention to setting up an Eclipse project that can build and execute Tomcat.

First, open Eclipse and start with the workspace of “/tmp/tomcat/trunk”. Create a new project using the “Java Project from Existing Ant Buildfile” option. Browse to the file “/tmp/tomcat/trunk/build.xml” and use the “compile” target, then click “Finish”.

Immediately you’ll see hundreds of errors – don’t panic! We’ll fix these up by updating the classpath. View the project options for our new project and select to edit the “Java Build Path”. We’ll need to remove the “/tmp/tomcat/trunk/${ant.jar}” as this points to nothing (yet). Also remove the “JRE_LIB” entry as this is deprecated (so, why did Eclipse create it???).

To fix up the classpath, do the following:

  1. Click “Add External JARs” and add the JAR found at $ANT_HOME/lib/ant.jar
  2. Click “Add External JARs” and add the two JARs in /tmp/tomcat/trunk/output/extras/webservices
  3. Click “Add Library”, select “JRE System Library”, click “Next”, and then select the default (1.5/1.6) JDK and click “Finish”.

After closing the project properties, the workspace should build and — hopefully — there will be no more compilation/dependency errors.

Running Tomcat

Let’s now get Eclipse to run Tomcat from our source:

  1. Right click on the project, select “Run As…”
  2. Create a new “Java Application” configuration with a main class of “org.apache.catalina.startup.Bootstrap”
  3. Click on the “Arguments” tab and enter the “Program arguments” of “start”
  4. Select a “Working directory” of “Other” with a value of “${workspace_loc:Tomcat 6.0}/../output/build”
  5. Click “Run”

To verify that everything is working, point your browser to:

http://localhost:8080/examples/servlets

Hopefully this will help you set up your project in Eclipse to start hacking on the source code to Tomcat.

April 4, 2008

ByteBuffer.duplicate() Does Not Preserve Byte Order

Posted in Java at 10:29 AM by Kirk True

I’ve just spent the last two hours pulling out my hair wondering why my ByteBuffer duplication code doesn’t work.

I’m having fun with a side project to create a lightweight CIFS server in Java. Since the CIFS protocol is in little endian byte order, I have to explicitly specify the byte order via the order API:

public static ByteBuffer allocate(int length) {
	ByteBuffer byteBuffer = ByteBuffer.allocate(length);
	byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
	return byteBuffer;
}

Having a little utility method such as the above makes it easy to stop worrying about the byte ordering.

Many CIFS requests contain two sections – one section for request parameters and one section for request data. This seemed like a good place to use the duplicate API to split the one ByteBuffer into two “views”. (Note that the duplicates share the underlying buffer, so you’re not making extra copies.)

However, my first pass at the code contained a subtle but nasty bug. Suddenly the duplicated buffer was returning crazy values while the original buffer worked fine. And apparently I’m not the only one who’s been tripped up by it. A bug was filed over five years ago against this very API call, noting that the byte order is not preserved from the original to the duplicate. My favorite comment from the Java Bug Parade entry is:

No rightminded developer wants his duplicated ByteBuffer to behave differently than the original.

I would agree ;)

Fortunately this is easy to fix, and creating another utility method for it makes it even clearer:

public static ByteBuffer duplicate(ByteBuffer buffer) {
	ByteBuffer duplicateBuffer = buffer.duplicate();   // duplicate() already returns a ByteBuffer
	duplicateBuffer.order(buffer.order());       // duplicate() does not preserve ordering!
	return duplicateBuffer;
}

This little method then preserves my ByteOrder.LITTLE_ENDIAN setting and avoids future problems.
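To see the problem in isolation, here is a small sketch that writes the int 1 into a little-endian buffer. It then reads the value back twice: once through a plain duplicate() and once through an order-preserving helper like the one above. ByteOrderDemo is just an illustrative name. The sketch assumes a standard JDK, where a duplicate comes back in big-endian order regardless of the original's setting.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {

    // Same idea as the helper above: copy the original's byte order onto the view.
    static ByteBuffer duplicate(ByteBuffer buffer) {
        ByteBuffer duplicateBuffer = buffer.duplicate();
        duplicateBuffer.order(buffer.order());
        return duplicateBuffer;
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.allocate(4);
        original.order(ByteOrder.LITTLE_ENDIAN);
        original.putInt(0, 1);                    // bytes in memory: 01 00 00 00

        ByteBuffer naive = original.duplicate();  // byte order silently reset
        ByteBuffer fixed = duplicate(original);   // byte order copied from the original

        System.out.println(naive.order() + " " + naive.getInt(0));  // BIG_ENDIAN 16777216
        System.out.println(fixed.order() + " " + fixed.getInt(0));  // LITTLE_ENDIAN 1
    }
}
```

The naive view reads the same four bytes as 0x01000000, which is exactly the kind of "crazy value" described above.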

March 7, 2008

If You're Installing Fedora 8 on a Dell Inspiron 530S

Posted in Linux, Miscellaneous at 8:11 AM by Kirk True

I recently bought four machines for a test lab I’m configuring. Each is a Dell Inspiron 530S that comes without an operating system. (Well, technically it comes with FreeDOS.) So I set about installing Fedora 8 on them. If you’re installing Fedora 8 on a Dell Inspiron 530S, here are some tips for you:

Tip 1: Add “irqpoll” on installation

For whatever reason, running the Fedora installer using all the defaults does not work. The installer would get stuck when loading the SCSI drivers, even though that was apparently not the issue.

So, at the installer, press the tab key. This will allow you to modify the kernel command line parameters. To the end of the kernel command line parameters, type “irqpoll” (without the quotes, of course). Then hit enter. This allowed the installer to recognize everything properly and install Fedora.

Tip 2: Have a copy of the e1000 NIC drivers handy

The Inspiron 530S comes with an on-board 10/100 NIC. Unfortunately, the version of the e1000 driver that ships with Fedora 8 (and Ubuntu 7.04) is outdated and does not recognize the NIC. The official response from Dell is to “Install [the] new e1000 driver.” However, since the network isn’t available yet, you have to have the driver handy on a physical medium (such as a CD/DVD or a USB device). Here are the steps that worked for me:

  1. On another machine, go to http://sourceforge.net/projects/e1000 and download the latest version of the “e1000 stable” driver. As of this writing, the latest version is 7.6.5.
  2. Copy the Tar file to your physical media. For me, the easiest thing to do was burn it to a CD.
  3. On the Inspiron 530S, mount your physical media.
  4. Extract the tar file: tar xzf e1000-<version>.tar.gz (where <version> is the version you downloaded)
  5. cd e1000-<version>/src
  6. make && make install && modprobe e1000
  7. You’ll then need to add and activate the network card using either the GUI tools or the command line.

Other than that it was a smooth ride…

January 11, 2008

A Quick Catch-all Comparator Closure for Groovy

Posted in Design, Groovy at 8:34 AM by Kirk True

I don’t come from a functional programming background. As a result, the concepts of closures, currying, and so forth that are present in Groovy are taking some getting used to. But that’s half the fun ;) And without even trying, I was able to come up with a practical use for both: a generic, catch-all comparator for sorting java.util.List instances.

First, let’s imagine we have a standard POGO (Plain Old Groovy Object) named User:

class User {

    String name

    String emailAddress

    Date dateOfBirth

    int weight

    String toString() {
        name
    }

}

Pretty standard fare, right? As often happens with many POJOs/POGOs, our class ends up being stored in instances of java.util.List:

def frank  = new User(name:"Frank Thompson", dateOfBirth: new Date(75, 0, 11), weight:150)
def bob    = new User(name:"Bob Turner", dateOfBirth: new Date(67, 4, 9), weight:115)
def james  = new User(name:"James Van Klein", dateOfBirth: new Date(58, 11, 13), weight:175)
def brad   = new User(name:"Brad Franklin", dateOfBirth: new Date(105, 7, 23), weight:25)

def userList = [frank, bob, james, brad]

And, as often happens with lists, they end up begging to be sorted.

The problem I found myself with was a desire to avoid having to write N java.util.Comparator implementations “by hand” in order to sort a list by N different attributes of a class. That is, I know that most Comparator implementations in Java end up looking more or less like this fictitious implementation for our User’s name attribute:

import java.util.Comparator;

public class UserNameComparator implements Comparator<User> {

    public int compare(User a, User b) {
        String aValue = a.getName();
        String bValue = b.getName();

        // Nulls sort first; otherwise fall back to the natural ordering
        if (aValue == null && bValue == null)
            return 0;
        else if (aValue == null)
            return -1;
        else if (bValue == null)
            return 1;
        else
            return aValue.compareTo(bValue);
    }

}

Having to duplicate that same logic for emailAddress, dateOfBirth, weight, and so on, just for this one little class, seems like overkill. And it’s inevitable that many other classes would need to do the same thing…

This seemed like a perfect chance to use some of Groovy’s dynamic qualities to achieve a generic, reusable Comparator. Using Groovy’s closures and currying, I was able to come up with a simple sort closure that allowed comparisons on arbitrary attributes of a class:

def sortClosure = { attribute, a, b ->
    def aValue = a.@"${attribute}"
    def bValue = b.@"${attribute}"

    if (!aValue && !bValue)
        return 0
    else if (!aValue)
        return -1
    else if (!bValue)
        return 1
    else
        return aValue.compareTo(bValue)
}

Here attribute is the name of the instance variable (property) to sort on, and a and b are the two instances of whichever class we happen to be sorting. We simply “curry” the name of the attribute, then pass the curried closure, which now takes just two arguments and so acts as a comparator, to the sort method that Groovy adds to java.util.List:

println "Sorted by name: ${userList.sort(sortClosure.curry('name'))}"
println "Sorted by email address: ${userList.sort(sortClosure.curry('emailAddress'))}"
println "Sorted by dob: ${userList.sort(sortClosure.curry('dateOfBirth'))}"
println "Sorted by weight: ${userList.sort(sortClosure.curry('weight'))}"

And the best part is that this closure can be used to sort on any exposed attribute of any class.

So there’s a quick example that shows closures and currying in practice. Of course, there are many other practical uses for both closures and currying, many of which are less trivial and even more time-saving.
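For comparison, here is roughly what the same catch-all idea looks like in plain Java. This is a sketch under assumptions, not a translation from the original post. FieldComparator, the trimmed-down User class, and FieldComparatorDemo are hypothetical names. The field is read via reflection, much like Groovy's .@ operator. Only null counts as a missing value here, whereas Groovy truth also treats empty strings and zero as false.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical reflective comparator: sorts on the named field, nulls first.
class FieldComparator<T> implements Comparator<T> {

    private final String fieldName;

    FieldComparator(String fieldName) {
        this.fieldName = fieldName;
    }

    @SuppressWarnings("unchecked")
    public int compare(T a, T b) {
        Comparable<Object> aValue = (Comparable<Object>) read(a);
        Comparable<Object> bValue = (Comparable<Object>) read(b);

        if (aValue == null && bValue == null)
            return 0;
        else if (aValue == null)
            return -1;
        else if (bValue == null)
            return 1;
        else
            return aValue.compareTo(bValue);
    }

    // Reads the field directly (primitives come back boxed). Only fields declared
    // on the object's own class are found, not inherited ones.
    private Object read(T target) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read field " + fieldName, e);
        }
    }
}

// Hypothetical Java stand-in for the User POGO (emailAddress and dateOfBirth omitted).
class User {
    String name;
    int weight;

    User(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }

    public String toString() {
        return name;
    }
}

class FieldComparatorDemo {
    public static void main(String[] args) {
        List<User> users = new ArrayList<User>(Arrays.asList(
                new User("Frank Thompson", 150), new User("Bob Turner", 115),
                new User("James Van Klein", 175), new User("Brad Franklin", 25)));

        Collections.sort(users, new FieldComparator<User>("name"));
        System.out.println("Sorted by name: " + users);
        Collections.sort(users, new FieldComparator<User>("weight"));
        System.out.println("Sorted by weight: " + users);
    }
}
```

As with the Groovy closure, a misspelled attribute name only fails at run time (here with an IllegalArgumentException).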

July 5, 2007

Introducing the Java Concurrency APIs

Posted in Design, Java at 4:13 PM by Kirk True

I recently had the privilege of helping one of my clients conduct technical interviews for a Senior Java Developer position. I enjoy interviewing candidates in part because it gives me a chance to learn something. While conducting the interviews I did learn a few things, so I’m happy :) However, one unfortunate thing I learned is that not many developers are familiar with the Java Concurrency APIs. Only one developer out of the roughly 20 we interviewed had used them, and only two or three had even read up on them. Now, I certainly don’t expect everyone to know every API out there. But for this particular position, we needed someone who knew how to write multi-threaded applications correctly. There are many reasons to get up to speed with the APIs, especially if your applications operate in a multi-threaded environment as my client’s do.

So – I want to share some data that may motivate you to take a further look into the Concurrency APIs if you haven’t done so already. But first let’s take a 30,000-foot view of the APIs. (If you’re really impatient, skip down to the graphs, then come back here to find out what they mean.)

Overview of the Concurrency APIs

The java.util.concurrent package was introduced via JSR-166 and is a standard part of the Java libraries as of Java 5. However, for those unfortunate souls still on JDK versions 1.3 and 1.4, there are at least two mainstream back-port implementations of the concurrent package.

This refutes the notion that the Concurrency APIs are only for those whose product requires Java 5 or above.

The Concurrency APIs offer many different interfaces and classes to help with your multi-threaded application, so we’ll focus on one specific subset found in the java.util.concurrent.locks package: the Lock interface. A Lock is an object representation of a lock in the JVM, similar to the synchronization primitive offered by the synchronized keyword. Rather than relying on a language keyword, a Lock is an ordinary object whose methods acquire and release the lock. The core Lock API contains a handful of methods, but the two most important are:

public interface Lock {

    public void lock();

    public void unlock();

}

These are the main methods used to perform — you guessed it! — locking and unlocking. OK, admittedly, the Lock interface doesn’t look convincing enough to run out and switch all your tried-and-true-and-tested code over to the Lock API. But let’s look at another interface in the java.util.concurrent.locks package: ReadWriteLock. Here is the interface in its entirety:

public interface ReadWriteLock {

    public Lock readLock();

    public Lock writeLock();

}

Wow – pretty short! But as you’ve probably guessed, the ReadWriteLock API is a step toward implementing the read-write locking pattern in your code. Let’s look at the JavaDoc for the interface to see what this pattern can achieve:

A ReadWriteLock maintains a pair of associated locks, one for read-only operations and one for writing. The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive.

So if the usage pattern of a particular piece of synchronized data is read-mostly, more than one reader thread can be in the critical section at once. When a write does occur, all other readers (and writers) block until the current writer is finished. But as long as no write is in progress, the code can achieve a higher degree of parallelism.
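To make those semantics concrete, here is a minimal sketch in which two reader threads hold a ReentrantReadWriteLock's read lock at the same time. While they do, an attempt to take the write lock with tryLock() fails. Once both readers have let go, the same attempt succeeds. ReadWriteLockDemo is a hypothetical name, and the lambda assumes Java 8 or later.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockDemo {

    static String run() throws InterruptedException {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch bothReading = new CountDownLatch(2);
        final CountDownLatch release = new CountDownLatch(1);

        Runnable reader = () -> {
            lock.readLock().lock();
            try {
                bothReading.countDown();   // this reader is now inside the read lock
                release.await();           // keep holding it until main says otherwise
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        };

        Thread first = new Thread(reader);
        Thread second = new Thread(reader);
        first.start();
        second.start();
        bothReading.await();               // both readers hold the read lock simultaneously

        int readers = lock.getReadLockCount();
        boolean writerWhileReading = lock.writeLock().tryLock();  // exclusive: fails while readers hold it
        if (writerWhileReading)
            lock.writeLock().unlock();

        release.countDown();
        first.join();
        second.join();

        boolean writerAfterReaders = lock.writeLock().tryLock();  // no readers left: succeeds
        if (writerAfterReaders)
            lock.writeLock().unlock();

        return "readers=" + readers + ", writerWhileReading=" + writerWhileReading
                + ", writerAfterReaders=" + writerAfterReaders;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());  // readers=2, writerWhileReading=false, writerAfterReaders=true
    }
}
```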

Usage of the Concurrency APIs

To see how straightforward it is to use the Lock as a replacement for the synchronized keyword, let’s use an example. My canonical example is, of course, a cache:

public interface Cache<K, V> {

    public void put(K key, V value);

    public V get(K key);

}

Let’s look at the pre-Concurrency API implementation that uses “classic” synchronization:

import java.util.HashMap;
import java.util.Map;

public class ClassicallySynchronizedCache<K, V> implements Cache<K, V> {

    private Map<K, V> cache = new HashMap<K, V>();

    public void put(K key, V value) {
        synchronized (cache) {
            cache.put(key, value);
        }
    }

    public V get(K key) {
        synchronized (cache) {
            return cache.get(key);
        }
    }

}

Pretty straightforward. Our cache allows for insertions and retrievals and uses synchronization to ensure that we don’t have any concurrency issues internal to our java.util.HashMap. Now let’s see how the code can be revised using the Lock API:

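import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockCache<K, V> implements Cache<K, V> {

    private final Map<K, V> cache = new HashMap<K, V>();

    private final Lock lock = new ReentrantLock();

    public void put(K key, V value) {
        lock.lock();   // acquire before the try, so finally only releases a lock we hold
        try {
            cache.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    public V get(K key) {
        lock.lock();
        try {
            return cache.get(key);
        } finally {
            lock.unlock();
        }
    }

}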

Notice that there are no longer any blocks of code demarcated by the synchronized keyword. Instead, we lock and unlock a Lock object around our critical section, with the unlock in a finally block so that the lock is always released.

Now let’s implement our Cache using the ReadWriteLock interface:

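import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCache<K, V> implements Cache<K, V> {

    private final Map<K, V> cache = new HashMap<K, V>();

    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void put(K key, V value) {
        lock.writeLock().lock();   // exclusive: blocks readers and other writers
        try {
            cache.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public V get(K key) {
        lock.readLock().lock();    // shared: many readers may hold it at once
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

}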

The natural progression of the code is to separate the locks used to manage the reads/gets and writes/puts. These are both managed by the ReadWriteLock implementation to ensure that the locks coordinate and function as defined.

Results

But – as they say – the proof is in the pudding. Does all of this actually have any benefit for parallelization? Let’s take a comparative look at these three implementations running under different scenarios.

These scenarios use different usage patterns to show how the different locking schemes affect parallelization. The results were generated using a benchmark tool I wrote to test different locking schemes. The benchmark tests each of the three styles of locking above in a parameterized fashion to approximate a specific usage pattern. The test machine is a Core 2 Duo E6700 with 2 GB of RAM running Fedora Linux with Java 1.6.0_01. All listed times are in milliseconds.
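The benchmark tool itself isn't included in the post. Purely as an illustration of how such a measurement might be structured, here is a minimal sketch. CacheBenchmark and timeMillis are hypothetical names, and the cache reuses the ReadWriteLockCache shape from above. The write interval and write duration from the scenarios below are not modeled: the writer simply puts as fast as it can. Single runs like this are noisy (JIT warm-up, thread scheduling), so treat any timings it prints with caution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CacheBenchmark {

    interface Cache<K, V> {
        void put(K key, V value);
        V get(K key);
    }

    // Same shape as ReadWriteLockCache above.
    static class ReadWriteLockCache<K, V> implements Cache<K, V> {
        private final Map<K, V> cache = new HashMap<K, V>();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        public void put(K key, V value) {
            lock.writeLock().lock();
            try {
                cache.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }

        public V get(K key) {
            lock.readLock().lock();
            try {
                return cache.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    // Starts readerThreads readers (each doing readsPerThread gets) plus one writer
    // (doing writes puts), waits for all of them, and returns elapsed wall-clock ms.
    static long timeMillis(final Cache<Integer, Integer> cache, int readerThreads,
                           final int readsPerThread, final int writes) throws InterruptedException {
        Thread[] threads = new Thread[readerThreads + 1];
        for (int i = 0; i < readerThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int r = 0; r < readsPerThread; r++)
                    cache.get(r % 16);
            });
        }
        threads[readerThreads] = new Thread(() -> {
            for (int w = 0; w < writes; w++)
                cache.put(w % 16, w);
        });

        long start = System.nanoTime();
        for (Thread t : threads)
            t.start();
        for (Thread t : threads)
            t.join();
        return (System.nanoTime() - start) / 1000000;
    }

    public static void main(String[] args) throws InterruptedException {
        Cache<Integer, Integer> cache = new ReadWriteLockCache<Integer, Integer>();
        // Four readers to one writer, echoing the 4:1 ratio used below.
        System.out.println("Elapsed ms: " + timeMillis(cache, 4, 1000000, 1000));
    }
}
```

Swapping in the synchronized or plain Lock version of the cache lets the same harness compare the three styles under one usage pattern.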

Scenario 1: Read-mostly, Fast Reads, Infrequent Writes

The first locking scenario is characterized by:

  • Reading threads outnumber writing threads 4:1
  • Read threads each perform 10,000,000 reads from the cache
  • Read lock duration short (reading shared data takes near-zero time)
  • Write lock duration medium (writing shared data occurs every second and takes 10 milliseconds)

Here we see that simply switching to a Lock-based locking scheme improves performance by 5x over the old-style synchronized approach, even though the degree of parallelization is the same (only one reader or writer in the critical section at a time). Interestingly, while the ReadWriteLock does offer better parallelization (more than one reader in the critical section at a time), the overhead of its internal locking algorithms degrades performance enough that raw speed wins out over parallelization.

Scenario 2: Read-mostly, Slow Reads, Infrequent Writes

The second locking scenario is characterized by:

  • Reading threads outnumber writing threads 4:1
  • Read threads each perform 100 reads from the cache
  • Read lock duration long (reading shared data takes one millisecond)
  • Write lock duration medium (writing shared data occurs every second and takes 10 milliseconds)

Here we see that simply switching to a Lock-based locking scheme does not improve performance over the old-style synchronized approach. However, look at how much more parallelization we achieve by taking advantage of the ReadWriteLock mechanism: performance increases by nearly 40x! We achieve better parallelization by allowing more than one reader in the critical section at a time. Since the reads are comparatively slow, the overhead of the internal locking algorithms does not adversely affect performance. The other two implementations, on the other hand, block severely, because each slow read holds the lock exclusively.

Scenario 3: Read-mostly, Fast Reads, Frequent Writes

The third locking scenario is characterized by:

  • Reading threads outnumber writing threads 4:1
  • Read threads each perform 100,000,000 reads from the cache
  • Read lock duration short (reading shared data takes near-zero time)
  • Write lock duration medium (writing shared data occurs every 10 milliseconds and takes 10 milliseconds)

Again, simply switching to a Lock-based locking scheme improves performance by over 4x compared with the old-style synchronized approach, with the same degree of parallelization. We again notice the effect of the ReadWriteLock’s internal locking overhead on performance, though it still performs over 2x better than the synchronized approach.

Scenario 4: Equal Read and Write Threads, Fast Reads, Infrequent Writes

The last locking scenario is characterized by:

  • Reading threads and writing threads 1:1
  • Read threads each perform 10,000,000 reads from the cache
  • Read lock duration short (reading shared data takes near-zero time)
  • Write lock duration medium (writing shared data occurs every second and takes 10 milliseconds)

Yet again, simply switching to a Lock-based locking scheme improves performance by over 4x compared with the old-style synchronized approach, with the same degree of parallelization. We again notice the effect of the ReadWriteLock’s internal locking overhead on performance, though its increased parallelization still yields a nearly 3x improvement over the synchronized approach.

Conclusion

So what can we conclude from this? That we should immediately switch to using the Lock and/or ReadWriteLock APIs? No.

The point is:

No one locking mechanism will consistently perform better than another. Review your usage patterns, the locking mechanism they use, and test performance. Use the right tool for the job.

As usual, there are several factors involved with a given usage pattern:

  • Number of concurrent reader threads attempting to access the shared data structure
  • Time needed to read a value from the shared data structure
  • Number of concurrent writer threads attempting to modify the shared data structure
  • Time needed to write a value to the shared data structure
  • Time between write accesses

Of course, this list isn’t exhaustive, but it gives us an idea what factors come into play.

Remember too that the boxes that run our applications are being imbued with the ability to do more in parallel via multiple CPUs and N-core CPUs. One noticeable trend is for CPU designers to rely more on the number of cores than on the raw performance of each core. That is, the number of cores may increase faster than the speed of each core. It behooves developers to ensure they’re using the appropriate amount of parallelization.

So – take a look at the Concurrency APIs if you haven’t already. There are many more gems in it that make it worth your while.

May 1, 2007

The Dangers of Escapism

Posted in Design, Java at 8:31 PM by Kirk True

I’m the kind of guy who has a ‘word of the day’ calendar and tries to work that day’s odd word into a conversation at some point. I especially enjoy learning a new word (or a new meaning of a word) that succinctly describes a concept or thought I’ve been struggling to articulate. One such word that I recently came across describes a software design problem: “escapism.” Now, I haven’t found a rigorous textbook definition of the word, but I’ll do my best to define it:

Escapism: In software design, the inadvertent release of an object reference outside its current scope.

The usage of the qualifier “inadvertent” here is important because the programmer would probably find such behavior to be undesirable, though not always obvious.

One common source of escaped objects is the release of internal object references to external code. Programmers will work long and hard to encapsulate any internal objects a class uses in an effort to provide confidence in the object’s state, usage, etc. Anything that undermines that confidence should be addressed.

So, let’s look at a detailed example of where this form of escaping objects can cause problems.

Imagine a programmer creates a class—UserCache—to cache User objects. Part of the purpose of the class is that it will notify listeners when a User is added to or removed from the cache. To provide more flexibility, the programmer chooses to utilize the Decorator pattern to enhance an existing data structure with his cache logic rather than creating it from scratch.

Let’s see what the class looks like:

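    import java.util.AbstractSet;
    import java.util.Iterator;
    import java.util.Set;

    public class UserCache extends AbstractSet<User> {

        private Set<User> rawSet;

        public UserCache(Set<User> rawSet) {
            this.rawSet = rawSet;
        }

        @Override
        public Iterator<User> iterator() {
            return rawSet.iterator();
        }

        @Override
        public int size() {
            // AbstractSet requires size() in addition to iterator()
            return rawSet.size();
        }

        @Override
        public boolean add(User user) {
            boolean wasAdded = rawSet.add(user);
            notifyListeners(user);
            return wasAdded;
        }

        @Override
        public boolean remove(Object o) {
            boolean wasRemoved = rawSet.remove(o);
            notifyListeners((User)o);
            return wasRemoved;
        }

        private void notifyListeners(User user) {
            // Deliver the change to registered listeners (listener bookkeeping omitted)
        }

    }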

This is pretty standard use of the Decorator pattern. The programmer has decorated a standard java.util.Set named rawSet with some listener notification logic.

In general, whenever a class decorates another object it’s important (usually imperative) that the decoratee not be used for other purposes. For instance, adding objects to or removing objects from the decoratee directly would be problematic:

    Set<User> internalSet = new HashSet<User>(64);
    Set<User> users       = new UserCache(internalSet);
    . . .
    internalSet.add(someUser);
    internalSet.add(anotherUser);

It is clearly inappropriate to use internalSet in this manner after it has been decorated, as our listeners would not be notified about these additions. As the creator of the UserCache class, we might not have any idea what sort of code depends on the listener notifications to work correctly. Missed notifications could cause any number of really bad problems. So it’s important we make the UserCache class bulletproof in terms of delivering event notifications.

Worried by this possibility, the programmer rewrites the UserCache to use an internal java.util.Set rather than decorating a user-provided Set:

    public class UserCache extends AbstractSet<User> {

        private Set<User> rawSet;

        public UserCache() {
            rawSet = new HashSet<User>();
        }

        . . .

    }

So while it isn’t as flexible, at least now there’s no way that a user can get access to the internal java.util.Set and add or remove objects directly, right? As we look over the class we don’t see anywhere that rawSet is returned or passed as a parameter or anything. It would appear that the programmer has solved the problem. While there isn’t anywhere that we return rawSet directly, is there a way that we could be returning another object that will allow access to rawSet?

Take a closer look at the iterator method:

    @Override
    public Iterator<User> iterator() {
        return rawSet.iterator();
    }

Notice that it returns a java.util.Iterator based on rawSet. ‘But,’ you might say, ‘I want to be able to iterate over all the users.’ That’s fine. But recall that the java.util.Iterator interface includes a method named remove that removes the object last returned from next. When we call remove, the object is removed from rawSet directly, not through UserCache. In this case, UserCache receives no notification of the removal and thus has no way to notify the listeners. While we would agree that most callers of iterator simply want to iterate over the users, there’s no way to guarantee that some caller, some time, won’t try to remove a user via that interface.

In this example we would say that rawSet has escaped via the iterator method. This is a slight variation of escapism because strictly speaking we aren’t returning a reference to rawSet to external code, but we are returning a view of rawSet through which we can modify it. Unfortunately this is a fairly simple example. Take a look at the java.util.List and java.util.Map interfaces and see how many places a view of the internal object is returned. And note too the java.util.ListIterator that extends Iterator and includes an add and a set method :(

At this point it should be emphasized that it isn’t just java.util.Collection-related classes that need to be worried about escapism. The following code snippet shows how an object reference can escape through the constructor:

    public class FileRepository implements FileEventListener {

        private File directory;

        public FileRepository(File directory) {
            FileEventListenerManager.register(this);
            this.directory = directory;
        }

        public void fileAccessed(FileEvent fileEvent) {
            File updatedFile = fileEvent.getFile();

            if (directory.equals(updatedFile.getParentFile()))
                touch(updatedFile);
        }

        . . .

    }

In this second example, we have a class, FileRepository, that listens for accesses to files and touches them if they’re within a certain directory. Notice that the constructor lets the this reference escape through the call to FileEventListenerManager.register(). While this may seem innocuous (and arguably intentional), note that the internal directory field has not yet been initialized. In a multithreaded environment it is perfectly possible for a file to be accessed immediately after the listener is registered, which calls the FileRepository’s fileAccessed method. If this occurs, a java.lang.NullPointerException will be thrown when equals is invoked on the still-null directory field.

As you can begin to imagine, there are many scenarios in which escapism can cause problems. So, how do we solve the problem of escapism?

In our UserCache example, we have to implement iterator as it’s part of the java.util.Set interface. A first shot might be to provide an implementation of java.util.Iterator that calls rawSet.remove whenever Iterator.remove is called. While this may seem like a workable solution on the surface, once rawSet.remove has been called, the next call to next() will throw a java.util.ConcurrentModificationException, because the caller is both iterating over and modifying rawSet “concurrently.”

Another possibility would be to return a java.util.Iterator from a read-only view of rawSet via:

    @Override
    public Iterator<User> iterator() {
        return Collections.unmodifiableSet(rawSet).iterator();
    }

The caller can still call Iterator.remove, but he will receive a java.lang.UnsupportedOperationException. While it’s arguably bad design to tease the caller with a method that can only fail this way, at least it’s consistent with other parts of the JDK.
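One more option, which is not from the original discussion, is to hand out a wrapping iterator. Its hasNext and next forward to rawSet's own iterator. Its remove goes through that same underlying iterator, which is allowed mid-iteration and so avoids the ConcurrentModificationException, and only then notifies listeners. Below is a minimal sketch, assuming a hypothetical generic NotifyingSet that only tracks removal listeners; java.util.function.Consumer requires Java 8 or later.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical generic take on UserCache; only removal notifications are modeled.
class NotifyingSet<E> extends AbstractSet<E> {

    private final Set<E> rawSet = new HashSet<E>();
    private final List<Consumer<E>> removalListeners = new ArrayList<Consumer<E>>();

    void addRemovalListener(Consumer<E> listener) {
        removalListeners.add(listener);
    }

    @Override
    public int size() {
        return rawSet.size();
    }

    @Override
    public boolean add(E e) {
        return rawSet.add(e);
    }

    @Override
    @SuppressWarnings("unchecked")
    public boolean remove(Object o) {
        boolean wasRemoved = rawSet.remove(o);
        if (wasRemoved)
            notifyRemoved((E) o);   // unchecked: o matched an element of the set
        return wasRemoved;
    }

    @Override
    public Iterator<E> iterator() {
        final Iterator<E> inner = rawSet.iterator();
        return new Iterator<E>() {
            private E last;

            public boolean hasNext() {
                return inner.hasNext();
            }

            public E next() {
                last = inner.next();
                return last;
            }

            public void remove() {
                inner.remove();       // goes through rawSet's own iterator, so iteration stays valid
                notifyRemoved(last);  // and the listeners still hear about it
            }
        };
    }

    private void notifyRemoved(E e) {
        for (Consumer<E> listener : removalListeners)
            listener.accept(e);
    }
}
```

The rawSet iterator itself never leaves the class; callers only ever see the wrapper.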

In the example of the FileRepository it may seem like the easiest thing to do is switch the code in the constructor to read:

    public class FileRepository implements FileEventListener {

        private File directory;

        public FileRepository(File directory) {
            this.directory = directory;
            FileEventListenerManager.register(this);
        }

        public void fileAccessed(FileEvent fileEvent) {
            File updatedFile = fileEvent.getFile();

            if (directory.equals(updatedFile.getParentFile()))
                touch(updatedFile);
        }

        . . .

    }

Initializing the directory object first will make a big difference. However, in other cases it might not be so cut and dried, especially in concurrent applications. The book Java Concurrency in Practice succinctly and bluntly advises:

Do not allow the this reference to escape during construction.

One suggestion for this case is to create a static factory method, so that the object is fully constructed (and thus its object reference is valid) before being handed off to any external code:

    public class FileRepository implements FileEventListener {

        private File directory;

        private FileRepository(File directory) {
            this.directory = directory;
        }

        public void fileAccessed(FileEvent fileEvent) {
            File updatedFile = fileEvent.getFile();

            if (directory.equals(updatedFile.getParentFile()))
                touch(updatedFile);
        }

        public static FileRepository create(File directory) {
            FileRepository fr = new FileRepository(directory);
            FileEventListenerManager.register(fr);
            return fr;
        }

        . . .

    }
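To see the failure deterministically, here is a sketch with hypothetical stand-ins for FileEventListener and FileEventListenerManager. The real types aren't shown above, and this version passes a File rather than a FileEvent. The race is simulated by having register() deliver an event immediately on the same thread, as though another thread had accessed a file at that instant. The escaping constructor then throws a NullPointerException, while the factory-built repository handles the event normally.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the listener types used above.
interface FileEventListener {
    void fileAccessed(File file);
}

class FileEventListenerManager {
    private static final List<FileEventListener> listeners = new ArrayList<FileEventListener>();

    static synchronized void register(FileEventListener listener) {
        listeners.add(listener);
        // Simulates another thread accessing a file right after registration.
        listener.fileAccessed(new File("/tmp/repo/report.txt"));
    }
}

// Lets 'this' escape: registers before directory is assigned.
class EscapingFileRepository implements FileEventListener {
    private File directory;

    EscapingFileRepository(File directory) {
        FileEventListenerManager.register(this);
        this.directory = directory;
    }

    public void fileAccessed(File file) {
        // directory is still null when the simulated event arrives
        if (directory.equals(file.getParentFile()))
            System.out.println("touch " + file);
    }
}

// Static factory: construct fully, then publish.
class FileRepository implements FileEventListener {
    private final File directory;
    private int touched;   // stands in for the real touch() side effect

    private FileRepository(File directory) {
        this.directory = directory;
    }

    static FileRepository create(File directory) {
        FileRepository repository = new FileRepository(directory);
        FileEventListenerManager.register(repository);
        return repository;
    }

    public void fileAccessed(File file) {
        if (directory.equals(file.getParentFile()))
            touched++;
    }

    int touchedCount() {
        return touched;
    }
}

class EscapeDemo {
    public static void main(String[] args) {
        try {
            new EscapingFileRepository(new File("/tmp/repo"));
        } catch (NullPointerException e) {
            System.out.println("escaping constructor threw NullPointerException");
        }
        FileRepository repository = FileRepository.create(new File("/tmp/repo"));
        System.out.println("factory-built repository touched " + repository.touchedCount() + " file(s)");
    }
}
```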

Finding possible escape paths for objects takes serious attention to detail, some knowledge of the internals of the objects in use, and, of course, testing. But finding the problem is only half the battle; patching the escape paths is equally challenging. Unfortunately, there is no across-the-board way to fix the problems; it takes hunkering down and thinking through a seemingly endless set of possibilities.

Happy hunting! ;)
