Fast Fail Iterators

What is a mysterious field called modCount doing in the List Iterator classes? How can we efficiently handle iterating over a collection when another thread may be updating the collection at the same time?

AbstractList

The AbstractList class in the java.util package is normally only examined by developers intending to implement a List class. AbstractList is an abstract class: it provides a partial implementation of some type of behavior, and is designed to be extended by subclasses which provide the complete implementation. In particular, AbstractList provides a skeletal implementation of the java.util.List interface, and is designed for subclasses which are backed by a random access data structure such as an array (e.g. Object[]). ArrayList is one such subclass of AbstractList.

Iterators

One service provided by AbstractList is the Iterator, or rather the Iterators, required by the java.util.List interface. The java.util.List interface specifies three methods which return Iterators:
Iterator iterator();
ListIterator listIterator();
ListIterator listIterator(int index);

AbstractList provides implementations for all three Iterator methods, and also provides class implementations for the Iterator objects. One aspect of these Iterator classes is of particular interest to us: a mysterious field called modCount. The following behavior is described in the AbstractList comments:
[Talking about modCount]

 * If the value of this field changes unexpectedly, the iterator (or list
 * iterator) will throw a ConcurrentModificationException in
 * response to the next, remove, previous,
 * set or add operations. This provides
 * fail-fast behavior, rather than non-deterministic behavior in
 * the face of concurrent modification during iteration.
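Incidentally, you do not need a second thread to see this fail-fast behavior: structurally modifying the list in the middle of an iteration is enough to trip the check. The following minimal demonstration is ours, not from the AbstractList source; with the standard ArrayList implementation it throws a ConcurrentModificationException on the second call to next().

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List list = new ArrayList();
        list.add("a");
        list.add("b");
        list.add("c");

        Iterator iterator = list.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
            // Structural modification behind the iterator's back:
            // this changes modCount, so the next call to next()
            // throws ConcurrentModificationException.
            list.add("d");
        }
    }
}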
How does this work, and why is it useful?

Concurrency Problems

The issue under consideration is iterating through a collection while that collection may be modified by another thread. This is a concurrency issue. In particular, what happens if thread 1 is iterating through a collection while thread 2 changes the size of the collection at the same time, by adding or deleting an element? This is a classic concurrency problem, and there are four basic ways of handling it.

Corruption

The first scenario is to do nothing about the concurrency problem. To frame this in terms of a database analogy, it would be the equivalent of performing updates to your database outside the bounds of a transaction. If you are lucky, the iteration in thread 1 will occur before or after the collection update in thread 2, and nothing bad happens. However, you obviously have a race condition, and it is likely that at some point the iteration will occur concurrently with the update. Then the outcome is undefined, usually leading to either data corruption or some unpredictable error. For example, a concurrent delete may result in the Iterator trying to access an element beyond the end of the now shorter collection, causing an IndexOutOfBoundsException. The other case to consider is a concurrent addition, which may result in an element of the collection being skipped by the iteration. It is equally likely that other problems could occur, such as a NullPointerException arising because resizing the collection requires a new underlying array to be created and copied into. Normally, this "do nothing about it" scenario is not an acceptable solution. Returning to our analogy, current database technology imposes the constraint that one cannot update the database outside the bounds of a transaction; instead, it requires the resources to be locked while the update is being performed.
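To make the corruption scenario concrete, here is a rough sketch (the demo class and thread structure are ours, and the outcome is deliberately non-deterministic): an index-based scan of a plain ArrayList races against a thread that shrinks the list, and depending on timing it may finish cleanly or die with an IndexOutOfBoundsException.

import java.util.ArrayList;
import java.util.List;

public class UnprotectedScanDemo {
    public static void main(String[] args) throws InterruptedException {
        // A deliberately unsynchronized list shared between two threads
        final List list = new ArrayList();
        for (int i = 0; i < 100000; i++) {
            list.add(new Integer(i));
        }

        Thread scanner = new Thread(new Runnable() {
            public void run() {
                // size() and get() are separate calls, so the list
                // can shrink between them
                for (int i = 0; i < list.size(); i++) {
                    Object o = list.get(i);   // may throw IndexOutOfBoundsException
                }
            }
        });
        Thread remover = new Thread(new Runnable() {
            public void run() {
                while (!list.isEmpty()) {
                    list.remove(list.size() - 1);
                }
            }
        });

        scanner.start();
        remover.start();
        scanner.join();
        remover.join();
    }
}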
Locking The Collection

Just as databases lock tables and rows while performing updates, we could lock the collection while we are iterating over it, to solve the concurrent access/update problem. In this scenario we prevent any thread from changing the collection while the iteration takes place, thus guaranteeing predictable behavior. So we might use a Vector object for our collection, since that class has its accessors and updaters synchronized on the Vector object itself. Then the iteration can lock the Vector simply:
public ... search (Vector list) {
    // Synchronizing on the Vector itself ensures that no other
    // threads can use the Vector accessors or updaters while we
    // are in this code block
    synchronized(list) {
        ListIterator iterator = list.listIterator();
        while (iterator.hasNext()) {
            Object o = iterator.next();
            // do something ...
        }
    }
}

By taking a pessimistic approach, we have guaranteed that we avoid any corruption of the collection and that the iteration is consistent. However, there is a cost to this scenario. First, we have to use a synchronized collection, which has a slightly higher overhead than an unsynchronized collection. Secondly, and much more importantly for performance, the threads are serialized for access to and update of the collection, including for the duration of this and any other synchronized iterations. Each thread trying to update, access, or iterate over the collection must wait for any other thread doing so to finish before it can start. In the worst case, this would convert a multi-threaded program into an effectively single-threaded program, regardless of the number of CPUs available.

Copy On Write

A third scenario is to use a List class like the CopyOnWriteArrayList class, due to be available in the java.util.concurrent package of the 1.5 distributions. In this case the Iterator carries on iterating over the previous version of the list, while the updater gets to use a new version of the list. In database parlance these are optimistic transactions: each session gets to carry on its transaction as if the other didn't run, even though the Iterator is now using out-of-date information (in a true transaction, one of the sessions would fail to commit). This solution can carry a heavy performance overhead from the copying, especially if there are many concurrency conflicts.
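As a sketch of what this looks like in practice (the demo class is ours; CopyOnWriteArrayList is the java.util.concurrent class referred to above), an Iterator obtained before an update keeps reading the old copy of the list, never sees the change, and never throws a ConcurrentModificationException:

import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteDemo {
    public static void main(String[] args) {
        List list = new CopyOnWriteArrayList();
        list.add("a");
        list.add("b");

        // The Iterator is effectively a snapshot of the list at this moment
        Iterator iterator = list.iterator();

        // This update copies the underlying array; the existing Iterator
        // keeps reading the old copy
        list.add("c");

        while (iterator.hasNext()) {
            System.out.println(iterator.next());   // prints a and b, but not c
        }
    }
}

The trade-off described above is visible here: every update copies the whole backing array, which is cheap for small or rarely-updated lists but expensive for large, frequently-updated ones.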
Fast Fail

The fourth scenario for handling the concurrency issue is to (optimistically) assume that the multiple threads with access to the collection will mostly not encounter concurrent update problems. In that case, locking the collection is an unnecessary overhead. Instead, you can catch the few occasions when concurrent access to the collection is an issue, and handle those cases specially. This is the tactic that AbstractList takes with its Iterators.

AbstractList includes an instance variable called modCount (short for "modification count"). Every time any structural change is made to the AbstractList object, modCount is incremented. (Subclasses of AbstractList are expected to adhere to this contract, though it cannot be enforced.) The AbstractList Iterators retain the value of modCount that existed when they were created, and check on every access and update whether that value has changed since creation. If modCount has changed, the Iterator throws a java.util.ConcurrentModificationException. The iteration code looks like this:
public ... search (List list) {
    try {
        ListIterator iterator = list.listIterator();
        while (iterator.hasNext()) {
            Object o = iterator.next();
            // do something ...
        }
    } catch (ConcurrentModificationException concEx) {
        // handle this case, maybe just iterate the collection again
        // e.g. return search(list);
    }
}
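The detection mechanism itself is tiny. Below is a simplified sketch (the class name and structure are ours, not the JDK source) of how an AbstractList-style list and its Iterator can cooperate through modCount:

import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class SketchList {
    private Object[] elements = new Object[10];
    private int size = 0;
    private int modCount = 0;   // incremented on every structural change

    public void add(Object o) {
        if (size == elements.length) {
            Object[] bigger = new Object[size * 2];
            System.arraycopy(elements, 0, bigger, 0, size);
            elements = bigger;
        }
        elements[size++] = o;
        modCount++;             // the fail-fast contract
    }

    public Iterator iterator() {
        return new Itr();
    }

    private class Itr implements Iterator {
        private int cursor = 0;
        // Remember the modCount that existed when the Iterator was created
        private int expectedModCount = modCount;

        public boolean hasNext() {
            return cursor < size;
        }

        public Object next() {
            // Fail fast: if the list changed behind our back, give up now
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            if (cursor >= size) {
                throw new NoSuchElementException();
            }
            return elements[cursor++];
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}

A real AbstractList subclass inherits this bookkeeping, provided it remembers to increment modCount in its own structural-modification methods.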
This scenario has a very small overhead: the additional cost of updating the modCount variable on any change to the collection, and of comparing it against the expected value on each step of the iteration. This should be negligible in most cases.

Optimistic Vs. Pessimistic

The two alternative scenarios of locking the collection or throwing an exception on detecting a concurrent access/update are special cases of the more general choice between pessimistic and optimistic transactions. Pessimistic transactions lock the resources they need to ensure that they can complete their activity without errors, at the cost of making all other transactions wait for the resources to be freed. Optimistic transactions avoid locking resources, instead detecting whether conflicts have occurred which require rolling back and re-applying the transaction. Optimistic transaction processing is usually more efficient where reads predominate over writes; pessimistic transaction processing is usually more efficient where writes predominate over reads.

In the case of Iterators, if an update is unlikely to occur during the iteration, the "fast fail" scenario is more efficient. But if updates often occur during iteration of the collection, then "fast fail" would end up producing many ConcurrentModificationExceptions and also require the iteration to be re-run many times. In that situation, locking the collection is usually more efficient.

It is also possible to mix pessimistic and optimistic modes, by locking some resources but not others, and this is a way of fine-tuning an application. In the scenarios we described in this article, that mixed-mode option is not available, but many concurrency conflicts are more complicated and can benefit from a mixed-mode approach.

So there you have it: an interesting technique in which an optimistic approach is used to solve a concurrency issue. Although you have to be careful when using optimistic approaches to these types of problems, done right they can radically improve the scalability characteristics of your application. In this instance, we have seen how some clever coding allows Iterators to be used safely in the face of concurrent access, without forcing that access to be mutually exclusive.