A Distributed Reentrant Read-Write Lock Using a Hazelcast Data Grid

Most Java programmers are familiar with the java.util.concurrent package and some of the handy things in it like ReentrantReadWriteLock. To recap, a ReadWriteLock solves the “Readers-Writers Problem” in computer science.

A read-write lock allows for a greater level of concurrency in accessing shared data than that permitted by a mutual exclusion lock. It exploits the fact that while only a single thread at a time (a writer thread) can modify the shared data, in many cases any number of threads can concurrently read the data (hence reader threads). In theory, the increase in concurrency permitted by the use of a read-write lock will lead to performance improvements over the use of a mutual exclusion lock. – JavaDocs, JSE 7

Additionally, a ReentrantReadWriteLock, allows any thread to acquire the same lock more than once.

This lock allows both readers and writers to reacquire read or write locks in the style of a ReentrantLock. Non-reentrant readers are not allowed until all write locks held by the writing thread have been released.

Additionally, a writer can acquire the read lock, but not vice-versa. Among other applications, reentrancy can be useful when write locks are held during calls or callbacks to methods that perform reads under read locks. If a reader tries to acquire the write lock it will never succeed.- JavaDocs, JSE 7

It would be nice to have the same semantics available to control concurrent access to resources in a distributed application, but the only implementations I’m aware of are heavy-weight (.e.g. Apache Zookeeper Shared Reentrant Read Write Lock Recipe.)

Hazelcast’s distributed in-memory data grid, on the other hand, is lightweight and easy to use. Drop a jar file into your application and off you go. Hazelcast includes distributed implementations of Maps, Lists, Queues, etc. as well as Semaphores, Locks and AtomicLongs. It seems we should be able to implement the synchronization we want using some of these distributed collections and concurrency primitives.

This blog post provides a good starting point for how to implement a ReadWrite lock using two semaphores. Though it explains the concept simply, it has two deficiencies. First, writers may starve while readers hog the lock. Second, it isn’t re-entrant. The first problem can be solved, as the blog author notes, using a third semaphore. This article (PDF) illustrates how to solve the “writers-preference” problem using a third semaphore. To make it work, though, we’ll have to replace those counters with distributed AtomicLongs.

That’s great for a non-reentrant ReadWrite lock, but what about a reentrant one? An algorithm to prevent deadlock and allow strongly reentrant usage (writers can acquire nested read locks) is explained in “Reentrant Readers-Writers” (PDF). It requires the use of a monitor–which in Java means a Lock and associated Condition–and involves keeping a per-thread count of nested locks. For the latter, I stash the count in a ThreadLocal. There are a couple of other niggly bits involving a distributed counter and a distributed boolean (which we’ll fake with an AtomicLong.)

The end result is two types of distributed locks, implemented using Hazelcast’s ISemaphore, ILock, ICondition and IAtomicLong. The only further complication is the desire to abstract the grid implementation being used (mostly useful for testing, since I know of no other grid technology that provides the data structures required here.) I use a DistributedDataStructureFactory and a DistributedLockFactory to solve those problems, as well as some helper interfaces and wrapper classes to compensate for the fact that in java.util.concurrent, Semaphore and AtomicLong are concrete classes.

Assuming you have a HazelcastInstance, usage is identical to usage of java.util.concurrent.locks.ReentrantReadWriteLock, with the exception of creation of a new lock instance.

// This can be a singleton, but additional instances aren't a problem.
DistributedLockFactory lockFactory =
    new DistributedLockFactory(new HazelcastDataStructureFactory(hazelcastInstance));
ReadWriteLock lock = lockFactory.getReentrantReadWriteLock("myLock");
try {
    // do some stuff
finally {

The full package, with both types of locks, helper classes, unit and integration tests has been released under the Apache 2.0 license by kind permission of my employer ThoughtWire Corporation. You can find it on GitHub here: https://github.com/ThoughtWire/hazelcast-locks

Update: June 13, 2014.
Shortly after releasing the first version of this package, I discovered an additional wrinkle, namely that each lock operation must deal very carefully with thread interruption so as not to leave any data structures in a state which could lead to deadlock of other threads or nodes. Specifically, all operations that call blocking methods (such as Semaphore.acquire() or Condition.await()) must catch any thrown InterruptedException and restore the lock’s original state before setting the thread’s interrupted status and returning. In practice, this is quite messy to do (!) and a worthy improvement would be to find a way to tidy it up. For the gory details on proper handling of task cancellation, see Goetz et al, “Java Concurrency in Practice, 2nd edition” (specifically, Chapter 7.)

My Erdos Number is 4

Related to my previous post, I now have an Erdős number of 4. Another thing I’ve always wanted! Here are the details and an explanation of Erdős numbers for those who aren’t familiar with them.

I’ve posted previously about the mathematician Paul Erdős. Among other things, Erdős was insanely prolific and published 1,475 papers with 511 collaborators. Since one of his many areas of interest was graphs, it’s not surprising that a collaboration graph of his co-authors, and their co-authors, and so on…should be of interest. Courtesy of Wikipedia:

The Erdős number…describes the “collaborative distance” between a person and mathematician Paul Erdős, as measured by authorship of mathematical papers. It was created by friends as a humorous tribute to the enormous output of Erdős, one of the most prolific modern writers of mathematical papers, and has become well-known in scientific circles as a tongue-in-cheek measurement of mathematical prominence.

The Erdős collaboration graph is too huge to visualize, sadly, but the Erdős Number Project site has some interesting facts about the graph. Unfortunately, I think this information is skewed because it is based only on papers published in mathematical journals, while the high degree of interdisciplinary collaboration means that many people outside of mathematics have finite Erdős numbers. Anyway, according to this information, about 83,642 other people have Erdős number 4 (probably a gross underestimate.)

My relationship to Erdős comes from the fact that one of my co-authors, Michael Brudno, was a collaborator with at least two authors with Erdős number 2: Serafim Batzoglou and Lior Pachter. Each of those authors is a co-author with Daniel J. Kleitman, who not only has Erdős number 1, but has the lowest known Erdős-Bacon number: 3.

It’s conceivable that through one of Mike Brudno’s other collaborators, his number could in fact be 2, making mine 3, but confirming or disconfirming that would be too laborious. I’m more than satisfied with 4, which is slightly lower than the mean–especially considering that I never dreamed I’d have an Erdős number at all!

Savant: Genome Browser for High-Throughput Sequencing Data

I forgot to mention I now have a peer-reviewed publication! I don’t have a “bucket list” as such, but this is something I’ve always wanted. Not being an academic has made that unlikely, but because of my job in a research lab, it’s finally happened. Yay! Here are the details:

Savant: genome browser for high-throughput sequencing data
Marc Fiume, Vanessa Williams, Andrew Brook, Michael Brudno
Bioinformatics 2010; 26:1938-1944, August 15, 2010
doi: 10.1093/bioinformatics/btq332.

Read the abstract or the full paper (PDF). It has pretty pictures ;)

Cryptanalysis and genomics

For some reason it occurred to me that these two things should go together (while reading about Schroedinger’s brilliant notion about “the stuff of the gene” being some kind of aperiodic crystal). Anyway, while searching for stuff on this topic, I came across this great bit: Craig Venter’s synthetic bacterium contains coded “watermarks” in its DNA. One of these watermarks actually contains a Webpage, complete with a link. Others include quotations by James Joyce and Richard Feynman.

It sounds like science fiction, doesn’t it? Seriously cool–and slightly creepy. Imagine this kind of thing being introduced into humans via gene therapy!

I also found this paper on using cryptanalytic techniques to predict introns and exons. Sadly, that was all I could find. Perhaps it is not a fruitful avenue of research. Or perhaps it is just new and/or obscure. Time will tell.

First days with an iPad

I’m one of those seemingly few people who thought they had a real purpose for an iPad. I have a laptop and various old computers at home, but hate hauling my laptop around the apartment just to check email or look up something on wikipedia. So I splurged and bought one, which arrived Friday.

While it has it’s flaws, it’s quickly becoming indispensable!

A word to the wise, though: stay away from the app store. I’ve already spent an insane amount on stuff I really do not need. Those micro charges add up fast!

Platonic Solids: Generative Architecture by Subdivision

We’ve all heard of generative graphics, how about generative architecture? The computational experiment Platonic Solids by Michael Hansmeyer is something you can get lost in for a good while. Especially since he’s added a 3-D anaglyph presentation. I hope you kept a pair of red-cyan 3D glasses in your junk drawer like I did!

Here’s what the project’s all about, in the artist’s words.

Gruple moved to Codehaus

As promised, Gruple has moved from Googlecode to Codehaus. You find it’s new home here: http://gruple.codehaus.org.

All the documentation has been ported. There are new mailing lists. The source is now in a Git repository. And the 1.1.1 distribution is available from the distro site. Everything is (or should be) linked to from the main page.

Thanks to everyone who commented or helped.

Gruple 1.1.1 with Transactions released

I had almost given up on Gruple because I had no idea if anyone was using it. But it turns out I need it so badly myself that I got a second-wind and implemented transactions. I released v.1.1 yesterday and then realized with horror that it had serious bugs. After a frantic morning, I’ve got Gruple v.1.1.1 out and I believe (pray) those bugs are addressed.

That is not to say that I’m completely confident there are no bugs in Gruple as it stands! I have noticed occasional bad behaviour, the sure sign of a concurrency time-bomb somewhere. I’ll keep doing my best to track it down, starting with adding concurrent unit tests with the help of GroboUtils. If you use Gruple and have any problems please do let me know.

Finally, Gruple will be moving to Codehaus fairly soon (it’s approved, but there is work to do.) This will give it greater exposure. I’ll be switching to a git repo, because that just seems to be the thing to do (and I hate SVN with a passion anyway.)

Now if only I could get someone at Terracotta and SpringSource to work on supporting Groovy in the Terracotta product, I’d be laughing. And so would you.

New Direction: Computational Biology

After an absurdly long job search, I’ve finally found myself a comfortable place in a computational biology lab. I’ve been here a bit more than a month and thought I should mention something about what I’m doing.

I’m working for Dr. Michael Brudno in the Computational Biology Lab at the University of Toronto. At the moment, I’m developing an application for visualization and analysis of biological sequence and annotation data with a graduate student named Marc Fiume. (We just chose a name for our project today: SAVANT. I like it.) I’m also sitting in on a graduate seminar on analysis of high throughput sequencing data and attending the occasional presentation on related research at The Centre for Applied Genomics. I’ll be spending one day a week at Sick Kids hospital, in order to interact with biologists and bioinformaticians who are among the target users of SAVANT.

I’m having a great time.

This is all a huge change from the enterprise web development that is more or less what I’ve been doing since 1996. A huge change that I really needed. Sometimes you just need to start over, you know? It was getting to the point where I honestly couldn’t picture myself actually taking any of the jobs I was applying for. I couldn’t face the same-old, same-old any longer.

I’m not sure where this is all going to lead, but I’m kind of hoping to make a career in this relatively young field. I believe that my many years of experience in commercial software engineering will be useful here. I think I can have fun and make a difference. The territory is huge; the problem space practically inexhaustible. I can’t imagine getting bored any time soon. Heading off in a new direction feels exactly right. So work-wise right now, it’s all good. :)

Posts and pointers on software, art, math, noise, and other obsessions…