• Bug
  • Status: Open
  • Priority: 2 Major
  • Resolution:
  • Assignee: cdennis
  • Reporter: nestrada
  • Created: February 07, 2011
  • Votes: 0
  • Watchers: 4
  • Updated: October 11, 2011

Attachments

Description

The bug is outlined here.

http://forums.terracotta.org/forums/posts/list/4830.page

In short, when an L2 server goes down, interrupting a thread that is blocked on a toolkit-provided BlockingQueue never results in an InterruptedException (which is needed mostly for cleanup).

I’ve traced the problem to the await() method of TerracottaCondition, specifically to a finally block where reacquireLock(numOfHolds) is invoked. IMO, reacquireLock should check whether the current thread is interrupted before doing any further locking, just as await() does a couple of lines above.
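For comparison, here is a minimal sketch of the behaviour I would expect, using only the JDK's LinkedBlockingQueue rather than the toolkit queue. It is not the attached test case, and the class and method names are made up for illustration. A consumer blocked in take() is interrupted and should come back promptly with an InterruptedException so it can clean up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class; not part of Terracotta or the attachment.
final class BlockingQueueInterruptSketch {

    /** Returns true if a consumer blocked in take() observes InterruptedException. */
    static boolean takeIsInterruptible(BlockingQueue<String> queue) {
        AtomicBoolean interrupted = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);

        Thread consumer = new Thread(() -> {
            started.countDown();
            try {
                queue.take(); // blocks: nothing is ever offered to the queue
            } catch (InterruptedException e) {
                interrupted.set(true); // cleanup would happen here
            }
        });
        consumer.setDaemon(true); // do not keep the JVM alive if take() never returns
        consumer.start();

        try {
            started.await();
            consumer.interrupt();
            consumer.join(TimeUnit.SECONDS.toMillis(5)); // bounded wait for the consumer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return interrupted.get() && !consumer.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("interrupted: "
                + takeIsInterruptible(new LinkedBlockingQueue<>()));
    }
}
```

With the JDK queue the consumer terminates through the catch block. With the toolkit queue and the L2 down, the same interrupt is what never surfaces.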

Comments

Nicolas Estrada 2011-02-07

I forgot to mention that I have attached a simple use case. To reproduce the bug, simply start the L2, run the main() of BlockingQueueHaltTest, stop the L2 and watch the paint dry ;)

Fiona OShea 2011-02-08

Is this expected behaviour?

Chris Dennis 2011-02-09

This is not a simple issue. I don’t think we can apply the fix Nicolas suggests in TerracottaCondition, since the contract of Condition.await() requires that the locks held on entry be reacquired on exit, even if this means waiting for the return of a disappeared server. We might, however, be able to code a specific fix for the blocking queue interrupt issue, since we know that in this case the lock will be reacquired only for the InterruptedException to be thrown and the lock to be subsequently released in the queue code.
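The contract in question can be observed with a plain JDK ReentrantLock. The sketch below is only an analogy: a second thread holding the lock stands in for the unreachable server, and all class and field names are hypothetical. An interrupted waiter does not leave await() until it can reacquire the lock, and it still holds the lock when the InterruptedException arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration class; JDK locks only, no Terracotta code involved.
final class AwaitReacquireSketch {

    /** Outcome of one run; field names are illustrative only. */
    static final class Result {
        final boolean returnedWhileLockUnavailable;
        final boolean sawInterruptedException;
        final boolean heldLockOnExit;

        Result(boolean early, boolean sawInterrupt, boolean held) {
            this.returnedWhileLockUnavailable = early;
            this.sawInterruptedException = sawInterrupt;
            this.heldLockOnExit = held;
        }
    }

    static Result run() {
        ReentrantLock lock = new ReentrantLock();
        Condition condition = lock.newCondition();
        CountDownLatch aboutToWait = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean();
        AtomicBoolean heldOnExit = new AtomicBoolean();

        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                aboutToWait.countDown();
                condition.await(); // releases the lock while waiting
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
                heldOnExit.set(lock.isHeldByCurrentThread());
            } finally {
                lock.unlock();
                done.countDown();
            }
        });
        waiter.setDaemon(true);
        waiter.start();

        boolean early;
        try {
            aboutToWait.await();
            lock.lock(); // succeeds only once the waiter has released the lock in await()
            try {
                waiter.interrupt();
                // The waiter cannot finish while this thread still holds the lock.
                early = done.await(300, TimeUnit.MILLISECONDS);
            } finally {
                lock.unlock(); // the "server comes back": the lock becomes available
            }
            waiter.join(TimeUnit.SECONDS.toMillis(5));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("driver thread interrupted", e);
        }
        return new Result(early, sawInterrupt.get(), heldOnExit.get());
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("returned early: " + r.returnedWhileLockUnavailable);
        System.out.println("saw InterruptedException: " + r.sawInterruptedException);
        System.out.println("held lock on exit: " + r.heldLockOnExit);
    }
}
```

If the lock were never released, the waiter would stay inside await() indefinitely, which matches the hang seen when the L2 does not return.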

However, this isn’t something I expect to solve in the Fremantle timeframe, so I’m going to push it out to Ulloa.