Greedy lock recalls fail while a concurrent write lock is held - can cause the cluster to stall

Type: Bug
Status: Closed
Priority: 1 Critical
Resolution: Fixed
Component/s:
Labels:

Assignee: cdennis
Reporter: cdennis
Created: August 04, 2009
Votes: 0
Watchers: 0
Updated: August 20, 2009
Resolved: August 05, 2009

Description

Currently the code in ClientLock allows a held concurrent write lock to prevent a recall from occurring. In addition, when the concurrent write lock is unlocked, the lock is still not recalled to the server. This can cause the cluster to stall, with nodes waiting indefinitely for a lock recall that will never happen.

I have implemented a fix for this that stops concurrent write locks from preventing a greedy recall. The greedy lock is not required to establish the concurrent write lock hold and so should not be required for the hold to continue.
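The intended behaviour after the fix can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual ClientLock code: the class ClientLockSketch, its Level enum, and all method names are made up for illustration, and it assumes that read and write holds are the only holds that defer a recall.

```java
// Hypothetical, simplified model of the post-fix behaviour.
// This is NOT the real Terracotta ClientLock; all names are illustrative.
final class ClientLockSketch {
    enum Level { READ, WRITE, CONCURRENT }

    private int readHolds;
    private int writeHolds;
    private int concurrentHolds;
    private boolean greedyHeld = true;   // the client starts out holding the lock greedily
    private boolean recallPending;

    void lock(Level level) {
        switch (level) {
            case READ:       readHolds++;       break;
            case WRITE:      writeHolds++;      break;
            case CONCURRENT: concurrentHolds++; break;
        }
    }

    void unlock(Level level) {
        switch (level) {
            case READ:       readHolds = decrement(readHolds);             break;
            case WRITE:      writeHolds = decrement(writeHolds);           break;
            case CONCURRENT: concurrentHolds = decrement(concurrentHolds); break;
        }
        // Every unlock re-checks for a deferred recall, so a recall is never lost.
        completeRecallIfPossible();
    }

    // Called when the server asks for the greedy lock back.
    void recall() {
        recallPending = true;
        completeRecallIfPossible();
    }

    private void completeRecallIfPossible() {
        // Assumption: only READ and WRITE holds depend on the greedy lock.
        // CONCURRENT holds are deliberately ignored here, which is the point of the fix.
        if (recallPending && greedyHeld && readHolds == 0 && writeHolds == 0) {
            greedyHeld = false;
            recallPending = false;
        }
    }

    private static int decrement(int holds) {
        if (holds == 0) {
            throw new IllegalStateException("unlock without a matching lock");
        }
        return holds - 1;
    }

    boolean isGreedyHeld()    { return greedyHeld; }
    boolean isRecallPending() { return recallPending; }
}
```

In this sketch, calling recall() while only a CONCURRENT hold is outstanding returns the greedy lock immediately, whereas a WRITE hold defers the recall until the matching unlock.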

Entry timestamps in the distributed cache implementation are updated using concurrent write locks, and this is where the bug was first seen.

Comments

Chris Dennis 2009-08-05

This was caused by a bug in ClientLock whereby a held concurrent write lock would block a recall while it was held and would also fail to trigger the recall when it was unlocked and the lock became free for recalling. With these changes, concurrent write holds no longer block a recall from occurring.

Kalai Kannaiyan 2009-08-13

Verified that the fix was merged to 3.1 in svn rev13338 and that a system test was added.