• Type: Bug
• Status: Closed
• Priority: 2 Major
• Resolution: Fixed
• Component: ehcache-core
• Assignee: gkeim
• Reporter: gkeim
• Created: December 21, 2010
• Votes: 0
• Watchers: 0
• Updated: July 27, 2012
• Resolved: December 05, 2011

Description

When you shut down a CacheManager containing caches involved in write-behind, the shutdown should only return after the queues have been emptied (or an alternate strategy is devised).

In my EhcachePounder I have intercepted the window-close event and try to cleanly shut down the CacheManagers before terminating, but I get this:

Exception in thread "ProcessingBucket 1 - processing" com.tc.exception.TCNotRunningException: Terracotta is not running.
	at com.tc.object.locks.ClientLockManagerImpl.waitUntilRunning(ClientLockManagerImpl.java:610)
	at com.tc.object.locks.ClientLockManagerImpl.lock(ClientLockManagerImpl.java:92)
	at com.tc.object.bytecode.ManagerImpl.lock(ManagerImpl.java:765)
	at com.tc.object.bytecode.ManagerUtil.monitorEnter(ManagerUtil.java:560)
	at org.terracotta.locking.TerracottaReadWriteLock$DsoLock.lock(TerracottaReadWriteLock.java:146)
	at org.terracotta.locking.TerracottaReadWriteLock$ReadLock.lock(TerracottaReadWriteLock.java:235)
	at org.terracotta.async.ProcessingBucket.isCancelled(ProcessingBucket.java:256)
	at org.terracotta.async.ProcessingBucket.access$200(ProcessingBucket.java:37)
	at org.terracotta.async.ProcessingBucket$ProcessingThread.run(ProcessingBucket.java:161)
	at java.lang.Thread.run(Thread.java:662)
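For context, the shutdown path being described is roughly the following pattern (a minimal sketch; the frame and CacheManager wiring here are illustrative, not EhcachePounder's actual code):

import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;
import net.sf.ehcache.CacheManager;

public class ShutdownOnClose {
    public static void install(final JFrame frame, final CacheManager cacheManager) {
        frame.addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent e) {
                // Dispose all caches, including write-behind ones; the expectation in
                // this ticket is that this call only returns once the write-behind
                // queues have been drained (or another strategy has been applied).
                cacheManager.shutdown();
                frame.dispose();
            }
        });
    }
}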

Comments

Fiona OShea 2011-01-04

Do you have time to get this in Fremantle? By 1/15

Alexander Snaps 2011-01-05

Actually, it shouldn’t wait for the queue to empty… or at least that’s what the JavaDoc hints at. I think, though, that it might be different for clustered (and persistent) write-behind queues vs. standalone ones. The exception, though, is related to something else as far as I can tell: there is a race there between the worker thread and the “stopping” one. isCancelled is trying to acquire a clustered read lock, while the L1 could have been shut down already.
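A minimal sketch of the shape of that race, assuming the worker checks its cancellation flag under the bucket’s (clustered) read lock — the names are illustrative, not the actual ProcessingBucket code:

import java.util.concurrent.locks.ReadWriteLock;

class WorkerSketch implements Runnable {
    private final ReadWriteLock bucketLock; // a clustered lock in the real code
    private boolean cancelled;              // guarded by bucketLock

    WorkerSketch(ReadWriteLock bucketLock) {
        this.bucketLock = bucketLock;
    }

    void cancel() {
        bucketLock.writeLock().lock();
        try {
            cancelled = true;
        } finally {
            bucketLock.writeLock().unlock();
        }
    }

    private boolean isCancelled() {
        // Once the Terracotta client (L1) has shut down, acquiring the clustered
        // lock itself fails, which is where TCNotRunningException comes from.
        bucketLock.readLock().lock();
        try {
            return cancelled;
        } finally {
            bucketLock.readLock().unlock();
        }
    }

    public void run() {
        while (!isCancelled()) {
            // drain and process the write-behind queue
        }
    }
}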

Alexander Snaps 2011-01-11

isCancelled will now let the worker thread leave the loop properly, i.e. without throwing the exception. Listeners are now held in a CopyOnWriteArrayList, so that we don’t have to take a cluster lock when firing the stop event.

WriteBehind is still shut down as fast as possible, as some other node will pick the work up. This behavior differs from standalone, where the shutdown waits for the worker thread to have acknowledged the shutdown.
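A rough sketch of the listener change described above, assuming a plain CopyOnWriteArrayList holding the bucket’s listeners (the class and method names are illustrative, not the actual org.terracotta.async code):

import java.util.concurrent.CopyOnWriteArrayList;

class BucketListenersSketch {
    interface StopListener {
        void stopped();
    }

    private final CopyOnWriteArrayList<StopListener> listeners =
        new CopyOnWriteArrayList<StopListener>();

    void addListener(StopListener listener) {
        listeners.add(listener);
    }

    void fireStopped() {
        // Iteration runs over an immutable snapshot of the backing array, so the
        // stop event can be fired without any (cluster) lock, even while listeners
        // are being added or removed concurrently.
        for (StopListener listener : listeners) {
            listener.stopped();
        }
    }
}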

Gary Keim 2011-10-20

I’m seeing a deadlock when shutting down a clustered cache participating in write-behind.


"NonStopCache [CM1] Executor Thread-2 for 'Thread-31'" Id=114 WAITING on java.util.concurrent.Semaphore$NonfairSync@3ef054e3
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
	at com.tc.object.locks.LockStateNode$PendingLockHold.park(LockStateNode.java:184)
	at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:782)
	at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:758)
	at com.tc.object.locks.ClientLockImpl.lock(ClientLockImpl.java:62)
	at com.tc.object.locks.ClientLockManagerImpl.lock(ClientLockManagerImpl.java:100)
	at com.tc.object.locks.ClientLockManagerGroupImpl.lock(ClientLockManagerGroupImpl.java:56)
	at com.tc.object.bytecode.ManagerImpl.lock(ManagerImpl.java:808)
	at com.tc.object.bytecode.ManagerUtil.monitorEnter(ManagerUtil.java:560)
	at org.terracotta.locking.TerracottaReadWriteLock$DsoLock.lock(TerracottaReadWriteLock.java:154)
	at org.terracotta.locking.TerracottaReadWriteLock$WriteLock.lock(TerracottaReadWriteLock.java:326)
	at org.terracotta.async.ProcessingBucket.add(ProcessingBucket.java:310)
	at org.terracotta.async.AsyncCoordinator.add(AsyncCoordinator.java:272)
	at org.terracotta.modules.ehcache.writebehind.AsyncWriteBehind.write(AsyncWriteBehind.java:71)
	at net.sf.ehcache.writer.writebehind.WriteBehindManager.put(WriteBehindManager.java:60)
	at org.terracotta.modules.ehcache.store.ClusteredStore.putInternal(ClusteredStore.java:330)
	at org.terracotta.modules.ehcache.store.ClusteredStore.putWithWriter(ClusteredStore.java:306)
	at org.terracotta.modules.ehcache.store.ClusteredSafeStore.putWithWriter(ClusteredSafeStore.java:260)
	at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore$11.call(ExecutorServiceStore.java:395)
	at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore$11.call(ExecutorServiceStore.java:393)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:270)
	at java.util.concurrent.FutureTask.run(FutureTask.java:55)
	at net.sf.ehcache.constructs.nonstop.NonstopThreadPool$Worker.run(NonstopThreadPool.java:210)
	at java.lang.Thread.run(Thread.java:662)
LOCKED : [LongLockID(6222686021855612837), DsoLockID(ObjectID=[141402])]
WAITING TO LOCK: [DsoLockID(ObjectID=[1:291073])]


"AWT-EventQueue-0" Id=101 WAITING on java.util.concurrent.Semaphore$NonfairSync@50ce45a
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
	at com.tc.object.locks.LockStateNode$PendingLockHold.park(LockStateNode.java:184)
	at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:782)
	at com.tc.object.locks.ClientLockImpl.acquireQueued(ClientLockImpl.java:758)
	at com.tc.object.locks.ClientLockImpl.lock(ClientLockImpl.java:62)
	at com.tc.object.locks.ClientLockManagerImpl.lock(ClientLockManagerImpl.java:100)
	at com.tc.object.locks.ClientLockManagerGroupImpl.lock(ClientLockManagerGroupImpl.java:56)
	at com.tc.object.bytecode.ManagerImpl.lock(ManagerImpl.java:808)
	at com.tc.object.bytecode.ManagerUtil.monitorEnter(ManagerUtil.java:560)
	at org.terracotta.locking.TerracottaReadWriteLock$DsoLock.lock(TerracottaReadWriteLock.java:154)
	at org.terracotta.locking.TerracottaReadWriteLock$WriteLock.lock(TerracottaReadWriteLock.java:326)
	at org.terracotta.async.ProcessingBucket.stop(ProcessingBucket.java:273)
	at org.terracotta.async.ProcessingBucketGroup.stop(ProcessingBucketGroup.java:127)
	at org.terracotta.async.AsyncCoordinator.stop(AsyncCoordinator.java:288)
	at org.terracotta.modules.ehcache.writebehind.AsyncWriteBehind.stop(AsyncWriteBehind.java:123)
	at net.sf.ehcache.writer.writebehind.WriteBehindManager.dispose(WriteBehindManager.java:76)
	at net.sf.ehcache.Cache.dispose(Cache.java:2467)
	- locked <0x73a489dd> (a net.sf.ehcache.Cache)
	at net.sf.ehcache.CacheManager.shutdown(CacheManager.java:1260)
	- locked <0x3757477b> (a java.lang.Class)
	at demo.EhcachePounder.shutdown(EhcachePounder.java:310)
	at demo.EhcachePounder.access$23(EhcachePounder.java:300)
	at demo.EhcachePounder$2.windowClosing(EhcachePounder.java:142)

Alexander Snaps 2011-10-21

Gary, who owns the lock [DsoLockID(ObjectID=[1:291073])]? I see two threads waiting to grab that lock, if I read this correctly. The lock owned by the first thread is for the key in the cache itself; it then waits for the ProcessingBucket’s write lock. The second thread is also waiting to lock that bucket in order to stop it. So, knowing what thread owned the lock for that bucket, and what it was doing, would be useful. All I can tell is that you were using the pounder. I’ll have a quick look to see if I can reproduce this, but more info on how to achieve this, and what the setup was (how many L1s?), would be useful…
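For JVM-level locks, a dump along the following lines shows which thread owns each monitor or synchronizer; note this is a generic JDK sketch, and it will not report the owner of a clustered DSO lock, which would need a cluster-wide lock dump from the Terracotta server instead:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class LockOwnerDump {
    public static void main(String[] args) {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        // Include locked monitors and ownable synchronizers so each ThreadInfo
        // reports what the thread holds as well as what it is blocked on.
        for (ThreadInfo info : mbean.dumpAllThreads(true, true)) {
            System.out.println(info);
        }
        long[] deadlocked = mbean.findDeadlockedThreads();
        if (deadlocked != null) {
            System.out.println("Deadlocked thread ids: " + Arrays.toString(deadlocked));
        }
    }
}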

Alexander Snaps 2011-12-05

I can’t seem to reproduce this… Am I missing something obvious, or is this all fine now?

Gary Keim 2011-12-05

I’m no longer able to reproduce this issue.