EHC ❯ nonstop store initialization race condition
-
Bug
-
Status: Resolved
-
1 Critical
-
Resolution: Fixed
-
ehcache-terracotta
-
-
serverteam
-
Reporter: gadams00
-
April 26, 2013
-
0
-
Watchers: 6
-
September 27, 2013
-
April 26, 2013
Attachments
Description
When using a nonstop configuration on a distributed cache with bigmemory max 4.0.0, an initialization race condition exists. If you try to put to a nonstop cache too quickly after initializing the CacheManager, the put fails to affect L1 or L2, and an exception is silently logged, but not thrown to client code:
Exception in thread "init Store asynchronously testCache" java.lang.RuntimeException: com.tc.exception.TCNotRunningException: Terracotta is not running.
at org.terracotta.modules.ehcache.store.nonstop.NonStopStoreWrapper.createStore(NonStopStoreWrapper.java:183)
at org.terracotta.modules.ehcache.store.nonstop.NonStopStoreWrapper.doInit(NonStopStoreWrapper.java:188)
at org.terracotta.modules.ehcache.store.nonstop.NonStopStoreWrapper.access$100(NonStopStoreWrapper.java:51)
at org.terracotta.modules.ehcache.store.nonstop.NonStopStoreWrapper$1.run(NonStopStoreWrapper.java:121)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.tc.exception.TCNotRunningException: Terracotta is not running.
at com.tc.object.locks.ClientLockManagerImpl.checkState(ClientLockManagerImpl.java:724)
at com.tc.object.locks.ClientLockManagerImpl.unlock(ClientLockManagerImpl.java:223)
at com.tc.object.bytecode.ManagerImpl.unlock(ManagerImpl.java:804)
at com.tc.platform.PlatformServiceImpl.commitLock(PlatformServiceImpl.java:105)
at com.terracotta.toolkit.concurrent.locks.ToolkitLockingApi.doCommitLock(ToolkitLockingApi.java:175)
at com.terracotta.toolkit.concurrent.locks.ToolkitLockingApi.unlock(ToolkitLockingApi.java:162)
at com.terracotta.toolkit.roots.impl.AggregateIsolatedToolkitTypeRoot.unlock(AggregateIsolatedToolkitTypeRoot.java:118)
at com.terracotta.toolkit.roots.impl.AggregateIsolatedToolkitTypeRoot.getOrCreateToolkitType(AggregateIsolatedToolkitTypeRoot.java:76)
at com.terracotta.toolkit.roots.impl.AggregateIsolatedToolkitTypeRoot.getOrCreateToolkitType(AggregateIsolatedToolkitTypeRoot.java:31)
at com.terracotta.toolkit.factory.impl.AbstractPrimaryToolkitObjectFactory.getOrCreate(AbstractPrimaryToolkitObjectFactory.java:30)
at com.terracotta.toolkit.factory.impl.AbstractPrimaryToolkitObjectFactory.getOrCreate(AbstractPrimaryToolkitObjectFactory.java:17)
at com.terracotta.toolkit.TerracottaToolkit.getSet(TerracottaToolkit.java:260)
at com.terracotta.toolkit.NonStopToolkitImpl$9.lookupObject(NonStopToolkitImpl.java:226)
at com.terracotta.toolkit.NonStopToolkitImpl$9.lookupObject(NonStopToolkitImpl.java:223)
at com.terracotta.toolkit.nonstop.AbstractToolkitObjectLookup.getInitializedObject(AbstractToolkitObjectLookup.java:30)
at com.terracotta.toolkit.nonstop.NonStopInvocationHandler.invoke(NonStopInvocationHandler.java:35)
at $Proxy17.getReadWriteLock(Unknown Source)
at org.terracotta.modules.ehcache.store.bulkload.BulkLoadEnabledNodesSet.<init>(BulkLoadEnabledNodesSet.java:42)
at org.terracotta.modules.ehcache.store.bulkload.BulkLoadToolkitCache.<init>(BulkLoadToolkitCache.java:49)
at org.terracotta.modules.ehcache.store.ClusteredStoreBackend.<init>(ClusteredStoreBackend.java:37)
at org.terracotta.modules.ehcache.store.ClusteredStore.<init>(ClusteredStore.java:141)
at org.terracotta.modules.ehcache.store.EnterpriseClusteredStore.<init>(EnterpriseClusteredStore.java:105)
at org.terracotta.modules.ehcache.store.EnterpriseTerracottaClusteredInstanceFactory.newStore(EnterpriseTerracottaClusteredInstanceFactory.java:25)
at org.terracotta.modules.ehcache.store.TerracottaClusteredInstanceFactory.createStore(TerracottaClusteredInstanceFactory.java:92)
at net.sf.ehcache.terracotta.ClusteredInstanceFactoryWrapper.createStore(ClusteredInstanceFactoryWrapper.java:93)
at net.sf.ehcache.CacheManager.createTerracottaStore(CacheManager.java:623)
at net.sf.ehcache.Cache$1.call(Cache.java:1106)
at net.sf.ehcache.Cache$1.call(Cache.java:1103)
at org.terracotta.modules.ehcache.store.nonstop.NonStopStoreWrapper.createStore(NonStopStoreWrapper.java:176)
... 4 more
The exception logged varies between the “Terracotta is not running” message above and something about the store already being shutdown. I’ve even seen cases where no exception is logged at all, but the put still fails.
I’ve attached a reproduction test case as a maven project (bigmemory-init.zip). To reproduce, install BigMemory Max 4.0.0 and start the server. Extract the zip and run mvn test in the project root. If you comment out the nonstop configuration in src/main/resources/ehcache.xml, the failures no longer occur, which leads me to believe that this is an initialization race condition specific to nonstop caches.
Comments
Fiona OShea 2013-04-26
Manish Choudhary Choudhary 2013-04-26
Using a nonstop cache too quickly after initializing the CacheManager had problems earlier, which were fixed in DEV-8839. PUT was failing silently because the cache initialization was not complete and we started using the cache. In this case it is following the nonstop timeout behavior “NO_OP” as mentioned in the ehcache.xml. With NO_OP the cache operations are expected to fail silently.
The reason for TCNotRunningException is because shutDown was called on Toolkit. It is probably because the test failed at its assertion and resulted in Toolkit Shutdown. So the Thread which was initializing the nonstop Toolkit asynchronously failed with TCNotRunningException.
This should not happen with latest code.
Fiona OShea 2013-04-26
Greg, thank you for the feedback. This is resolved in our upcoming version which should be released in June
Please take a look at this as a priority.