putWithWriter throws NullPointerException after cache reconnects with terracotta server

Type: Bug
Status: Closed
Priority: 1 Critical
Resolution: Cannot Reproduce
Component/s: ehcache-core,ehcache-terracotta
Labels:

Assignee: drb
Reporter: cspros
Created: May 23, 2011
Votes: 0
Watchers: 3
Updated:
July 27, 2012
Resolved:
August 04, 2011

Description

If a cache with a cacheWriter has been configured to communicate with terracotta and has lost/restored connection, calling the putWithWriter method will throw a NullPointerException. It appears that once the cache has rejoined the cluster, the cacheWriterManager no longer has a reference to the cache. This results in the call to putWithWriter(..) throwing a NullPointerException:

Exception in thread “main” net.sf.ehcache.CacheException: net.sf.ehcache.writer.CacheWriterManagerException: java.lang.NullPointerException at net.sf.ehcache.constructs.nonstop.NonstopExecutorServiceImpl.execute(NonstopExecutorServiceImpl.java:87) at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore.executeWithExecutor(ExecutorServiceStore.java:157) at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore.executeWithExecutor(ExecutorServiceStore.java:126) at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore.putWithWriter(ExecutorServiceStore.java:326) at net.sf.ehcache.constructs.nonstop.store.NonstopStoreImpl.putWithWriter(NonstopStoreImpl.java:377) at net.sf.ehcache.Cache.putInternal(Cache.java:1408) at net.sf.ehcache.Cache.putWithWriter(Cache.java:1366) at com.prosrm.webtest.TerracottaExample.(TerracottaExample.java:18) at com.prosrm.webtest.TerracottaExample.main(TerracottaExample.java:26) Caused by: net.sf.ehcache.writer.CacheWriterManagerException: java.lang.NullPointerException at net.sf.ehcache.writer.writethrough.WriteThroughManager.put(WriteThroughManager.java:52) at org.terracotta.modules.ehcache.store.ClusteredStore.putInternal(ClusteredStore.java:291) at org.terracotta.modules.ehcache.store.ClusteredStore.putWithWriter(ClusteredStore.java:267) at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore$7.call(ExecutorServiceStore.java:328) at net.sf.ehcache.constructs.nonstop.store.ExecutorServiceStore$7.call(ExecutorServiceStore.java:326) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at net.sf.ehcache.constructs.nonstop.NonstopThreadPool$Worker.run(NonstopThreadPool.java:151) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at net.sf.ehcache.writer.writethrough.WriteThroughManager.put(WriteThroughManager.java:47) ... 8 more

Comments

James House 2011-07-27

Is there a test case for reproducing this?

Can you be more specific about the scenario where this occurs? “has lost/restored connection” is a bit vague - are we talking about a cluster rejoin, or was the client application down for a while and then restarted, or?

James House 2011-07-29

I have been unable to reproduce this, and in fact think that I have a test case that proves this works (at least with trunk).

I have made the following assumptions: non-stop cache, time-out long enough to allow for rejoin while put is blocked, write-through mode writer on the cache.

I have created a test that instantiates the cache, does a small number (100) puts, kills the TC server, then calls put() on the cache. I have verified that the put does block until rejoin occurs, and that the put unblocks and succeeds after the rejoin.

On the assumption that it may be a race condition, I have scripted the run of the test to occur over and over, and have not been able to achieve a failure - even with running additional heavy processes, etc. trying to affect thread timing.

Any assistance with reconstructing you failure scenario would be much appreciated.

James House 2011-08-04

I am still unable to reproduce this issue after having spent many hours trying to do so (on both 2.4.2 and trunk).

If you can give more details and/or submit a test case that reproduces it we can look further into it.