April 26, 2010
July 27, 2012
February 27, 2011
From forum post: http://forums.terracotta.org/forums/posts/list/3532.page#19745
We have been evaluating a distributed Ehcache using a simple application that gets references to objects in the cache and modifies them. The setup was:
- 1 TC server and 1 mirror, on separate servers (not in persistent mode)
- 3 TC clients accessing the cache (standalone Java app)
- 1 TC client webapp to view cache contents (in Tomcat)
We have been testing this setup at a rate of 500 updates per second to the cache (strictly speaking these are not “cache updates”, since the object in the cache is modified in place rather than replaced). The updates are spread randomly across 70 objects in the cache (there is no cache eviction).
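For anyone trying to reproduce the load, here is a minimal single-threaded sketch of the access pattern described above: get a reference to one of 70 cached objects chosen at random, mutate it in place (no `put()`), allocate a new Timestamp on each update, and pace the loop at about 500 updates per second. It is a hypothetical stand-in, not the actual test application: a `ConcurrentHashMap` replaces the clustered Ehcache so it runs without Terracotta, and `InPlaceUpdateLoad`, `Record`, `touch()` and the other names are invented for illustration. The real test used 3 clients; this sketch models only one.

```java
import java.sql.Timestamp;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch of the access pattern described above. A ConcurrentHashMap stands in
 * for the clustered Ehcache cache, so this runs without Terracotta; every name is illustrative.
 */
public class InPlaceUpdateLoad {

    static final int OBJECT_COUNT = 70;         // objects kept in the cache, no eviction
    static final int UPDATES_PER_SECOND = 500;  // target update rate

    /** Hypothetical cached value: modified in place, never replaced via put(). */
    static final class Record {
        private Timestamp lastModified = new Timestamp(0L);
        private long updateCount;

        synchronized void touch() {
            // One new Timestamp per update, mirroring the allocation pattern in the test.
            lastModified = new Timestamp(System.currentTimeMillis());
            updateCount++;
        }

        synchronized Timestamp lastModified() { return lastModified; }

        synchronized long updateCount() { return updateCount; }
    }

    static Map<Integer, Record> populate() {
        Map<Integer, Record> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < OBJECT_COUNT; i++) {
            cache.put(i, new Record());
        }
        return cache;
    }

    /** Applies `updates` modifications to randomly chosen keys, optionally paced to the target rate. */
    static void run(Map<Integer, Record> cache, int updates, boolean paced) {
        long intervalNanos = 1_000_000_000L / UPDATES_PER_SECOND;
        long deadline = System.nanoTime();
        for (int n = 0; n < updates; n++) {
            int key = ThreadLocalRandom.current().nextInt(OBJECT_COUNT);
            cache.get(key).touch();  // get a reference and mutate it; the entry is not replaced
            if (paced) {
                deadline += intervalNanos;
                long waitNanos = deadline - System.nanoTime();
                if (waitNanos > 0) {
                    try {
                        Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }
    }

    static long totalUpdates(Map<Integer, Record> cache) {
        long total = 0;
        for (Record r : cache.values()) {
            total += r.updateCount();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Record> cache = populate();
        run(cache, 1_000, true);  // roughly two seconds at 500 updates per second
        System.out.println("updates applied: " + totalUpdates(cache));
    }
}
```

Pacing against an absolute deadline, rather than sleeping a fixed 2 ms after each update, keeps the average rate close to the target even when an individual update is slow.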
The distributed GC (DGC) was tuned to run young collections once per minute and full collections every 5 minutes, which stabilized the number of live objects (the application creates many new Timestamp objects, although this could be changed).
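To put these rates in perspective, here is some back-of-the-envelope arithmetic written as a small Java program. It assumes (this is not stated in the post) that every update allocates exactly one new Timestamp, and that the DGC[ 41059 ] counter in the log that follows counts collections since the server started. The 1.2-per-minute case further assumes that young and full collections are counted separately; `DgcArithmetic` and its methods are hypothetical names.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope numbers for the test described above. Assumptions (not stated in the
 * post): one new Timestamp per update, and a DGC counter that counts collections since startup.
 */
public class DgcArithmetic {

    static final long UPDATES_PER_SECOND = 500;
    static final long YOUNG_DGC_INTERVAL_SECONDS = 60;
    static final long FULL_DGC_INTERVAL_SECONDS = 300;
    static final long THIRTY_DAYS_SECONDS = 30L * 24 * 3600;

    /** New Timestamp objects created during an interval, at one allocation per update. */
    static long allocationsPerInterval(long intervalSeconds) {
        return UPDATES_PER_SECOND * intervalSeconds;
    }

    /** Days of uptime implied by a DGC counter, for a given number of collections per minute. */
    static double daysFromDgcCount(long dgcCount, double collectionsPerMinute) {
        return dgcCount / collectionsPerMinute / (60.0 * 24.0);
    }

    public static void main(String[] args) {
        System.out.println(allocationsPerInterval(YOUNG_DGC_INTERVAL_SECONDS)); // 30000
        System.out.println(allocationsPerInterval(FULL_DGC_INTERVAL_SECONDS));  // 150000
        System.out.println(allocationsPerInterval(THIRTY_DAYS_SECONDS));        // 1296000000
        // Young collections only: 1 per minute.
        System.out.println(String.format(Locale.ROOT, "%.1f", daysFromDgcCount(41059, 1.0))); // 28.5
        // Young plus full collections counted separately: 1 + 1/5 = 1.2 per minute.
        System.out.println(String.format(Locale.ROOT, "%.1f", daysFromDgcCount(41059, 1.2))); // 23.8
    }
}
```

Under either counting assumption, a counter of 41,059 works out to roughly 24 to 29 days of collections, which is in line with the month of operation mentioned below.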
The main TC server crashed after running at this rate for one month, with the following exception thrown during distributed garbage collection:
2010-04-15 13:16:03,161 [DGC-Thread] INFO com.tc.objectserver.dgc.impl.MarkAndSweepGarbageCollector - DGC[ 41059 ] YoungGen DGC start
2010-04-15 13:16:03,285 [DGC-Thread] INFO com.tc.objectserver.dgc.impl.MarkAndSweepGarbageCollector - DGC[ 41059 ] pre-DGC managed id count: 66386
2010-04-15 13:16:03,313 [DGC-Thread] ERROR com.tc.server.TCServerMain - Thread:Thread[DGC-Thread,5,TC Thread Group] got an uncaught exception. calling CallbackOnExitDefaultHandlers.
com.tc.util.TCAssertionError: Assertion failed: Bit index out of range
    at com.tc.util.Assert.failure(Assert.java:60)
    at com.tc.util.Assert.eval(Assert.java:80)
    at com.tc.util.Assert.assertTrue(Assert.java:100)
    at com.tc.util.OidLongArray.bit(OidLongArray.java:49)
    at com.tc.util.OidLongArray.isSet(OidLongArray.java:135)
    at com.tc.util.OidBitsArrayMapImpl.contains(OidBitsArrayMapImpl.java:72)
    at com.tc.objectserver.impl.NoReferencesIDStoreImpl$OidBitsStore.hasNoReferences(NoReferencesIDStoreImpl.java:84)
    at com.tc.objectserver.impl.NoReferencesIDStoreImpl.hasNoReferences(NoReferencesIDStoreImpl.java:46)
    at com.tc.objectserver.impl.ObjectManagerImpl.getObjectReferencesFrom(ObjectManagerImpl.java:641)
    at com.tc.objectserver.dgc.impl.YoungGCHook.getObjectReferencesFrom(YoungGCHook.java:57)
    at com.tc.objectserver.dgc.impl.MarkAndSweepGCAlgorithm.collectRoot(MarkAndSweepGCAlgorithm.java:167)
    at com.tc.objectserver.dgc.impl.MarkAndSweepGCAlgorithm.collect(MarkAndSweepGCAlgorithm.java:150)
    at com.tc.objectserver.dgc.impl.MarkAndSweepGCAlgorithm.doGC(MarkAndSweepGCAlgorithm.java:67)
    at com.tc.objectserver.dgc.impl.MarkAndSweepGarbageCollector.doGC(MarkAndSweepGarbageCollector.java:71)
    at com.tc.objectserver.dgc.impl.GarbageCollectorThread.doYoungGC(GarbageCollectorThread.java:94)
    at com.tc.objectserver.dgc.impl.GarbageCollectorThread.run(GarbageCollectorThread.java:75)
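For readers unfamiliar with the assertion message, here is a purely hypothetical sketch of one common way to store an object-ID bitmap: 64 object IDs per long, with each array covering a window that starts at a base ID. `OidBitsSketch` and its methods are invented for illustration and are not Terracotta's `OidLongArray`. The sketch only shows the kind of condition a "Bit index out of range" check guards against (an ID looked up in an array whose window does not cover it); it does not identify what caused this crash.

```java
/**
 * Hypothetical illustration only, NOT Terracotta's OidLongArray: an object-ID bitmap holding
 * 64 IDs per long, covering the window [baseId, baseId + 64 * wordCount).
 */
public class OidBitsSketch {

    private final long baseId;   // first object ID covered by this array
    private final long[] words;  // 64 IDs per long

    OidBitsSketch(long baseId, int wordCount) {
        this.baseId = baseId;
        this.words = new long[wordCount];
    }

    /** Maps an ID to its bit within a word; rejects IDs outside this array's window. */
    private int bit(long oid) {
        long offset = oid - baseId;
        if (offset < 0 || offset >= (long) words.length * Long.SIZE) {
            throw new AssertionError("Bit index out of range: " + oid);
        }
        return (int) (offset % Long.SIZE);
    }

    /** Index of the long that holds the bit for this ID (call only after bit() succeeds). */
    private int word(long oid) {
        return (int) ((oid - baseId) / Long.SIZE);
    }

    void set(long oid) {
        int b = bit(oid);
        words[word(oid)] |= 1L << b;
    }

    boolean isSet(long oid) {
        int b = bit(oid);
        return (words[word(oid)] & (1L << b)) != 0;
    }

    public static void main(String[] args) {
        OidBitsSketch bits = new OidBitsSketch(0, 2);  // covers IDs 0..127
        bits.set(5);
        System.out.println(bits.isSet(5));  // true
        System.out.println(bits.isSet(6));  // false
        try {
            bits.isSet(128);                // just outside the window
        } catch (AssertionError e) {
            System.out.println(e.getMessage());  // Bit index out of range: 128
        }
    }
}
```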
The mirror server took over correctly at this point, but crashed with the same message during the next distributed garbage collection.
Although some of the machines were being stretched by the testing, I did not detect any significant problems during the month of operation (based on the metrics provided in the TC Developer Console). The TC temporary disk storage did vary quite a bit, but never seemed to grow out of control (maximum around 100 MB). So this crash has left me rather puzzled…
Does this ring a bell for anybody? I realize this is not much to go on… Let me know if the full JVM thread dump from the TC log would help.
Thanks for any indications of where I should be looking.