• Bug
  • Status: Resolved
  • 2 Major
  • Resolution: Fixed
  • ehcache-core
  • foshea
  • Reporter: mads1980
  • October 02, 2012
  • 0
  • Watchers: 9
  • January 28, 2013
  • October 09, 2012

Description

We’re trying out BigMemory Go in one of our production servers with the Fast Restartable store, and we’ve since been randomly getting the following error log:

[ERROR] 2012-09-30 00:00:05 [CacheStatisticsGatherer-cacheStatisticsGatherer] - Error calculating approx. in-memory byte size for cache com.renxo.cms.web.filter.EhCacheDistributedSessionManagementFilter.invalidatedSessionIds
net.sf.ehcache.CacheException: Failure while decoding key java.nio.HeapByteBuffer[pos=35 lim=35 cap=35]
    at net.sf.ehcache.store.offheap.portability.EhcacheKeyPortability.decode(EhcacheKeyPortability.java:55)
    at net.sf.ehcache.store.offheap.portability.EhcacheKeyPortability.equals(EhcacheKeyPortability.java:60)
    at net.sf.ehcache.store.heap.RestartableSelectableConcurrentHashMap$RestartableSegment.getLsn(RestartableSelectableConcurrentHashMap.java:193)
    at net.sf.ehcache.store.heap.RestartableSelectableConcurrentHashMap$RestartableSegment.getLsn(RestartableSelectableConcurrentHashMap.java:60)
    at com.terracottatech.frs.object.AbstractObjectManagerStripe.getLsn(AbstractObjectManagerStripe.java:26)
    at com.terracottatech.frs.object.AbstractObjectManager.getLsn(Unknown Source)
    at com.terracottatech.frs.RemoveAction.<init>(RemoveAction.java:41)
    at com.terracottatech.frs.RestartStoreImpl$AutoCommitTransaction.remove(RestartStoreImpl.java:168)
    at com.terracottatech.frs.RestartStoreImpl$AutoCommitTransaction.remove(RestartStoreImpl.java:117)
    at net.sf.ehcache.store.heap.RestartableSelectableConcurrentHashMap$RestartableSegment.preRemove(RestartableSelectableConcurrentHashMap.java:278)
    at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.remove(SelectableConcurrentHashMap.java:694)
    at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.remove(SelectableConcurrentHashMap.java:429)
    at net.sf.ehcache.store.MemoryStore.remove(MemoryStore.java:331)
    at net.sf.ehcache.store.FrontEndCacheTier.remove(FrontEndCacheTier.java:314)
    at net.sf.ehcache.Cache.removeInternal(Cache.java:2335)
    at net.sf.ehcache.Cache.tryRemoveImmediately(Cache.java:2118)
    at net.sf.ehcache.Cache.elementStatsHelper(Cache.java:2079)
    at net.sf.ehcache.Cache.searchInStoreWithoutStats(Cache.java:2073)
    at net.sf.ehcache.Cache.getQuiet(Cache.java:1863)
    at com.renxo.cms.job.CacheStatisticsGatherer.execute(Unknown Source)
    at com.renxo.cms.process.JobMonitoredProcess$ProcessThread.run(Unknown Source)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at net.sf.ehcache.store.offheap.portability.AbstractEhcachePortability.readObjects(AbstractEhcachePortability.java:72)
    at net.sf.ehcache.store.offheap.portability.EhcacheKeyPortability.decode(EhcacheKeyPortability.java:51)
    ... 20 more
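
A note on reading the trace: the buffer named in the message has pos equal to lim, so it has no bytes left to read, which is consistent with the EOFException raised from DataInputStream.readByte. The sketch below reproduces only that failure mode with JDK classes. How Ehcache actually turns the key buffer into a stream is not visible in the trace, so the copy-into-a-stream step and the class and method names here are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class DrainedBufferDemo {

    // Copies whatever bytes remain in the buffer into a stream and tries to
    // read a single byte from it. Returns true if the read hits end-of-file.
    public static boolean hitsEofOnFirstByte(ByteBuffer buffer) {
        byte[] remaining = new byte[buffer.remaining()];
        buffer.duplicate().get(remaining); // duplicate() leaves the caller's position untouched
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(remaining))) {
            in.readByte();
            return false;
        } catch (EOFException e) {
            return true; // nothing left to decode
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer drained = ByteBuffer.allocate(35);
        drained.position(drained.limit()); // same state as in the log message
        System.out.println(drained);       // java.nio.HeapByteBuffer[pos=35 lim=35 cap=35]
        System.out.println(hitsEofOnFirstByte(drained));                 // true
        System.out.println(hitsEofOnFirstByte(ByteBuffer.allocate(35))); // false
    }
}
```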

Comments

Manuel Dominguez Sarmiento 2012-10-02

My gut feeling is that the failure occurs when the cache finds an expired element and tries to remove it immediately, but that element's entry has already been purged or is no longer present in the persistent store.

We can work around this issue by wrapping calls to cache.get() and cache.getQuiet() to catch CacheExceptions, but these shouldn’t be happening in the first place.
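
A minimal sketch of such a wrapper follows. It uses only JDK types so that it runs without Ehcache on the classpath: SafeCacheReads and getOrNull are hypothetical names, and IllegalStateException stands in for net.sf.ehcache.CacheException in the demo.

```java
import java.util.function.Supplier;

public class SafeCacheReads {

    // Runs the lookup and returns its result, or null if the lookup throws an
    // exception of the given type. Any other exception propagates unchanged.
    public static <T> T getOrNull(Supplier<T> lookup,
                                  Class<? extends RuntimeException> failureType) {
        try {
            return lookup.get();
        } catch (RuntimeException e) {
            if (failureType.isInstance(e)) {
                return null; // treat the failed read as a cache miss
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // IllegalStateException stands in for CacheException in this demo.
        String hit = getOrNull(() -> "value", IllegalStateException.class);
        String miss = getOrNull(() -> {
            throw new IllegalStateException("Failure while decoding key");
        }, IllegalStateException.class);
        System.out.println(hit);
        System.out.println(miss);
    }
}
```

With Ehcache available, the call would presumably look like getOrNull(() -> cache.getQuiet(key), CacheException.class). CacheException is unchecked, so it fits the RuntimeException bound. Treating the failure as a miss hides the error from callers, so logging inside the catch block may be worth adding.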

Fiona OShea 2012-10-02

Targeting for Gladstone+ while we review what is going on. Need a BMGo + as well I think

Tim Wu 2012-10-02

Manuel,

We’ve located the bug and are working on a fix. It relates to key hash collisions.

In the meantime, you should be able to work around this issue by using a restartable cache with offheap enabled. Is that something you can try out?

Manuel Dominguez Sarmiento 2012-10-02

Thanks for the quick feedback. Unfortunately our production systems are finely tuned regarding memory usage, and configuring any significant off-heap capacity is out of the question (memory is already fully consumed by the heap). However, we can try allocating a small off-heap (say, about 32-64 MB) and configure overflowToOffHeap="true" maxBytesLocalOffHeap="32M".

Would this help fix the issue in the meantime? Or does the off-heap capacity need to be enough to fit everything in the cache to avoid this bug?

Tim Wu 2012-10-02

Unfortunately that wouldn’t work. With offheap + heap configured, offheap will contain the full content of the cache while heap will contain a subset of what’s in offheap. In other words, setting offheap size to 32M would constrain your cache size to 32M.

The problem is caused specifically by key hash collisions. Is there anything in your app code that might be increasing the number of collisions? Not really an ideal workaround, but if you could guarantee the hash codes are unique (maybe just return a unique id), that would avoid the issue.
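
For readers less familiar with the term, a key hash collision means two distinct keys report the same hashCode(), and a map must then fall back on equals() to tell them apart. The JDK-only sketch below shows a well-known collision between two Strings and a hypothetical IdKey type that follows the "just return a unique id" suggestion. It does not reproduce the Ehcache bug itself, and all names in it are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class HashCollisionDemo {

    // Hypothetical key type whose hash code is its unique integer id,
    // so two distinct keys never share a hash code.
    public static final class IdKey {
        private final int id;

        public IdKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return id;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof IdKey && ((IdKey) other).id == id;
        }
    }

    // True when the two keys are different objects by equals() but share a hash code.
    public static boolean collide(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are distinct Strings that both hash to 2112.
        System.out.println(collide("Aa", "BB")); // true

        // A correct map still keeps colliding keys apart by using equals().
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        System.out.println(map.get("Aa") + " " + map.get("BB")); // first second

        // Keys that return a unique id as their hash code cannot collide.
        System.out.println(collide(new IdKey(1), new IdKey(2))); // false
    }
}
```

Even a well-distributed hashCode() produces occasional collisions once enough distinct keys exist, since hash codes are only 32 bits wide.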

From our end, we’ve already got a fix in, and we’re currently trying to see how best to distribute it.

Manuel Dominguez Sarmiento 2012-10-03

We have hundreds of cache zones and hashcodes are generated mostly with HashCodeBuilder from Apache Commons (except for JDK classes of course), so I don’t think that hashcode quality is a problem here.

Any chance you could forward a patch for us to try out? We’re using the latest BigMemory Go release with Ehcache 2.6.1 - or we could patch it ourselves if the fix is simple enough.

Thanks.

Fiona OShea 2012-10-03

Please don’t send anything until PM chimes in.

Tim Wu 2012-10-03

The fix is already in. Guess we just need some opinion from PM on how to distribute this and any further fixes?

Not sure who is responsible for that. Can you reassign as necessary?

Fiona OShea 2012-10-04

We will discuss with PM on Friday morning.

Manish Devgan 2012-10-08

(for Terracotta Internal) uploaded the snapshot build at: ftp://extranet.terracottatech.com//BMGo-renxo/bigmemory-go-3.7.2-21157.tar.gz – sent a note to Manuel @ Renxo (cc’ed Gautam)

Kumar Reddy 2013-01-26

We are getting a similar exception with Java Strings as cache keys. We are using plain Ehcache with diskPersistent="true" and eternal="true".

Version: BigMemory Go 3.7.2

net.sf.ehcache.CacheException: Failure while decoding key java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]
    at net.sf.ehcache.store.offheap.portability.EhcacheKeyPortability.decode(EhcacheKeyPortability.java:55)
    at net.sf.ehcache.store.offheap.portability.EhcacheKeyPortability.equals(EhcacheKeyPortability.java:60)
    at com.terracottatech.offheapstore.storage.PortabilityBasedStorageEngine.equalsKey(PortabilityBasedStorageEngine.java:108)
    at com.terracottatech.offheapstore.OffHeapHashMap.keyEquals(OffHeapHashMap.java:914)
    at com.terracottatech.offheapstore.OffHeapHashMap.put(OffHeapHashMap.java:466)
    at com.terracottatech.offheapstore.OffHeapHashMap.put(OffHeapHashMap.java:408)
    at com.terracottatech.offheapstore.AbstractLockedOffHeapHashMap.put(AbstractLockedOffHeapHashMap.java:121)
    at com.terracottatech.offheapstore.concurrent.AbstractConcurrentOffHeapMap.put(AbstractConcurrentOffHeapMap.java:188)
    at net.sf.ehcache.store.offheap.disk.OffHeapDiskStore.put(OffHeapDiskStore.java:129)
    at net.sf.ehcache.store.FrontEndCacheTier.put(FrontEndCacheTier.java:261)
    at net.sf.ehcache.Cache.putInternal(Cache.java:1459)
    at net.sf.ehcache.Cache.put(Cache.java:1387)
    at net.sf.ehcache.Cache.put(Cache.java:1352)
    at in.co.test.cache.Cache.put(Cache.java:47)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:261)
    at net.sf.ehcache.store.offheap.portability.AbstractEhcachePortability.readObjects(AbstractEhcachePortability.java:72)
    at net.sf.ehcache.store.offheap.portability.EhcacheKeyPortability.decode(EhcacheKeyPortability.java:51)
    ... 19 more

Fiona OShea 2013-01-28

Created EHC-991 to track the comment most recently added