• Bug
  • Status: Open
  • 1 Critical
  • Resolution:
  • ehcache-core
  • eng group
  • Reporter: lgilbert
  • April 26, 2013
  • 0
  • Watchers: 5
  • May 14, 2013

Description

Under heavy read/write load we get the following trace:

2013-04-26 16:48:12,776 ERROR [com.pfmclient.trans.AbstractTCMSQL] (TRNRunner-5-t-11) net.sf.ehcache.CacheException: Failed while decoding element java.nio.HeapByteBufferR[pos=870563 lim=870568 cap=870568]
(TRNRunner-5-t-11) [AbstractPortfolioData{id=184712, code=190532, currency=NZD, 32}] SQLExceptionException in getTransactions() Failed while decoding element java.nio.HeapByteBufferR[pos=870563 lim=870568 cap=870568]: net.sf.ehcache.CacheException: Failed while decoding element java.nio.HeapByteBufferR[pos=870563 lim=870568 cap=870568]
	at net.sf.ehcache.store.offheap.portability.EhcacheElementPortability.decode(EhcacheElementPortability.java:125) [:]
	at net.sf.ehcache.store.offheap.portability.EhcacheElementPortability.decode(EhcacheElementPortability.java:29) [:]
	at com.terracottatech.offheapstore.storage.PortabilityBasedStorageEngine.readValue(PortabilityBasedStorageEngine.java:93) [:]
	at com.terracottatech.offheapstore.OffHeapHashMap.remove(OffHeapHashMap.java:727) [:]
	at com.terracottatech.offheapstore.AbstractLockedOffHeapHashMap.remove(AbstractLockedOffHeapHashMap.java:165) [:]
	at com.terracottatech.offheapstore.AbstractOffHeapClockCache.remove(AbstractOffHeapClockCache.java:293) [:]
	at com.terracottatech.offheapstore.concurrent.AbstractConcurrentOffHeapMap.remove(AbstractConcurrentOffHeapMap.java:206) [:]
	at net.sf.ehcache.store.offheap.cachingtier.OffHeapCachingTier.remove(OffHeapCachingTier.java:117) [:]
	at net.sf.ehcache.store.CacheStore.put(CacheStore.java:138) [:]
	at net.sf.ehcache.Cache.putInternal(Cache.java:1500) [:]
	at net.sf.ehcache.Cache.put(Cache.java:1426) [:]
	at net.sf.ehcache.Cache.put(Cache.java:1391) [:]

Comments

Lee Gilbert 2013-04-26

Environment is 64bit: Java version 1.6.0_33, vendor Sun Microsystems Inc. CentOS release 6.3. 2.6.32-358.0.1.el6.x86_64 Using 16 virtualized cores.

Lee Gilbert 2013-04-26

Further down on the trace there is also

Caused by: java.io.OptionalDataException
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) [:1.6.0_33]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) [:1.6.0_33]
	at java.util.ArrayList.readObject(ArrayList.java:593) [:1.6.0_33]
	at sun.reflect.GeneratedMethodAccessor510.invoke(Unknown Source) [:1.6.0_33]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [:1.6.0_33]
	at java.lang.reflect.Method.invoke(Method.java:597) [:1.6.0_33]
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969) [:1.6.0_33]
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848) [:1.6.0_33]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752) [:1.6.0_33]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) [:1.6.0_33]
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946) [:1.6.0_33]
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870) [:1.6.0_33]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752) [:1.6.0_33]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328) [:1.6.0_33]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) [:1.6.0_33]
	at net.sf.ehcache.store.offheap.portability.AbstractEhcachePortability.readSerializables(AbstractEhcachePortability.java:114) [:]
	at net.sf.ehcache.store.offheap.portability.AbstractEhcachePortability.readObjects(AbstractEhcachePortability.java:90) [:]
	at net.sf.ehcache.store.offheap.portability.EhcacheElementPortability.decode(EhcacheElementPortability.java:99) [:]
	... 159 more

Fiona OShea 2013-04-26

Can we get this reproduced in house?

Lee Gilbert 2013-04-29

Additional info on cache config:


    <defaultCache
            statistics="false"
            maxEntriesLocalHeap="1000"
            eternal="false"
            overflowToDisk="false"
            diskPersistent="false"
            copyOnRead="false"
            diskExpiryThreadIntervalSeconds="60"
            memoryStoreEvictionPolicy="LFU"
            overflowToOffHeap="false"
            />
...
   <cache name="trans.current.cache"
            statistics="false" 
            maxEntriesLocalHeap="100"
            eternal="false"
            copyOnRead="false"
            memoryStoreEvictionPolicy="LFU"
            overflowToOffHeap="true"
            maxBytesLocalOffHeap="4G"
            > 
        <persistence strategy="localTempSwap"/>
    </cache>
..

Approximately 75,000 entries, 100 in memory, the rest divided between off heap & localTempSwap (on ramdisk). Loading was upto 200 random entry writes per minute, during which sequential queries were executed retrieving upto 30,000 entries; only 1 query allowed running at a time.

Himadri Singh 2013-05-02

Can we get more information about the type of object being stored in ehcache.

By query, we mean we are just reading back the data from ehcache. What kind of keys are being used here.

How frequently we can reproduce this. Does it happen only with ramDisk.

Lee Gilbert 2013-05-06

  1. The objects being stored: A container-class containing 3 fields: . an ArrayList of Serializable TransactionData objects; . lastTS - a ByteArray (last updated timestamp from SQLServer) . an int - modification count.

  2. Cache key & Update Details: Key is a simple Integer (portfolioID). Since the cache entries are updated by separate threads, a ConcurrentHashMap of juc Lock objects keyed by portfolioID is used to prevent concurrent modification of cache entries retrieved from cache. For load test, random transactions are selected for update from the db.
    The cache-entry containing the transaction is retrieved and the comment field of the transaction updated. This process is repeated using 8 threads.

  3. Frequency. The problem does not reproduce using onHeap ehCache, only with offheap. It appears only to occur during cached-value deserialization.

I am working on a standalone test case to try to reproduce the problem, since in production we run ehCache under JBoss 6.0 with large pages. Note: For the test of offheap I also disabled both: system swap and transparent-huge-pages to prevent high CPU.

Lee Gilbert 2013-05-08

It appears that moving Serialization/Deserialization outside of ehCache makes this problem go away. So this may be a satisfactory fix for us.

Fiona OShea 2013-05-08

HImadri do you think we need to look into this?

Himadri Singh 2013-05-08

We dont have enough data to reproduce the issue. It seems to be related to object they are using without which we wont be able to reproduce.

Fiona OShea 2013-05-08

Lee Will your workaround resolve your issue?

Lee Gilbert 2013-05-09

Yes, Serialization/De-serialization outside of ehCache prevents the problem occurring, with a slight performance hit.