• Bug
  • Status: Resolved
  • 2 Major
  • Resolution: Duplicate
  • ehcache
  • drb
  • Reporter: docodan
  • April 09, 2014
  • 0
  • Watchers: 4
  • April 09, 2014
  • April 09, 2014

Description

Segments are not just used privately by SelectableConcurrentHashMap. They can be explicitly locked outside of map operations. There are two places relevant to this bug where segments are locked:

1) In AbstractReadWriteEhcacheAccessStrategy.putFromLoad, the segment is locked for the object we are writing.

2) Internal to SelectableConcurrentHashMap, when remove() is called, the segment is locked for the object we are removing.

Note that cache size maintenance happens after the segment has been locked for putFromLoad.

So here is the deadlock scenario. Consider two objects being put into the cache by different threads when the cache is full. Assume each object hashes to a different segment.

Thread 1: Set a write-lock on segment A so that we can be added to that segment. Thread 2: Set a write-lock on segment B so that we can be added to that segment. Thread 1: Determine that the cache needs to be resized, and choose to remove an object that resides in segment B. Since the lock is held by Thread 2, the attempt to acquire the lock blocks. Thread 2: Determine that the cache needs to be resized, and choose to remove an object that resides in segment A. Since the lock is held by Thread 1, the attempt to acquire the lock blocks.

This results in deadlock. Because the cache is accessed by most threads that perform a Hibernate operation, each will block in turn when they access either of those segments.

Here’s where the “put” lock happens in code. The cache maintenance happens in region.put::

public final boolean putFromLoad(Object key, Object value, long txTimestamp, Object version, boolean minimalPutOverride) throws CacheException { region.writeLock(key); try { Lockable item = (Lockable) region.get(key); boolean writeable = item == null || item.isWriteable(txTimestamp, version, versionComparator); if (writeable) { region.put(key, new Item(value, version, region.nextTimestamp())); return true; } else { return false; } } finally { region.writeUnlock(key); } }

Here’s a snippet of stack trace where you can see the cache maintenance (with a corresponding lock attempt) occurring:

        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
        at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.remove(SelectableConcurrentHashMap.java:563)
        at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.remove(SelectableConcurrentHashMap.java:393)
        at net.sf.ehcache.store.MemoryStore.remove(MemoryStore.java:359)
        at net.sf.ehcache.store.MemoryStore.evict(MemoryStore.java:878)
        at net.sf.ehcache.store.MemoryStore.removeElementChosenByEvictionPolicy(MemoryStore.java:608)
        at net.sf.ehcache.store.MemoryStore.access$500(MemoryStore.java:80)
        at net.sf.ehcache.store.MemoryStore$Participant.evict(MemoryStore.java:1049)
        at net.sf.ehcache.pool.impl.BalancedAccessEvictor.freeSpace(BalancedAccessEvictor.java:93)
        at net.sf.ehcache.pool.impl.AtomicPoolAccessor.add(AtomicPoolAccessor.java:71)
        at net.sf.ehcache.pool.impl.AbstractPoolAccessor.add(AbstractPoolAccessor.java:67)
        at net.sf.ehcache.store.MemoryStore.put(MemoryStore.java:263)
        at net.sf.ehcache.Cache.putInternal(Cache.java:1554)
        at net.sf.ehcache.Cache.put(Cache.java:1480)
        at net.sf.ehcache.Cache.put(Cache.java:1445)
        at net.sf.ehcache.hibernate.regions.EhcacheTransactionalDataRegion.put(EhcacheTransactionalDataRegion.java:141)
        at net.sf.ehcache.hibernate.regions.EhcacheTransactionalDataRegion.put(EhcacheTransactionalDataRegion.java:126)
        at net.sf.ehcache.hibernate.strategy.AbstractReadWriteEhcacheAccessStrategy.putFromLoad(AbstractReadWriteEhcacheAccessStrategy.java:93)

Here’s evidence from our jstack data that two threads are blocking on each other in a deadlock:

Thread A:

“pool-25-thread-8” prio=10 tid=0x00002aace53e3000 nid=0x5faf waiting on condition [0x000000004c1a6000..0x000000004c1a9a10] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aaafbea5888> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) …

Locked ownable synchronizers: - <0x00002aaafb034908> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) - <0x00002aaafd209e80> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Thread B:

“pool-25-thread-10” prio=10 tid=0x00002aacf99a8800 nid=0x5fb2 waiting on condition [0x000000004c4aa000..0x000000004c4acd90] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aaafb034908> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)

Locked ownable synchronizers: - <0x00002aaafbea5888> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) - <0x00002aaafde2a6c0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Comments

Alexander Snaps 2014-04-09

See EHC-1079 While this is much more detailed. Thanks a lot for the detailed description!