EHC ❯ Pathological access patterns can cause starvation of the MemoryStore
-
Bug
-
Status: Closed
-
1 Critical
-
Resolution: Fixed
-
ehcache-core
-
-
cdennis
-
Reporter: cdennis
-
December 07, 2009
-
0
-
Watchers: 1
-
February 24, 2010
-
January 20, 2010
Description
The MemoryStore eviction algorithm uses a circular buffer the size of the store to cache keys from which a suitable eviction candidate can be selected. Unfortunately if a key is overwritten in the circular buffer before the cache element is evicted from the MemoryStore (and the cache is eternal), then that element is then stuck in the cache forever.
The following test illustrates an access pattern that triggers this behavior (admittedly a slightly artificial one):
public class StarvedMemoryStoreTest extends AbstractCacheTest {
private final int CAPACITY = 1000;
@Test
public void testStarvingMemoryStore() {
Cache cache = new Cache("starvedCache", CAPACITY, false, true, 0, 0);
manager.addCache(cache);
//prefill cache
for (int i = 0; i < CAPACITY; i++) {
cache.put(new Element(Integer.valueOf(i), new Object()));
}
//now do the bad things
for (int i = CAPACITY; i < 100*CAPACITY; i++) {
cache.get(Integer.valueOf(i - CAPACITY));
System.err.println("Putting " + i);
cache.put(new Element(Integer.valueOf(i), new Object()));
}
List keys = cache.getKeys();
Collections.sort(keys);
System.err.println("Cache contains : " + keys);
} \}
Comments
gluck 2009-12-11
gluck 2009-12-11
The current probabilistic eviction algorithm does not provide the same results as the old LRU. For that reason, java -Dnet.sf.ehcache.use.classic.lru=true is provided that will use the old LruMemoryStore.
If an implementation can be done which equals or beats
/** * Orig. * INFO: Average Get Time: 0.37618342 ms * INFO: Average Put Time: 0.61346555 ms * INFO: Average Remove Time: 0.43651128 ms * INFO: Average Remove All Time: 0.20818481 ms * INFO: Average keySet Time: 0.11898771 ms * <p/> * CLHM * INFO: Average Get Time for 3611277 observations: 0.0043137097 ms * INFO: Average Put Time for 554433 obervations: 0.011824693 ms * INFO: Average Remove Time for 802361 obervations: 0.008200797 ms * INFO: Average Remove All Time for 2887862 observations: 4.685127E-4 ms * INFO: Average keySet Time for 2659524 observations: 0.003155828 ms * <p/> * CHM with sampling * INFO: Average Get Time for 5424446 observations: 0.0046010227 ms * INFO: Average Put Time for 358907 obervations: 0.027190888 ms * INFO: Average Remove Time for 971741 obervations: 0.00924732 ms * INFO: Average keySet Time for 466812 observations: 0.15059596 ms * <p/> * After putting back synchronized: * <p/> * INFO: Average Get Time for 7184321 observations: 0.009596036 ms * INFO: Average Put Time for 15853 obervations: 0.117264874 ms * INFO: Average Remove Time for 385518 obervations: 0.017298803 ms * INFO: Average Remove All Time for 456174 observations: 0.10433519 ms * INFO: Average keySet Time for 4042893 observations: 0.0029669348 ms * INFO: Total loads: 123 * * @throws Exception */ @Test public void testConcurrentReadWriteRemoveLRU() throws Exception { testConcurrentReadWriteRemove(MemoryStoreEvictionPolicy.LRU);
and can solve these issues it may be a better solution.
Steve Harris 2009-12-11
Just so I understand. The current Impl has a memory leak?
Chris Dennis 2009-12-11
No Greg is talking about the increase in heap usage of MemoryStore between Ehcache 1.5 and 1.6.
The issue I’m talking about is related to MemoryStores becoming starved of space due to what you might call zombie elements that will never be removed by the capacity eviction algorithm.
gluck 2009-12-11
One more thing. There is a test in LruMemoryStoreTest as follows. The probalistic evictor will evict > 230 out of 250 of the true LRU. The old LruMemoryStore would get the 250 exactty right. This currently causes a few issues for people. Any new implementation should do better than the current one in accuracy of LRU eviction.
/**
* Test the LRU policy
*/
@Test
public void testProbabilisticEvictionPolicy() throws Exception {
createMemoryOnlyStore(MemoryStoreEvictionPolicy.LRU, 500);
//Make sure that the store is empty to start with
assertEquals(0, cache.getSize());
// Populate the store till the max limit
Element element = new Element("key1", "value1");
for (int i = 0; i < 500; i++) {
cache.put(new Element("" + i, "value1"));
}
Thread.sleep(1010);
for (int i = 0; i < 500; i++) {
cache.get("" + i);
}
//evict some
for (int i = 501; i < 750; i++) {
cache.put(new Element("" + i, "value1"));
}
int lastPutCount = 0;
for (int i = 501; i < 750; i++) {
if (cache.get("" + i) != null) {
lastPutCount++;
}
}
assertTrue("Ineffective eviction algorithm. Less than 230 of the last 249 put Elements remain: " + lastPutCount, lastPutCount >= 230);
}
gluck 2009-12-17
See http://forums.terracotta.org/forums/posts/list/0/2866.page#16675 for someone who ran into a problem where the last added element is getting evicted, which can happen with the probablistic evictor if all else fails.
Chris Dennis 2010-01-20
ehcache-core now has a new MemoryStore implementation that select keys directly from the underlying map, thus preventing any kind drift in a parallel key data structure.
Jon Christiansen 2010-02-17
In 1.7.2’s MemoryStore, TOO_LARGE_TO_EFFICIENTLY_ITERATE = 5, so the above statement above should really say for everyone who’s cache is greater than 5 elements this is going to be an issue. Right?
Also, the above statement “The old LruMemoryStore would get the 250 exactty right” really should be “The old LruMemoryStore would get the 249 exactly right” since that loop will only run 249 times.
Also, for such a significant issue, wouldn’t it be prudent to set the “Affects Version/s” field for this bug, so others experiencing this will have a better chance at actually finding this good information about whats going on.
Lastly, I still have to dig into and digest the code a bit more, but if someone could comment on exactly under what conditions a key is overwritten in the circular buffer before the cache element is evicted – if someone already understands the circumstances and has those details, that would a great help. Is there something that is not thread safe thats causing this to happen? Running that testProbabilisticEvictionPolicy on my Mac laptop comes out with different results every time.
Thanks, JC
Chris Dennis 2010-02-17
This could potentially affect anyone using 1.6.0-1.7.2.
Chris Dennis 2010-02-17
The list of keys that can be evicted from the cache is a circular buffer the same size as the cache. Consider an initially empty cache with maximum capacity N. The first N elements will fill the circular buffer. Once this has happened both the buffer and the store are full. The next put into the store will evict an entry from the map and overwrite a key in the buffer. The key in the buffer that is overwritten is known. It will be the first key put into the store. However the key evicted from the store is selected on a partly-random basis (least useful key from a sample of 15). If the key evicted from the store isn’t the the same key as is overwritten from the buffer then the two sets no longer match up.
Example:
Buffer: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Store: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Put in 10 - eviction algorithm picks key 4 as the least useful
Buffer: 10, 1, 2, 3, 4, 5, 6, 7, 8, 9 Store: 0, 1, 2, 3, 5, 6, 7, 8, 9, 10
The 0 key is still in the store but is no longer in the buffer. Because it is not in the buffer it cannot be selected for future capacity eviction. In the absence of external removal of the key (either by TTI/TTL expiry or explicit user removal) the 0 key will sit in the buffer indefinitely - effectively reducing the useful store capacity by 1.
The test method in the bug description illustrates an example of when this can happen.
Why does Ehcache 1.6 use more memory than 1.5? ConcurrentHashMap does not provide an eviction mechanism. We add that ourselves. For caches larger than 5000 elements, we create an extra ArrayList equal to the size of the cache which holds keys. This can be an issue with larger keys. An optimisation which cache clients can use is: