• Bug
  • Status: Resolved
  • 2 Major
  • Resolution: Fixed
  • ehcache-core
  • ljacomet
  • Reporter: zwierz
  • July 08, 2016
  • 0
  • Watchers: 4
  • January 05, 2017
  • January 05, 2017

Description

We’ve been experiencing some weird behaviour with different configurations of EhCache 2.7.5 and it seems to be reproducible on latest 2.10.2 as well.

We use Cache.getWithLoader() with given cache loader in multi-threaded environment.

The cache loader is definitely thread safe, I tested it on cache loader which does nothing but returning a static string value.

When this method is highly contended, like when it takes a lot of requests from 8+ threads, it happens to return null.

There’s the part responsible for loading value from my cache loader:

} else {
    value = loadValueUsingLoader(key, loader, loaderArgument);
}
if (value != null) {
    put(new Element(key, value), false);
}
[...]
return getQuiet(key);

Value is computed, put in cache, but when second getQuiet is called null is returned.

Perhaps there is certain timing issue between putting entry in cache and then fetching it right away.

Is there a reason why value is not being returned right after pushing it into cache like in example below, without the second call to getQuiet?

Element newElement = new Element(key, value);
put(newElement, false);
return newElement;

Comments

Tomasz Zwierzchowski 2016-07-12

Could you guys comment on this issue?

We think the fix is reasonable, but we might be missing something.

Thanks in advance for your help!

Fiona OShea 2016-07-12

Louis can you or another dev review this issue? thanks

Louis Jacomet Jacomet 2016-07-13

Hi Tomasz,

This is a nasty concurrency issue. I would be interested in understanding how the cache is sized when you see this problem?

The problem in Ehcache 2.x is the combination of features. If we return the element as you suggest, we break potential {{copyOnWrite}} and/or {{copyOnRead}} semantics. And doing the {{getQuiet}} solves that problem but indeed opens up a race against eviction.

After discussion the issue with others, I believe an improvement is possible, where we return the loaded value in case the {{getQuiet}} misses. It still has a potential impact on {{copyOnWrite}} semantics where the loading thread and any other thread that manages to see the value will end up with a different instance. But I believe that since on return the value is no longer in cache, we can live with that.

Tomasz Zwierzchowski 2016-07-14

I’m not sure if it actually happens during eviction, because value is freshly generated, put into cache, so it still has a lot of TTL ahead.

I was considering situation where we put computed value into cache and when we ask getQuiet to retrieve it right away the value is not there yet - do writes happen asynchronously in certain situations?

Tomasz Zwierzchowski 2016-07-28

Could you provide us with some rough estimates when this issue is going to be fixed in the official release?

We’re considering forking 2.10.2 version with mentioned patch, but it never comes without a cost of maintaining :)

Thanks in advance for info

Louis Jacomet Jacomet 2016-08-05

Hi Tomasz,

{quote} I was considering situation where we put computed value into cache and when we ask getQuiet to retrieve it right away the value is not there yet - do writes happen asynchronously in certain situations? {quote} No, that is not possible AFAICT.

{quote} Could you provide us with some rough estimates when this issue is going to be fixed in the official release? {quote} 2.10.3 should be out mid October.

Tomasz Zwierzchowski 2016-08-05

Thanks for info. I’m looking forward to see 2.10.3 out there! :)

Markus S 2016-08-23

We see the same issue that getWithLoader() returns null in our environment when there is high concurrency on the cache instance, which should not be possible at all because the cache loader never returns null but instead returns a predefined element. Is there any alternate approach for this use case?

Tomasz Zwierzchowski 2016-08-24

Markus: We’ve been using patch mentioned in the issue description. It temporarily solves the null problem as long as you’re not using copy-on-write/copy-on-read semantics.

Louis Jacomet Jacomet 2017-01-05

[~zwierz] Could you validate that the released 2.10.3 resolves the issue for you? Thanks!