• Bug
  • Status: Closed
  • 2 Major
  • Resolution: Fixed
  • drb
  • Reporter: teck
  • September 08, 2009
  • 0
  • Watchers: 0
  • November 25, 2009
  • November 02, 2009

Attachments

Description

There is a subtle race in DistributedCacheImpl.invalidateCacheEntries(). This is the code with some noise removed:

protected void invalidateCacheEntries(final Iterator keys) \{ while (keys.hasNext()) \{ K key = keys.next(); TimestampedValue wrappedValue = data.get(key); if (wrappedValue == null) continue;

int now = getTime();
if (wrappedValue.isExpired(now, config)) {
  evict(key, wrappedValue, now);
}   \} \}

The entry retrieved from the get() above might have been already replaced (with a fresh new non-expired item) before we evaluate it and potentially remove it.

Seems like we should be either be using lockEntry() around the shebang there to remove the race, or better yet use the two argument remove(Object,Object) on the underlying map to only remove the value if it has not been replaced

Comments

Abhishek Singh 2009-10-26

Fixed in rev-19229 in trunk and rev-19230 in tc-3.1 branch.

Tim Eck 2009-10-26

We can create a new item if you want (rather than re-opening this one), but at the team meeting this morning we agreed that we’d prefer to use the two argument remove() to fix this

Abhishek Singh 2009-10-28

Fixed to used 2 argument remove() in rev-19253 + added support for ConcurrentMap in DistributedCacheImpl. Rev-19254 for tc-3.1

Abhishek Singh 2009-10-29

Ran h2lcperf with TTI set to 2 mins (so that invalidation happens) before/after changes:

Before: 2009-10-29 04:01:20,614 INFO [AbstractPerfTest] - <——- FINAL REPORT ——– > 2009-10-29 04:01:20,615 INFO [AbstractPerfTest] - <Cluster TPS: 255.7> 2009-10-29 04:01:20,615 INFO [AbstractPerfTest] - <Node TPS: 32.4> 2009-10-29 04:01:20,753 INFO [AbstractPerfTest] - <Cluster Min Latency: 0, Cluster Max Latency: 15113, Cluster Avge Latency: 1551>

After: 2009-10-29 02:44:38,520 INFO [AbstractPerfTest] - <——- FINAL REPORT ——– > 2009-10-29 02:44:38,521 INFO [AbstractPerfTest] - <Cluster TPS: 258.0> 2009-10-29 02:44:38,522 INFO [AbstractPerfTest] - <Node TPS: 32.4> 2009-10-29 02:44:38,719 INFO [AbstractPerfTest] - <Cluster Min Latency: 0, Cluster Max Latency: 16671, Cluster Avge Latency: 1536>

No significant change in perf, in fact improvement of 2.3 tps :-P

Abhishek Singh 2009-10-29

Attached read performance graph during test before/after changes for one node. Same performance.

Abhishek Singh 2009-11-02

This is fixed (using 2 argument remove).

Himadri Singh 2009-11-25

DistributedMapTestApp.java covers this. Test is been running on monkeys and is not failing.