New LRU eviction policy is slower than the old LruMemoryStore

Type: Bug
Status: Closed
Priority: 2 Major
Resolution: Not a Bug
Component/s: ehcache-core
Labels:

Assignee: cdennis
Reporter: dhulley
Created: March 16, 2010
Votes: 0
Watchers: 1
Updated:
July 15, 2010
Resolved:
June 17, 2010

Attachments

dhulley (932.0) text/plain EHCache-LRU-test-results.txt
dhulley (1.00 k) text/plain EHCache-old-LRU.txt
dhulley (7.00 k) application/zip EHCachePerfTest.zip

Description

Old style LRU (integrated LruMemoryStore) was much faster than new separate policy-driven implementation.

This has been verified with fine-grained profiling as well. In short, the new method of sampling reduces performance over the old implementation … significantly. Also, profiling shows a large amount of time being spent in Random and TimeUtil.

A snippet from the attached file:

** With old LRU (-Dnet.sf.ehcache.use.classic.lru=true) **

Cache performance test: count: 1 direct: 22489 ns\count transaction: 735994 ns\count …

** With new LRU: **

Cache performance test: count: 1 direct: 31848 ns\count transaction: 7263347 ns\count …

Comments

Chris Dennis 2010-03-16

Hi Derek,

Could you attach a copy of the source code for your performance test so that I could have a look at it? I’m slightly intrigued by your results. If you’re uncomfortable posting source code here for whatever reason, then a description of what the test does would probably be a good substitute.

Thanks,

Chris (Ehcache and Terracotta Developer)

Derek Hulley 2010-03-16

Hi, Chris With pleasure; I’ll extract it to a rawer form (remove dependencies). Derek (Alfresco Software)

Derek Hulley 2010-03-16

I’ve attached a zipped-up Eclipse project. To run you will need to import: ehcache-core-1.7.2.jar lib/slf4j-api-1.5.11.jar slf4j-log4j12-1.5.11.jar (or equivalent) log4j-1.2.15.jar (if using slf4j-log4j*) Switching between LRU implementations is done in the code. Results attached:

** New LRU ** Cache performance test: count: 100000 direct: 12198 ns\count

** Old LRU ** Cache performance test: count: 100000 direct: 5652 ns\count

Chris Dennis 2010-03-19

I read your code, and did some comparisons using my own performance tests between the old LruMemoryStore and the new MemoryStore in the current trunk of Ehcache (effectively Ehcache 2.0.0). Below are the numbers that I get for the two implementations for put, threshold put (put when the store is at capacity and must evict), and get. The test were run on a quad-cpu box, single is 1 thread, multi is 4 threads (1 per cpu), and over is 16 threads (4 per cpu).

Put: Single: OLD:0.0119125ms NEW:0.0149455ms Multi : OLD:0.0399765ms NEW:0.071277ms Over : OLD:0.185791ms NEW:0.3426435ms

Threshold Put: Single: OLD:0.0150445ms NEW:0.0284205ms Multi : OLD:0.0654905ms NEW:0.140666ms Over : OLD:0.5047135ms NEW:0.804368ms

Get: Single: OLD:0.0134945ms NEW:0.012991ms Multi : OLD:0.0236025ms NEW:0.018558ms Over : OLD:0.1179035ms NEW:0.069667ms

The results I see here confirm what your test showed which is that single threaded put performance is higher in the old LRU store than in the new store. Get performance is however better in the new store (especially in a multi-threaded environment). Since in a typical cache you expect gets to vastly outnumber puts, and that a multi-threaded scenario is more common than a single-threaded scenario, the pay-off in get performance is worth the cost in put performance.

I did however come across some parts of the existing store that could be optimized to perform better. As I am currently in the process of rewriting both the memory store and disk store to support unclustered JTA caching I’ll make sure to include these performance improvements in the new store. I’m going to leave this JIRA issue open to track the relative performance of the old LRU versus the store implementation that I am currently writing. As such I’m going to retarget it to some same timeframe.

Chris Dennis 2010-06-17

Single threaded access will indeed be as quick if not quicker in the strict LRU data structure. A large performance penalty is however paid when everything becomes multithreaded. Closing this a “Not a bug” since this is expected behavior. Performance in the newly designed unclustered store is now faster than the previous generation where I could identify some optimizations (especially in threshold puts).