• Bug
  • Status: Resolved
  • 2 Major
  • Resolution: Duplicate
  • ehcache-terracotta
  • cdennis
  • Reporter: gadams00
  • March 01, 2013
  • 0
  • Watchers: 5
  • June 06, 2013
  • March 06, 2013

Attachments

Description

See forum posting here: http://forums.terracotta.org/forums/posts/list/7962.page

In a nutshell, when caching a large/deep object graph using ehache 2.6.5 with a distributed cache via terracotta open source (terracotta version 3.7.4) on windows 7 x64 and with Oracle/Sun JDK 1.6.0_38, the calculated size of the object changes as reported immediately after the initial put and as reported after any subsequent get. The size reported after the put is equivalent to the size of the file created via ObjectOutputStream, and the size after the get is roughly 3x the serialized size.

I’ve noticed that this only applies when using a string as the cache key. If I switch to an integer, the sizes are consistent.

I’ve created a reproduction test case maven project, which is attached. To use this, start a terracotta 3.7.4 instance running on localhost. Just extracting the opensource terracotta-3.7.4.tar.gz to a directory and running start-tc-server.bat to start with the default configuration is sufficient. Next, extract the attached terracottabug.zip to a directory, cd into it, and run “mvn test”. Test failure demonstrates the bug.

Comments

Greg Adams 2013-03-01

Forgot to mention that there are shared references marked with @IgnoreSizeOf at play here as well.

Chris Dennis 2013-03-06

Long story, short: the behavior you thought was faulty was correct, and the behavior you thought was correct was faulty.

Long story, long:

The value you’re storing in the cache is basically a very complex object graph linking a very large number of strings. Strings happen to serialize very efficiently - in the heap the component characters are represented as 16-bit quantities, but when serialized, assuming you’re using mainly ASCII characters the serialization will compress each char down to 8-bits. So a simple string is half as big when serialized as it is on heap. The upshot of this is that your large object is only 9MB when serialized - but is something around 26MB when in the heap. Some of this inflation is the strings, the rest is the object overhead (since the graph is pretty complex).

When stuff is stored in a clustered cache it’s serialized before being sent to the server. So when it’s sent to the server it’s only 9MB in size. For various internal technical reasons we keep this internal serialized form in the local cache until someone reads the value. Only then is the value deserialized. This is why when using String keys you see 9MB when putting and it inflates to the 26MB when you do the get - this behavior (albeit slightly confusing) is the correct behavior.

It turns out when using integer keys there is bug in the code that prevents the value’s size from being recalculated after we’ve done the deserialization. This is why the Integer keyed cache doesn’t appear to change in size. I’ve filed an issue for the underlying bug you are seeing with Integer keys.