CDV ❯ Cache.getKeys() returns a list of serialized Objects as Strings
-
Bug
-
Status: Closed
-
1 Critical
-
Resolution: Fixed
-
Integration Modules
-
-
cdennis
-
Reporter: ayreon
-
January 13, 2010
-
0
-
Watchers: 6
-
March 24, 2011
-
December 07, 2010
Attachments
Description
In TC - EhCache (identity mode and in serialization mode as well) cache.getKeys() returns the serialized version of previously inserted complex keys (e.g. a Foo instance as key),
e.g. after storing a complex object key, cache.getKeys() returns a String ([seems like the output of a serialization]), and on the top of that i can’t deserialize that String to restore my originally entered key, because as i see
org.terracotta.cache.serialization.DsoSerializationStrategy
uses its own to serialize it, so the original key cannot be restored
Another problem: Even in identity mode the keys in EHCache still needs to be Serializable (i will open another issue for this)
Versions: EHCache 1.7.2 (same problem with 1.7.1) TC 3.2.0 (same problem with 3.1.1) tim-ehcache-1.7 1.5.0 (and all its dependencies)
Comments
Steve Harris 2010-01-13
Tim Eck 2010-01-13
Fixing things so we don’t serialize the keys for identity caches should probably fix this without any extra effort
Chris Dennis 2010-01-15
I’ve solved the related CDV-1445 issue which meant that identity caches were not using identity (non-serialized) keys. This means that getKeys will now return the correct keys (the exact same keys that were originally put into the cache) when the cache is in identity mode.
Solving this issue for serialization mode caches is a little more complicated. Since the EhCache API returns the keys as an array and not any kind of collection, returning the correct key set in serialization mode would involve deserializing all the keys when getKeys is called which could be quite expensive. This would be fine if it weren’t for the fact that there is no value or entry collection getter for an EhCache instance. This means the only way to access all the values of a cache is by getting the key set and then doing a get for every key. I’m reluctant therefore to force a deserialization for each key on getKeys since this would mean people wishing to access the values of the cache would pay the cost of both a key deserialization and a key serialization for every value they need to access.
Tim Eck 2010-01-15
Short of an interface change to ehcache, it kinda feels like we should throw UnsupportedOperationException in this case instead of returning unexpected types
Chris Dennis 2010-01-15
I’m reluctant to do that because currently the following kind of poor mans entry/value iteration does actually work for a serialization mode clustered cache (I haven’t tested this but I’m 90+% certain):
for (Object key : cache.getKeys()) { Element value = cache.get(key); System.err.println(“Value: “ + value); }
Tim Eck 2010-01-15
I think you’re probably right about that code working (since generateStringKey() for a String instance just returns it). Although I’m still not sure if that usage is enough to warrant leaving getKeys() to return unexpected types. I’m sure we can someone to decide for us :-)
Janos Biro 2010-01-16
Probably i’m not the one who will decide it at the end :), but let me share my opinion as reporter,
First of all i think we should decide that a parameter called “valueMode” in a configuration file can affect the way we handling the keys or not…, I think it shouldn’t, from user point of view at least, Of course you can do anything you want with your keys internally (in serialization mode you may serialize them), but imho it’s simply unacceptable and will surely cause serious killer bugs in PROD environments (as it already happened to our system), if you silently change the way how getKeys() and probably more importantly key comparison (hashCode() + equals()) used to work before enabling TC clustered behaviour in EHCache.
On the top of that the “serialization” mode is the default,
so suppose somebody follows your Guide and simply put a
the killer thing is that at first glance it seems everything went fine, the caches are working (more or less) according to what Dev Console shows, and only a deeper analyis shows that a bunch of your caches do not work at all, or element invalidation does not work in some cases.
e.g When a key’s hashCode and equals do not consider all the member fields, but you consider them (comparing serialized forms) when cache.get(key) an element from the cache, it immediately leads to unexpected behaviour in an existing app. Of course users can make those non-cosidered member fields as “transient” (as I did), but why would anybody do such a thing? Who would expect that a “value(!)Mode” setting affects his key by any means? And what if equals() contains special logic which cannot be simply mapped to the serializied form marking fields transient? Practically all EHCache based applications should be revised/refactored before enabling TC clustering.
Regarding getKeys() the threat is similiarly serious, in our app we used it to get all the keys and select the ones (in a for loop) to be invalidated after (triggered by a quite complex event when only a determined subset of keys need to be invalidated). As we use several types as keys we check with an “instanceOf” if the returned key is e.g “keyClass1” or ““keyClass2”. After enabling TC clustering, these methods silently stopped working since getKeys returned Strings…
So i’m pretty sure that leaving this behaviour (CDV-1444 + CDV-1446) as it is now, is not an option,
As I see now we have the following options imho:
(the minimal, quick and dirty, i believe it’s an almost unacceptable concept)
- Document this behaviour as it is now and ask the users to revise/refactor their code according to the new manual ( in serialization mode )
- do not rely on hashCode and equals anymore but serilialized form comparision (truth is i’ve never seen similiar behaviour but imagine that for now :) )
- state that keys must be serializable (in the current docs i couldn’t find such statement for keys, but only for values)
- expect UnsupportedOperationException from getKeys (i think it is the only way it could work, if i put a Person object as key, getKeys() cannot return with something else, otherwise it would lead to an interface i never seen before in my life :) ) and use the new e.g. getSerializedKeys() to get well.. the serialized keys :)
- provide a method to restore the original key from the serialized format (it cannot be restored now)
- Fix these bugs in serialization mode as well and getKeys(), hashCode+equals() will work exactly as in standard EHCache
- i understand your arguments around performance drawbacks but i think if there is something which is more important than the performance in case of a cache implementation is that the interface and the behaviour should be consistent (in custered/non-clustered, identity/serilaition mode or whatever), clear, and work according to the Java standards
- introduce Cache.getSerializedKeys() which returns the current serialized keys
- introduce Cache.getWithSerializedKey() beside .get(), which ignores hashCode()+equals() but uses the current comparison method
- provide a method to restore the original key from the serialized format (it cannot be restored now)
- if somebody wants leverage this performance boost coming from the current solution, ask him to use the new methods being aware how they work
I know that this solution needs refactoring if somebody wants the same performance boost we have now, but i think it’s much more acceptable to ask users to refactor their cache handler code to gain some performance boost, than ask them to refactor and test their entire code base (since making fields “transient” e.g. may affect the entire codebase) to be compatible with a theoritically transparent clustering solution.
What do you think?
Steve Harris 2010-01-21
Seems like the solution for this is going to need a bit more thought. Moving to taraval
Gary Keim 2010-08-02
While working on a rudimentary Browse panel for the Ehcache DevConsole panel, I ran into this issue. The attached changes make it work for my purposes.
Robert Hathaway 2010-10-15
Making key fields transient is not an option for many apps. What if the key equals method uses those fields in determining a match? We need this for paging, where a page in the cache can match many potential key ‘get’s with various ranges.
Cache’s must implement the Java map interface, to ignore equals() is simply wrong. My implementation can switch among many cache options and when moving to T.C. it simply breaks, not matching. Provide some configurable option to optimize when possible (serialized key matching), but as others state T.C. must support equals on a cache get.
I’m just now looking into this issue and I’m hoping with 3.3 this bug can be fixed, otherwise I’m not sure how T.C. is usable as a distributed cache. I just updated to a 3.3.
Gary Keim 2010-11-15
I totally agree with Janos and Robert on the priority of correctness over performance.
Right now, you can’t use the Ehcache-monitor to view the keys of a clustered cache in serialization value-mode, which is the default value-mode.
Can do something about this for the Freo beta?
Can take a look at this and tell me if it’s broken?