• Type: Bug
• Status: Closed
• Priority: 2 Major
• Resolution: Fixed
• Component: Integration Modules
• Assignee: teck
• Reporter: dconnard
• Created: September 08, 2009
• Votes: 1
• Watchers: 3
• Updated: July 27, 2012
• Resolved: September 08, 2009

Description

We’ve recently upgraded a production installation of Terracotta to TC3.1.0 to get around some VM synchronisation issues that had been fixed in releases after the one we were running.

One key synchronisation issue that we upgraded for was the addition of a local cache (called clusteredApps) to TerracottaSessionManager.isDsoSessionApp(). Prior to the cache, this method ended up in a synchronised block, which internally was doing reflection and class loading. The real problem was that this method is called in the setup of every single Tomcat request, regardless of whether or not TC should be enabled for that request - triggering a synchronisation bottleneck for every inbound request, sessionless or otherwise.

The addition of the local cache has alleviated this particular synchronisation point - thanks for that! Unfortunately, however, the bottleneck has now shifted a little further down the stack. See line 74 of SessionValve55.java:

http://svn.terracotta.org/svn/forge/projects/tim-tomcat/trunk/tim-tomcat-5.5/src/main/java/org/terracotta/modules/tomcat/tomcat_5_5/SessionValve55.java

There is a synchronized block around a HashMap.get() call.

This code should be changed so that synchronisation is only taken for the .put() case, by adding a get() before obtaining the lock (there is no need to synchronise on a HashMap.get() alone) - i.e.:

private SessionManager findOrCreateManager(Request valveReq, String contextPath) {
  String hostName = valveReq.getHost().getName();
  String key = hostName + contextPath;

  SessionManager rv = (SessionManager) mgrs.get(key);
  if (rv == null) {
    synchronized (mgrs) {
      rv = (SessionManager) mgrs.get(key);
      if (rv == null) {
        rv = createManager(valveReq, hostName, contextPath);
        mgrs.put(key, rv);
      }
    }
  }
  return rv;
}

We immediately observed numerous waits on this synchronisation block after deploying TC3.1.0. After some frantic patching, we are now running the above code in production, and are no longer seeing threads blocking at this point.

This is an extremely simple and safe patch to return to the core. It’s also an extremely important one, as anything that forces all inbound request threads to synchronise is obviously very bad in a highly loaded production environment such as ours.

Comments

Alex Miller 2009-09-08

Thanks for the report - we’ll consider a fix for this for 3.1.1.

This particular fix is not actually safe, as you are doing an unsynchronized get() on a shared HashMap, which bypasses the visibility and atomicity guarantees the synchronization otherwise provides. For example, if an unlocked get() runs while a put() happens to be rehashing the table, the get() can fall into an infinite loop and hang your system.

I think we would prefer instead to make this map more concurrent (but still safe) by using a ConcurrentHashMap for example.

Alex Miller 2009-09-08

If you see something that feels safe for 3.1.1, that would be ok, but otherwise we’ll table for Santiago or 3.1.x.

Tim Eck 2009-09-08

I agree that this synchronization point could be a bottleneck. However I do not agree that it is safe to perform the get() outside of the synchronization as indicated.

A correct fix (with the desired concurrency properties) would be to use ConcurrentHashMap and putIfAbsent().
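For reference, here is a minimal sketch of that ConcurrentHashMap/putIfAbsent() approach (the field and method names simply mirror the snippet in the description; the actual change committed to the Forge may differ):

// requires: import java.util.concurrent.ConcurrentHashMap;
// requires: import java.util.concurrent.ConcurrentMap;
private final ConcurrentMap<String, SessionManager> mgrs =
    new ConcurrentHashMap<String, SessionManager>();

private SessionManager findOrCreateManager(Request valveReq, String contextPath) {
  String hostName = valveReq.getHost().getName();
  String key = hostName + contextPath;

  // common path: a lock-free read backed by CHM's own visibility guarantees
  SessionManager rv = mgrs.get(key);
  if (rv == null) {
    SessionManager created = createManager(valveReq, hostName, contextPath);
    // putIfAbsent() returns the previously mapped value if another thread won the race
    SessionManager existing = mgrs.putIfAbsent(key, created);
    rv = (existing != null) ? existing : created;
  }
  return rv;
}

Note that with a bare putIfAbsent(), createManager() can run more than once if two threads race on the same key; whether that matters depends on whether creating and then discarding a manager has side effects.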

Tim Eck 2009-09-08

resolved in http://svn.terracotta.org/fisheye/changelog/TerracottaForge/?cs=18665

David Connard 2009-09-08

Hmm… OK, good point about put() calls being able to trigger infinite loops in concurrent get() calls. However, I’m not sure that ConcurrentHashMap is the best solution in this particular case… ConcurrentHashMaps are a fair amount of overhead compared to regular HashMaps, and in this case, there will only ever be a handful of updates in the first 100ms of VM operation (as a manager is created for the handful of webapps configured in the VM), and then the contents will be static forever after. It’s not really worth the continuing overhead of the CoHashMap for what is essentially a static data cache… and that’d be why our (heavily loaded) servers haven’t run into this.

Can I suggest you consider an alternate implementation which doesn’t use a ConcurrentHashMap, but instead creates an updated copy and then re-assigns the cache, eg. something along the lines of the following. This would have better ongoing runtime performance than the ConcurrentHashMap.

* change mgrs back to be a HashMap, but non-final
* add private final Object mgrsUpdateSemaphore = new Object();
* adjust findOrCreateManager to be:

private SessionManager findOrCreateManager(Request valveReq, String contextPath) {
  String hostName = valveReq.getHost().getName();
  String key = hostName + contextPath;

  SessionManager rv = (SessionManager) mgrs.get(key);
  if (rv == null) {
    synchronized (mgrsUpdateSemaphore) {
      rv = (SessionManager) mgrs.get(key);
      if (rv == null) {
        rv = createManager(valveReq, hostName, contextPath);
        Map<String, SessionManager> newMgrs = new HashMap<String, SessionManager>();
        newMgrs.putAll(mgrs);
        newMgrs.put(key, rv);
        mgrs = newMgrs;
      }
    }
  }
  return rv;
}

Tim Eck 2009-09-08

For that proposed code to be safe from a memory model perspective, the “mgrs” map reference would have to be volatile. I don’t suspect it would be much faster than CHM.get(), which performs a volatile read as well (at least on its normal path).
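For illustration, that memory-model point amounts to something like the following field declaration (a sketch only; the generic types are assumed from the snippet above):

// the map reference itself must be volatile so that a thread doing the
// unsynchronized mgrs.get(key) is guaranteed to see the fully populated
// replacement map published by "mgrs = newMgrs"
private volatile Map<String, SessionManager> mgrs = new HashMap<String, SessionManager>();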

This code *might* be removed altogether at some point when the session support for Tomcat is refactored. Ideally we should be implementing the session manager abstraction (http://tomcat.apache.org/tomcat-6.0-doc/api/org/apache/catalina/Manager.html) with a Terracotta implementation. The Manager instance is initialized at context startup, so we can do away with this lazy init and the synchronization problems it introduces.

David Connard 2009-09-08

OK - CoHashMap it is then, sounds like it’s the best bomb-proof solution. I may update our patched tim at some point.

I watched our production servers closely last night, and after clearing this synchronisation bottleneck with that patch, there is one last bottleneck apparent that I think can be avoided. It shows up in this stack trace:

java.util.Hashtable.get
java.util.Calendar.setWeekCountData
java.util.Calendar.<init>
java.util.GregorianCalendar.<init>
java.util.Calendar.createCalendar
java.util.Calendar.getInstance
java.text.SimpleDateFormat.initializeCalendar
java.text.SimpleDateFormat.<init>
org.apache.catalina.connector.Request.<init>(Request.java:156)
org.apache.catalina.connector.SessionRequest55.<init>(SessionRequest55.java:35)
org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.tcInvoke(SessionValve55.java:59)
org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.invoke(SessionValve55.java:50)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:354)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask
java.util.concurrent.ThreadPoolExecutor$Worker.run
java.lang.Thread.run

There were loads of threads blocking up around the Hashtable synchronisation point in Calendar. Why is that synchronisation point being reached? Purely because org.apache.catalina.connector.SessionRequest55 extends org.apache.catalina.connector.Request, and so triggers that class's field initialisation when the superclass constructor runs.

Isn’t SessionRequest55 just an adapter? Does it have to extend (and thus trigger construction of a new Request)?

David Connard 2009-09-08

Hmmm… I’ve dug into the code a little further, and I don’t think there’s anything you can easily do about this one. You’re required to extend org.apache.catalina.connector.Request by the Valve interface. Perhaps a runtime proxy could provide a valid org.apache.catalina.connector.Request object without re-initialising the superclass and its SimpleDateFormat objects, but runtime proxying is itself slow, and is unlikely to ultimately improve things.

I do think this issue is worthy of further attention, however. Of the 300 or so threads that I observed contending on this synchronisation point last night, 260+ were from the above stack, and the remainder were from miscellaneous other points in our code where we use SimpleDateFormats - importantly, this included JDBC driver code where they were being used in result processing, etc. That is, this particular synchronisation point could impact performance anywhere in the VM, including inside database transactions, simply due to a high inbound request load coupled with the presence of the TC request valve in the stack.

I’m not sure why I’m not seeing the original Request object construction (by Tomcat) showing up in my stack traces though. I’m only ever observing the TC-constructed one. Is Tomcat perhaps re-using request objects or something? Not sure.

David Connard 2009-09-08

Right… yes, Tomcat is recycling the request objects as part of the Http11Processor pool. It’s not creating a new request object for each inbound request - presumably partly because of this particular concurrency issue. Putting the TC request valve into the stack means that this is no longer the case, and new instances of Request objects are being constructed for every single inbound request.

I guess that the correct solution then is to take a similar approach (to Tomcat) and pool the org.apache.catalina.connector.SessionRequest55 request objects - ie. create a pool in the org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55 class, and recycle them back to the pool after each request is completed.
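A very rough sketch of that pooling idea follows (purely illustrative: the SessionRequest55 constructor arguments, and whether a pooled instance can safely be re-bound to a new underlying Request, are assumptions that would need checking against the real classes):

// requires: import java.util.Queue;
// requires: import java.util.concurrent.ConcurrentLinkedQueue;

// hypothetical pool held by the valve; unbounded here for simplicity
private final Queue<SessionRequest55> requestPool = new ConcurrentLinkedQueue<SessionRequest55>();

private SessionRequest55 checkOutRequest(Request valveReq) {
  SessionRequest55 pooled = requestPool.poll();
  if (pooled == null) {
    // pay the expensive superclass construction (SimpleDateFormat etc.) only on a pool miss
    pooled = new SessionRequest55(valveReq);   // assumed constructor
  } else {
    pooled.recycle();                          // reset per-request state, as Tomcat does
    // the pooled instance would still need to be pointed at the new valveReq;
    // how SessionRequest55 wraps the underlying Request is not shown here
  }
  return pooled;
}

private void checkInRequest(SessionRequest55 req) {
  requestPool.offer(req);                      // return to the pool once the response is complete
}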

Unfortunately this is not such a simple patch as the last one… but, I would argue, is an important one to try to maintain the performance of Tomcat servers with TC enabled vs. those without, especially given the ability of synchronisation points on the SimpleDateFormat class to affect other parts of the VM.

It’s possible that the same approach needs to be taken with the SessionResponse55 objects, as these are similarly part of the Tomcat pool and recycle concept. However, there’s no clear synchronisation reason apparent (to me) at this stage, and the only obvious impact is the memory utilisation of the repeated superclass construction (which includes a few arrays and bits and pieces).

Tim Eck 2009-09-09

Thanks for all the food for thought here David. I don’t find any faults with your analysis :-)

For me this is just further argument for letting Tomcat do its job with regard to request/response creation and handling, and only having TC involved in the session manager. Of course there isn’t any timeline for that work at the moment.