• Bug
  • Status: Closed
  • 2 Major
  • Resolution: Fixed
  • Communications Layer
  • etsai
  • Reporter: etsai
  • February 03, 2009
  • 0
  • Watchers: 0
  • July 27, 2012
  • February 05, 2009

Description

Reported on forum. http://forums.terracotta.org/forums/posts/list/0/1746.page#10487

Active-passive is built internally on group-comm, which maintains connections among L2s, with only one connection between any pair of L2s. That connection is the same kind as an L1-to-L2 connection, but only one exists per pair. For example, it can be either active-to-passive or passive-to-active, but just one of them. At startup there can be two connections between them, but logic is implemented to close one of them. In this problem, the active was acting like a client (L1) but somehow didn’t connect back to the passive in time, the same behavior as a normal client-to-server disconnection. However, group-comm will then try to set up another connection, passive-to-active, to maintain group communication.
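The "close one of two startup connections" step could be sketched roughly as below. The tie-break rule (keep the link initiated by the node whose ID sorts first) and the names `DuplicateLinkResolver`, `Direction`, and `linkToKeep` are assumptions for illustration only; the report does not say which connection group-comm actually closes.

```java
/** Hypothetical sketch: choose the surviving link when two L2s connect to each other at once. */
final class DuplicateLinkResolver {
    enum Direction { LOCAL_TO_PEER, PEER_TO_LOCAL }

    /**
     * Decides which of the two simultaneous connections to keep.
     * Assumed rule (not taken from the report): keep the link initiated by the
     * node whose ID sorts first. Both nodes evaluate the same rule, so they
     * agree on the surviving link without extra coordination.
     */
    static Direction linkToKeep(String localNodeId, String peerNodeId) {
        int cmp = localNodeId.compareTo(peerNodeId);
        if (cmp == 0) {
            throw new IllegalArgumentException("node IDs must differ");
        }
        return cmp < 0 ? Direction.LOCAL_TO_PEER : Direction.PEER_TO_LOCAL;
    }
}
```

With node IDs "a" and "b", node "a" keeps its outgoing link and node "b" keeps its incoming link, which is the same physical connection seen from both ends.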

Comments

Erh-Yuan Tsai 2009-02-03

This was caused by L2-reconnect being enabled: the L2, acting like an L1, failed to connect back to its peer L2 within the specified time period. Without L2-reconnect, it drops the connection on any disconnection and starts a whole new connection.

Below is the user’s tc.properties, which enabled L2-reconnect:

ehcache.concurrency = 64
l2.nha.tcgroupcomm.reconnect.enabled=true
l2.nha.tcgroupcomm.reconnect.timeout=15000
l2.l1reconnect.enabled=true
l2.l1reconnect.timeout.millis=15000

Erh-Yuan Tsai 2009-02-03

The good thing about enabling L2-reconnect is that it smooths over short connection disruptions; otherwise, in a bad network environment, nodes join and leave and trigger unwanted active-passive elections. The bad thing is what happened at the customer’s site: infinite attempts to make a connection.

I’m thinking about a simple fix like this: if L2-reconnect is enabled, then set 10 tries instead of infinite tries. The 10 can be configurable.
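A minimal sketch of the bounded-retry idea, not the actual Terracotta change: the class `BoundedReconnect`, the method `reconnect`, and the `DEFAULT_MAX_TRIES` constant are hypothetical names, and the connection attempt is passed in as a plain callback rather than any real group-comm API.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: retry a reconnect a bounded number of times instead of forever. */
final class BoundedReconnect {
    /** Assumed default taken from the proposal above; not a real Terracotta property. */
    static final int DEFAULT_MAX_TRIES = 10;

    /**
     * Calls connectAttempt until it succeeds or maxTries attempts have been made.
     * Returns the number of attempts used on success, or -1 if every attempt
     * failed or the thread was interrupted while waiting between attempts.
     */
    static int reconnect(BooleanSupplier connectAttempt, int maxTries, long delayMillis) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            if (connectAttempt.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxTries) {
                try {
                    Thread.sleep(delayMillis); // back off before the next try
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return -1;
                }
            }
        }
        return -1; // give up so the caller can drop the link and start fresh
    }
}
```

A caller would treat a return value of -1 as "stop retrying and fall back to closing the connection", which is the behavior described for the case where L2-reconnect is disabled.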

Erh-Yuan Tsai 2009-02-05

Do not use ClientConnectionEstablisher for L2 reconnecting. OOO has its own reconnection mechanism.