Handling Forced Disconnection

A GemFire XD member may be forcibly disconnected from a distributed system if it is unresponsive for a period of time, or if a network partition separates one or more members into a group that is too small to act as the distributed system. If you start GemFire XD using the FabricService interface, you can use callback methods to perform actions during the reconnect process, or to cancel the reconnect process if necessary.

After being disconnected from a distributed system, a GemFire XD member automatically shuts down and then restarts into a "reconnecting" state, in which it periodically attempts to rejoin the distributed system. If it reconnects successfully, the member rebuilds its view of the distributed system from the existing members and receives a new distributed system ID. While a locator is in the reconnecting state, it provides no discovery services for the distributed system. GemFire XD datastore members can use the FabricService.startNetworkServer() method to start a network server independently of the FabricServer instance. As a best practice, call FabricService.waitUntilReconnected(long, TimeUnit) first, so that the network server starts only after the member has reconnected to the distributed system.
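The best practice above (wait for the reconnect, then start the network server) can be sketched as follows. This is a minimal, self-contained sketch, not the GemFire XD API: because the real FabricService needs the GemFire XD libraries, the example defines a hypothetical ReconnectingMember interface that mirrors only the two methods this section names, plus a hypothetical FakeMember so it runs standalone. The assumptions are that waitUntilReconnected returns true when the member has reconnected within the timeout, and that startNetworkServer() takes no arguments (the real method's parameters are omitted here; check the API reference for its actual signature).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReconnectThenServe {

    /** Hypothetical stand-in for the two FabricService methods named in the text. */
    interface ReconnectingMember {
        // Assumption: returns true if the member reconnected within the timeout.
        boolean waitUntilReconnected(long time, TimeUnit unit) throws InterruptedException;

        // Simplified for the sketch: the real method's parameters are omitted.
        void startNetworkServer();
    }

    /** Hypothetical fake member that "reconnects" when markReconnected() is called. */
    static final class FakeMember implements ReconnectingMember {
        private final CountDownLatch reconnected = new CountDownLatch(1);
        volatile boolean serverStarted = false;

        void markReconnected() { reconnected.countDown(); }

        @Override
        public boolean waitUntilReconnected(long time, TimeUnit unit) throws InterruptedException {
            // await returns false if the latch was not released within the timeout
            return reconnected.await(time, unit);
        }

        @Override
        public void startNetworkServer() { serverStarted = true; }
    }

    /**
     * Starts the network server only after the member has rejoined the
     * distributed system. Returns false, and starts nothing, on timeout.
     */
    static boolean startServerAfterReconnect(ReconnectingMember member, long time, TimeUnit unit)
            throws InterruptedException {
        if (!member.waitUntilReconnected(time, unit)) {
            return false; // still not reconnected: do not start the server yet
        }
        member.startNetworkServer();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        FakeMember member = new FakeMember();
        // Simulate the member rejoining the distributed system shortly.
        Thread rejoin = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            member.markReconnected();
        });
        rejoin.start();
        boolean started = startServerAfterReconnect(member, 5, TimeUnit.SECONDS);
        System.out.println("network server started: " + started);
        rejoin.join();
    }
}
```

Blocking on the wait, rather than polling, keeps the server from accepting client connections while the member's view of the distributed system is still being rebuilt.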

Note: Automatic reconnect is supported by members that you start with the gfxd utility or the FabricService interface. GemFire XD peer clients do not automatically reconnect to the distributed system after a forced disconnect, because the JDBC peer client connection is closed.

By default, a GemFire XD member tries to reconnect to the distributed system until it reaches the maximum number of attempts (set by the max-num-reconnect-tries property) or until it is told to stop through the FabricService.stopReconnecting() method. You can configure how long GemFire XD waits between reconnection attempts with the max-wait-time-reconnect property. To disable automatic reconnection entirely, set disable-auto-reconnect to "true".
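The three reconnect properties above can be collected into a java.util.Properties object as in the sketch below. Only the property names come from this section; the values are illustrative, and the unit of max-wait-time-reconnect is assumed to be milliseconds. The sketch only builds the properties; passing them to the call that starts the member is not shown, and the helper names are hypothetical.

```java
import java.util.Properties;

public class ReconnectSettings {

    /**
     * Builds boot properties that tune automatic reconnection.
     * Values are illustrative; max-wait-time-reconnect is assumed to be milliseconds.
     */
    static Properties reconnectProperties(int maxTries, long waitMillis) {
        Properties props = new Properties();
        props.setProperty("max-num-reconnect-tries", Integer.toString(maxTries));
        props.setProperty("max-wait-time-reconnect", Long.toString(waitMillis));
        return props;
    }

    /** Builds boot properties that turn automatic reconnection off entirely. */
    static Properties noAutoReconnect() {
        Properties props = new Properties();
        props.setProperty("disable-auto-reconnect", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties tuned = reconnectProperties(5, 30_000L);
        System.out.println(tuned.getProperty("max-num-reconnect-tries"));              // prints 5
        System.out.println(tuned.getProperty("max-wait-time-reconnect"));              // prints 30000
        System.out.println(noAutoReconnect().getProperty("disable-auto-reconnect"));   // prints true
    }
}
```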

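The FabricService API provides several methods you can use to take actions while a member is reconnecting to the distributed system. These include waitUntilReconnected(long, TimeUnit), which waits up to the specified time for the member to reconnect, and stopReconnecting(), which cancels the reconnect process.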
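One way to combine these two methods is a bounded wait: give the member a fixed amount of time to rejoin, and cancel the reconnect process if it has not done so. The sketch below is self-contained and does not use the GemFire XD libraries; ReconnectControl is a hypothetical interface mirroring only the two method names from this section, and waitUntilReconnected is assumed to return true when the member reconnected within the timeout. NeverReconnects is a hypothetical fake used to exercise the cancel path.

```java
import java.util.concurrent.TimeUnit;

public class ReconnectWatchdog {

    /** Hypothetical stand-in for the FabricService reconnect methods named in the text. */
    interface ReconnectControl {
        // Assumption: returns true if the member reconnected within the timeout.
        boolean waitUntilReconnected(long time, TimeUnit unit) throws InterruptedException;

        // Cancels the reconnect process.
        void stopReconnecting();
    }

    /**
     * Gives the member a bounded amount of time to rejoin; if it has not
     * reconnected by then, cancels the reconnect process.
     * Returns true if the member reconnected.
     */
    static boolean awaitOrCancel(ReconnectControl member, long time, TimeUnit unit)
            throws InterruptedException {
        if (member.waitUntilReconnected(time, unit)) {
            return true;
        }
        member.stopReconnecting(); // deadline passed: stop trying to rejoin
        return false;
    }

    /** Hypothetical fake member that never reconnects. */
    static final class NeverReconnects implements ReconnectControl {
        boolean stopped = false;

        @Override
        public boolean waitUntilReconnected(long time, TimeUnit unit) throws InterruptedException {
            unit.sleep(time); // simulate waiting out the full timeout
            return false;
        }

        @Override
        public void stopReconnecting() { stopped = true; }
    }

    public static void main(String[] args) throws InterruptedException {
        NeverReconnects member = new NeverReconnects();
        boolean ok = awaitOrCancel(member, 50, TimeUnit.MILLISECONDS);
        // prints "reconnected=false, stopReconnecting called=true"
        System.out.println("reconnected=" + ok + ", stopReconnecting called=" + member.stopped);
    }
}
```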

See the GemFire XD API reference for more information.