About High Availability for WAN Deployments

You can deploy a gateway sender to multiple data stores to provide high availability.

At any given time, only one member has an active gateway sender process, called the primary sender, for propagating DML to a remote WAN site. All other gateway senders of the same name are treated as secondary senders. Secondary senders remain in standby mode for redundancy, to guarantee replication to the remote WAN site if the member with the primary sender fails.

A GemFire XD cluster needs only a single gateway receiver to receive DML events from remote distributed systems. However, as with gateway senders, you can deploy multiple instances of the gateway receiver to different GemFire XD members for high availability.

Gateway senders can also be configured to use an available disk store to persist queued DML operations that need to be replicated to a remote cluster. Using a disk store ensures that queued DML operations are not lost, even if you shut down the GemFire XD member that hosts the primary sender.

Configuring for Consistency and Redundancy

To ensure high availability and reliable delivery of DML events, configure each gateway sender for both persistence and redundancy. For performance reasons, deploy no more than one secondary gateway sender (redundancy of one).

If a peer or server joins an existing WAN-enabled distributed system, then WAN components, such as gateway senders and gateway receivers, are automatically created as part of DDL replay. This replay is subject to each new GemFire XD member satisfying the criteria of the server group. For example, if you deploy a gateway receiver to a particular server group, and you later add a member to that server group, the receiver will be created on the new member as well.

Possible Side Effect of Failover Process

A side effect of the gateway sender failover process is that a DML operation may be reapplied to the remote WAN site if the primary sender fails. If the member that hosts the primary sender fails while sending a batch of operations, some DML statements in the batch may have already been applied to the remote WAN site. After a failover, the new primary sender sends the failed batch in its entirety, and reapplies the initial DML operations. Because of this, the remote WAN site may become out-of-synch depending upon the nature of the DML operation; how the DML modifies table columns; and the presence or absence of column constraints on the target table.

For example, consider the following update statement:
update trade.networth set availloan=availloan-? where cid = ? and tid= ?
If a failure occurs mid-way through applying this DML statement, reapplying the statement decreases the "availloan" value twice in rows that applied the update before the failure.