In a multi-site
configuration, GemFire XD distributes table DML operations between remote
distributed systems, with a minimum impact on each system's performance. DML
events are distributed only for the tables that you configure for replication
by using a gateway receiver.
The DML event contains the originating site's distributed system ID,
and the ID of each site to which it sends the event. This information ensures
that no site re-forwards the event to a site that has already received the
event. When a GemFire XD site receives an event for a table DML operation, it
forwards the event to the other sites it knows about (sites that correspond to
configured gateway senders for the table), but it excludes those sites that it
knows have already received the event for the table.
A DML event does not record the ID of the sites that receive and
apply the DML event. This means that unsupported WAN topologies can result in
more than one copy of an event being applied to a single site. This can lead to
duplicate data or failed DML execution in the table, and must be avoided. See
the discussion of supported and unsupported topologies under
How a Gateway Processes Its Queue
Each primary gateway sender contains a processor thread that reads
messages from its queue, batches them, and distributes them to the connected
site. To process the queue, the thread takes the following actions:
- Reads messages from the
- Creates a batch of the
- Synchronously distributes
the batch to the remote site and waits for a reply.
- Removes the batch from the
queue after the remote site has successfully replied.
Because the batch is not removed from the queue until after the remote
site has replied, the message cannot get lost. However, this also means that a
message can be processed more than once in certain failure scenarios. If a site
goes offline in the middle of processing a batch of messages, that same batch
is sent again after the site is back online. See
About High Availability for WAN Deployments.
The batch is configurable via the batch size and batch time interval
settings, which you can specify when you create a gateway sender (see
A batch of messages is processed by the queue when either the batch size or the
time interval is reached. In an active network, it is likely that the batch
size will be reached before the time interval. In an idle network, the time
interval will most likely be reached before the batch size. This may result in
some network latency that corresponds to the time interval.
How a Gateway Handles Batch Processing Failure
These types of exceptions can occur during batch processing:
- A gateway receiver
fails with acknowledgment. If processing fails while the receiving gateway
is processing a batch, it replies with a failure acknowledgment containing the
exception, including the identity of the message that failed, and the ID of the
last message successfully processed. The sending gateway removes the
successfully processed messages and the failed message from the queue and logs
the exception and the failed message information. The sender then continues
processing the messages remaining in the queue.
- The receiving gateway
fails without acknowledgment. If the receiving gateway does not acknowledge
a sent batch, the sender does not know which messages were successfully
processed, so it re-sends the entire batch.
- No remote gateways are
available. If a batch processing exception occurs because no remote
gateways are available, the batch remains on the queue. The sending gateway
waits for a time, then attempts to re-send the batch. The time periods between
attempts is five seconds. The existing server monitor continuously attempts to
connect to the remote gateway, so eventually a connection is made and queue
processing continues. Messages build up in the queue and possibly overflow to
disk in the meantime.