Solve common problems that you may encounter when using GemFire XD.
After upgrading a GemFire XD member, you may receive a ClassNotFoundException when starting the member, referencing package names that begin with com.vmware.sqlfire or com.pivotal.sqlfire. These exceptions occur if you fail to stop and remove older procedure or listener implementations that use package names from the SQLFire product. Stop and remove these objects before upgrading to GemFire XD. See Performing a Manual Upgrade Using the RPM Distribution or Performing a Manual Upgrade Using the ZIP Distribution for more information.
GemFire XD Locator pid: 23537 status: waiting
Waiting for DataDictionary (DiskId: 531fc5bb-1720-4836-a468-3d738a21af63, Location: /pivotal/locator2/./datadictionary) on:
 [DiskId: aa77785a-0f03-4441-84f7-6eb6547d7833, Location: /pivotal/server1/./datadictionary]
 [DiskId: f417704b-fff4-4b99-81a2-75576d673547, Location: /pivotal/locator1/./datadictionary]
Here, the startup messages indicate that locator2 is waiting for the persistent data dictionary files on locator1 and server1 to become available. GemFire XD always persists the data dictionary for the tables and indexes that you create, even if you do not configure those tables to persist their data. The messages above indicate that locator1 or server1 may store a newer copy of the data dictionary for the distributed system.
Starting GemFire XD Server using locators for peer discovery: localhost,localhost
Starting network server for GemFire XD Server at address localhost/127.0.0.1
Logs generated in /pivotal/server1/gfxdserver.log
The server is still starting. 15 seconds have elapsed since the last log message:
Region /_DDL_STMTS_META_REGION has potentially stale data. It is waiting for another member to recover the latest data.
My persistent id:
  DiskStore ID: aa77785a-0f03-4441-84f7-6eb6547d7833 Name: Location: /10.0.1.31:/pivotal/server1/./datadictionary
Members with potentially new data:
[
  DiskStore ID: f417704b-fff4-4b99-81a2-75576d673547 Name: Location: /10.0.1.31:/pivotal/locator1/./datadictionary
]
Use the "gfxd list-missing-disk-stores" command to see all disk stores that are being waited on by other members.
The data store startup messages indicate that locator1 has "potentially new data" for the data dictionary. In this case, both locator2 and server1 were shut down before locator1, so those members are waiting on locator1 to ensure that they have the latest version of the data dictionary.
Messages like these from data stores and locators usually indicate that some members were not started. If the disk store files listed in the messages are available on the missing member, start that member and allow the running members to recover. For example, in the system above you would start locator1 and allow locator2 and server1 to synchronize their copies of the data dictionary.
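When several members are waiting on each other, it helps to list exactly which disk stores they are waiting for so that you know which member to start. The following Python sketch pulls disk store IDs and locations out of messages in the two formats shown above (locator status output and data store startup output). The function name `disk_stores_waited_on` is hypothetical, and the regular expressions assume the exact wording of these messages; other GemFire XD releases may word them differently, and the `gfxd list-missing-disk-stores` command remains the supported way to get this information.

```python
import re

# Locator status format: "... on: [DiskId: <uuid>, Location: <path>] [...]".
# The locator's own disk store appears in parentheses, so it is not matched.
LOCATOR_WAITING_ON = re.compile(r"\[DiskId:\s*([0-9a-f-]+),\s*Location:\s*([^\]]+)\]")

# Data store format: "Members with potentially new data: [ DiskStore ID: ... ]".
NEWER_DATA_BLOCK = re.compile(r"Members with potentially new data:\s*\[(.*?)\]", re.S)
DISK_STORE_ENTRY = re.compile(
    r"DiskStore ID:\s*([0-9a-f-]+)\s+Name:.*?Location:\s*/([^:\s]+):(\S+)", re.S
)


def disk_stores_waited_on(message: str) -> list[tuple[str, str]]:
    """Return (disk store ID, location) pairs that a starting member is waiting on."""
    found = [(m.group(1), m.group(2).strip()) for m in LOCATOR_WAITING_ON.finditer(message)]
    block = NEWER_DATA_BLOCK.search(message)
    if block:
        # Only entries inside the "potentially new data" list are reported;
        # the member's own "My persistent id" entry is outside that block.
        for m in DISK_STORE_ENTRY.finditer(block.group(1)):
            host, path = m.group(2), m.group(3)
            found.append((m.group(1), f"{host}:{path}"))
    return found
```

Applied to the server1 message above, the sketch reports only locator1's data dictionary disk store, which matches the advice to start locator1.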
If you receive a ConflictingPersistentDataException during startup, it indicates that you have multiple copies of some persistent data and GemFire XD cannot determine which copy to use. Normally GemFire XD uses metadata to automatically determine which copy of persistent data to use. Each member persists, along with the data dictionary or table data, a list of other members that have the data and whether their data is up to date.
A ConflictingPersistentDataException occurs when two members compare their metadata and find that it is inconsistent: they either don’t know about each other, or they both believe that the other member has stale data. The following are some scenarios that can cause a ConflictingPersistentDataException.
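The two conflict conditions above can be expressed as a small decision function. The following Python sketch is a simplified model for illustration only, not GemFire XD's actual recovery algorithm: `PersistentView` and `is_conflicting` are hypothetical names, and each member's persisted metadata is reduced to a map from peer names to whether that peer's copy is believed to be up to date.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentView:
    """Simplified metadata a member persists alongside its data (hypothetical model)."""
    member: str
    # Other members known to hold this data -> True if believed up to date, False if stale.
    peers: dict[str, bool] = field(default_factory=dict)


def is_conflicting(a: PersistentView, b: PersistentView) -> bool:
    """Apply the two conflict conditions described above to a pair of members."""
    a_knows_b = b.member in a.peers
    b_knows_a = a.member in b.peers
    if not a_knows_b and not b_knows_a:
        # Neither member knows about the other: independently created copies.
        return True
    if a_knows_b and b_knows_a and not a.peers[b.member] and not b.peers[a.member]:
        # Each member believes the other's copy is stale, as after a split brain.
        return True
    return False
```

In this model, an orderly shutdown in which member A stops first is not a conflict: B records A's copy as stale while A still regards B as current, so B's copy is unambiguously the newest.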
Independently-created copies
Trying to merge two independently-created distributed systems into a single distributed system causes a ConflictingPersistentDataException. There are a few ways to end up with independently-created systems:
Trying to merge independent systems by pointing all members to the same set of locators then results in a ConflictingPersistentDataException.
GemFire XD cannot merge independently-created data for the same table. Instead, you need to export the data from one of the systems and import it into the other system. See Exporting and Importing Data with GemFire XD.
Starting new members first
Starting a brand new member with no persistent data before starting older members that have persistent data can cause a ConflictingPersistentDataException.
This can happen by accident if you shut down the system, then add a new member to the startup scripts, and finally start all members in parallel. In this case, the new member may start first. If this occurs, the new member creates an empty, independent copy of the data before the older members start up. When the older members start, the situation is similar to that described above in “Independently-created copies.”
In this case, the fix is to shut down the new member, move aside or delete its (empty) persistence files, and then restart the older members. After the older members have fully recovered, restart the new member.
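The safe ordering can be built into a startup script. The following Python sketch splits members into those that already have persistent data and brand-new members, starts each group in parallel, and starts the new members only after every older member has been started. `Member`, `start_in_safe_order`, and the `start` callback are hypothetical; the sketch assumes that `start` blocks until that member has fully recovered (for example, by running the member's start command and waiting until it reports that it is running), which is what makes the second phase safe.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Member:
    name: str
    has_persistent_data: bool  # False for a member that has never been started


def start_in_safe_order(members: Iterable[Member], start: Callable[[str], None]) -> list[list[str]]:
    """Start existing members first, then new members; return the phases used."""
    members = list(members)
    phases = [
        [m.name for m in members if m.has_persistent_data],      # recover existing data first
        [m.name for m in members if not m.has_persistent_data],  # then add new members
    ]
    for phase in phases:
        if not phase:
            continue
        # Members within a phase start in parallel; the `with` block does not
        # exit until every start() call in this phase has returned.
        with ThreadPoolExecutor(max_workers=len(phase)) as pool:
            list(pool.map(start, phase))  # list() re-raises any startup failure
    return phases
```

Starting the older members in parallel, rather than one at a time, matters: as the waiting messages earlier in this section show, a persistent member can block until other members that hold newer data come up.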
A network split, with enable-network-partition-detection set to false
With enable-network-partition-detection set to true, GemFire XD detects a network partition and shuts down members to prevent a "split brain." In this case no conflicts should occur when the system is restored.
However, if enable-network-partition-detection is false, GemFire XD cannot prevent a "split brain" after a network partition. Instead, each side of the network partition records that the other side of the partition has stale data. When the partition is healed and persistent members are restarted, they find a conflict because each side believes the other side's members are stale.
In some cases it may be possible to choose between sides of the network partition and keep only the data from one side of the partition. Otherwise you may need to salvage data and import it into a fresh system.
Resolving a ConflictingPersistentDataException
If you receive a ConflictingPersistentDataException, you will not be able to start all of your members and have them join the same distributed system.
First, determine if there is one part of the system that you can recover. For example, if you just added some new members to the system, try to start up without including those members. For the remaining members, use the data extractor tool to extract data from the persistence files and import it into a running system. See Recovering Data from Disk Stores.
These are common problems that occur when connecting to a GemFire XD distributed system:
You receive SQL State 08001 Error: 'Failed after trying all available servers: '
This problem can occur if you specify null values for the username and password connection properties in the JDBC connection URL. Some third-party tools automatically include these connection properties, with null values, if you do not specify user credentials.
If authentication is disabled in your distributed system, you can specify any temporary user name and password values when connecting. Connecting to GemFire XD with JDBC Tools provides more details.
In WAN deployments, tables may fail to synchronize between two GemFire XD distributed systems if the tables are not identical to one another (see Create Tables with Gateway Senders). If you have configured WAN replication between sites but a table fails to synchronize because of schema differences, follow these steps to correct the situation:
It is important to monitor the disk usage of GemFire XD members. If a member lacks sufficient disk space for a disk store, the member attempts to shut down the disk store and its associated tables, and logs an error message. After you make sufficient disk space available to the member, you can restart the member. (See Member Startup Problems.) A shutdown due to a member running out of disk space can cause loss of data, data file corruption, log file corruption and other error conditions that can negatively impact your applications.
You can prevent disk file errors using the following techniques:
Pre-allocation is governed by the following system properties:
Monitor GemFire XD logs for low disk space warnings. GemFire XD logs disk space warnings in the following situations:
You can configure the log message frequency with the gemfire.DISKSPACE_WARNING_INTERVAL system property.
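In addition to reading GemFire XD's own warnings, you can watch the file systems that hold your disk stores from an external script. The following Python sketch uses the standard `shutil.disk_usage` function to warn when free space drops below a fraction of capacity, and rate-limits repeated warnings in the same spirit as gemfire.DISKSPACE_WARNING_INTERVAL. `DiskSpaceMonitor`, the 10% threshold, and the 10-second interval are assumptions for illustration; they are not GemFire XD's thresholds or defaults.

```python
import shutil
import time
from typing import Optional


class DiskSpaceMonitor:
    """Warn when free space under `path` falls below `min_free_fraction` of capacity,
    at most once every `interval_s` seconds (hypothetical helper, not part of GemFire XD)."""

    def __init__(self, path: str, min_free_fraction: float = 0.10, interval_s: float = 10.0):
        self.path = path
        self.min_free_fraction = min_free_fraction
        self.interval_s = interval_s
        self._last_warning: Optional[float] = None

    def evaluate(self, free: int, total: int, now: float) -> Optional[str]:
        """Return a warning message, or None if space is sufficient or the warning is suppressed."""
        if total <= 0 or free / total >= self.min_free_fraction:
            return None
        if self._last_warning is not None and now - self._last_warning < self.interval_s:
            return None  # still inside the rate-limit window
        self._last_warning = now
        return f"Low disk space on {self.path}: {free} of {total} bytes free ({free / total:.1%})"

    def check(self) -> Optional[str]:
        """Measure the real file system containing `path` and evaluate it."""
        usage = shutil.disk_usage(self.path)
        return self.evaluate(usage.free, usage.total, time.monotonic())
```

A cron job or monitoring agent could, for example, call `check()` for each disk store directory and forward any returned message to an alerting system.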
If a member of your GemFire XD distributed system fails due to a disk full error condition, add or make additional disk capacity available and attempt to restart the member normally. If the member does not restart and there is a redundant copy of its tables in a disk store on another member, you can restore the member using the following steps: