How the Data Recovery Utilities Work
The dataextractor utility operates against a set of available GemFire XD operational log files (disk stores) in order to extract data to CSV files and provide recommendations for how to best restore the data in a new distributed system. A dataextractloader utility takes the CSV file output along with a recommendations file, and uses those inputs to load the recovered data into a new GemFire XD system.
|Extract log||extract.log||Full log output for the data extraction process.|
|File summary||Summary.txt||Specifies a complete list of all SQL and CSV file names that were created during the extraction process.|
|Recovery recommendations||Recommended.txt||Specifies the absolute path of all SQL and CSV files that the utility recommends using for loading into a GemFire XD system, in the order that they should be loaded. This represents the "best guess" selection of content that will recover the most data from the available files.|
|DDL files||exported_ddl.sql||DDL files that can be replayed to create the recovered database. All DDL files are created in a subdirectory for each server being recovered.|
|Recovered data files||CSV files (names begin with PR or RR)||These files contain the recovered data that you can use with dataextractloader to restore the data in a new distributed system. CSV files are created per table. Partitioned table filenames begin with PR, and replicated table filenames begin with RR. A partitioned table generates one CSV file per bucket, and the bucket number is present in the filename. All CSV filenames include a timestamp. All CSV files are created in a subdirectory for each server being recovered.|
After running dataextractor, you can choose to use the recommendations file as-is, or edit the file to recover only a portion of the available data. You then run dataextractloader with the recommendations file to load the data into a new GemFire XD system.
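Editing the recommendations file to recover only part of the data can be scripted. The following is a minimal sketch, not a documented feature of the utilities. It assumes that Recommended.txt lists one absolute path per line and that CSV filenames embed the schema and table name as `-SCHEMA-TABLE-`, as in the sample filenames on this page. The function name `filter_recommended` is hypothetical.

```shell
# Keep every .sql entry plus the CSV entries for a single table,
# preserving the original load order.
# ASSUMPTION: Recommended.txt holds one absolute path per line.
# ASSUMPTION: CSV filenames contain "-SCHEMA-TABLE-" (for example, "-APP-FLIGHTS-").
filter_recommended() {
  # $1: path to Recommended.txt, $2: schema name, $3: table name
  awk -v pat="-$2-$3-" '/\.sql$/ || index($0, pat) > 0' "$1"
}

# Example usage (paths are illustrative):
#   filter_recommended EXTRACTED_FILES/Recommended.txt APP FLIGHTS > Recommended-flights.txt
```

Review the filtered file by hand before you pass it to dataextractloader, because the real file layout may differ from the assumptions above.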
Limitations for Data Recovery
The dataextractor utility provides only a "best effort" attempt to recover disk store data. Keep these limitations in mind when you use the utilities:
- The dataextractor utility uses available data to determine which member held the latest version of persistent data. This determination cannot be accurate in all situations.
- Data recovered by the utility may not be complete or consistent for the database schema.
- Recovered data may be incomplete (missing data or operations). This may be unavoidable if the disk store that holds the latest version of some data is corrupt and the data cannot be extracted.
- Duplicate entries in recovered data may violate unique key constraints when you load the recovered data to a new system.
- Missing data in parent-child tables may violate foreign key constraints when you load the recovered data to a new system.
- The dataextractor utility does not recover any index information.
- Although the dataextractor recovers events that are persisted to HDFS queues, you cannot load these HDFS events to a new GemFire XD system using the dataextractloader utility.
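Because duplicate entries can violate unique key constraints at load time, it can help to look for repeated keys in the recovered CSV files before loading them. The sketch below is illustrative only. It assumes that the key is the first CSV column and that fields contain no embedded commas or line breaks, neither of which is guaranteed for your schema. The function name `find_duplicate_keys` is hypothetical.

```shell
# Print each value of the first CSV column that occurs more than once
# across all of the given files.
# ASSUMPTION: the unique key is the first column.
# ASSUMPTION: no field contains an embedded comma or newline.
find_duplicate_keys() {
  awk -F',' '{ count[$1]++ }
             END { for (k in count) if (count[k] > 1) print k }' "$@" | sort
}

# Example usage (the glob is illustrative):
#   find_duplicate_keys EXTRACTED_FILES/*/PR-APP-FLIGHTS-*.csv
```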
Requirements
The following procedures and resources are required in order to use the data recovery utilities.
- You must shut down all members of the system that you want to recover, before you use the dataextractor utility.
- The machine on which you run the dataextractor utility cannot run a GemFire XD member (locator, datastore, or accessor) for any other GemFire XD distributed system.
- Make a copy of any disk store file that you want to use for recovering data, and run the dataextractor utility against your copies of those files.
Note: The disk store files that contain the persistent data dictionary are required in order to use the dataextractor utility. Each locator and data store member of the distributed system persists the data dictionary in the /datadictionary subdirectory of the member working directory.
- You must allocate enough heap memory to the dataextractor utility, using the -Xmx argument. The amount of memory required is equal to the size of the largest GemFire XD member that you are recovering. For example, if the system that you are recovering had three datastore members with 30 GB, 25 GB, and 15 GB of memory, then you must allocate 30 GB of memory to the dataextractor utility process. Additional heap memory is required if you execute the dataextractor utility using multiple threads.
- The machine on which you run dataextractor must have available disk space equal to at least twice the total size of the disk store files specified in the input properties file.
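The disk space requirement can be checked before you run the utility. The sketch below assumes a properties file in the `name=directory[,directory...]` form used by the procedure on this page, and compares twice the total size of those directories with the free space in an output directory. The function names `props_dirs`, `total_kb`, and `has_enough_space` are hypothetical, and the check is only an approximation of whatever the utility itself computes.

```shell
# List every directory named in a properties file of the form
# name=dir[,dir...], one directory per line. Comment lines are skipped.
props_dirs() {
  grep -v '^[[:space:]]*#' "$1" | grep '=' | cut -d'=' -f2- |
    tr ',' '\n' | sed '/^[[:space:]]*$/d'
}

# Read directory names on stdin and print their combined size in KB.
total_kb() {
  total=0
  while IFS= read -r dir; do
    size=$(du -sk "$dir" | awk '{ print $1 }')
    total=$((total + size))
  done
  echo "$total"
}

# Succeed only if the output directory has at least twice the total size free.
has_enough_space() {
  # $1: properties file, $2: directory that will receive the extracted files
  need=$(( $(props_dirs "$1" | total_kb) * 2 ))
  avail=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
  echo "need ${need} KB, available ${avail} KB"
  [ "$avail" -ge "$need" ]
}

# Example usage:
#   has_enough_space ./extractor.properties . || echo "free more disk space first"
```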
Procedure for Recovering Data from Disk Stores
Follow these steps to extract available data from available GemFire XD disk store files.
- Shut down all members of the system that you want to recover. For example:
$ gfxd shut-down-all -locators=localhost
Connecting to distributed system: locators=localhost
Successfully shut down 2 members
$ gfxd locator stop -dir=$HOME/locator
The GemFireXD Locator has stopped.
- Shut down any other GemFire XD members that may be running on the local machine (for example, locators or data stores that run as part of another GemFire XD distributed system).
- Make a copy of all disk store files that you want to use for recovery. At a minimum, this involves copying the member working directory for locators and data stores in your distributed system. For example:
$ mkdir ~/recovery-directory
$ cp -r ~/locator ~/recovery-directory
$ cp -r ~/server1 ~/recovery-directory
$ cp -r ~/server2 ~/recovery-directory
Note: If you created disk store files outside of a member's working directory, ensure that you make a copy of those disk store files as well.
- Move to the directory containing the copied disk store files, and create the extractor.properties file. For example:
$ cd ~/recovery-directory
$ touch extractor.properties
- Use a text editor to edit the extractor.properties file. Enter a GemFire XD member definition on each line, with the first value corresponding to the pathname of the member's working directory. For example:
recoveredlocator1=/Users/yozie/recovery-directory/locator
recoveredserver1=/Users/yozie/recovery-directory/server1
recoveredserver2=/Users/yozie/recovery-directory/server2
Note: If you created disk store files outside of a member's working directory, ensure that you specify the directory location of those disk store files in a comma-separated list following the member working directory.
- Set the JAVA_ARGS environment variable to allocate the required heap space (see Requirements). For example:
$ export JAVA_ARGS=-Xmx2G
- Execute the dataextractor utility, specifying the properties file that you created. For example:
$ dataextractor property-file=./extractor.properties
Reading the properties file : ./extractor.properties
Total size of data to be extracted : 14.4404296875MB
Disk space available in the output directory : 30423.44921875MB
Sufficient disk space to carry out data extraction
Extracting DDL for server : recoveredserver1
Extracting DDL for server : recoveredlocator1
Extracting DDL for server : recoveredserver2
Completed extraction of DDL's for server : recoveredlocator1
Completed extraction of DDL's for server : recoveredserver1
Completed extraction of DDL's for server : recoveredserver2
NULL ROW FORMATTER FOR:SYSIBMSYSDUMMY1
Maximum disk-store size on disk 5.057651519775391 MB
Available memory : 52.09442901611328 MB
Estimated memory needed per server : 11.12683334350586 MB
Recommended number of threads to extract server(s) in parallel : 4
Started data extraction for Server : recoveredlocator1
Started data extraction for Server : recoveredserver1
Started data extraction for Server : recoveredserver2
Extracting disk stores
Extracting disk stores
Server : recoveredlocator1 Attempting extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/locator
Extracting disk stores
Server : recoveredserver2 Attempting extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server2
Server : recoveredserver1 Attempting extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server1
Completed extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/locator
Completed extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server2
Completed extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server1
Total Salvage Time : 15.851s
Writing out Summary and Recommendation...
Completed Summary and Recommendation
Note: See dataextractor for a full description of additional command-line options.
Output from the utility is stored in two subdirectories of the working directory, named EXTRACTED_FILES and datadictionary. For example:
$ ls
EXTRACTED_FILES/ datadictionary/ extractor.properties locator/ server1/ server2/
The Summary.txt file is stored in the EXTRACTED_FILES directory:
$ cat EXTRACTED_FILES/Summary.txt
[DDL EXPORT INFORMATION]
1. recoveredlocator1 , file : /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredlocator1/exported_ddl.sql Number of ddl statements : 9
2. recoveredserver1 , file : /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/exported_ddl.sql Number of ddl statements : 9
3. recoveredserver2 , file : /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/exported_ddl.sql Number of ddl statements : 9
[EXPORT INFORMATION FOR TABLES]
Table:APP_FLIGHTS__B__APP_FLIGHTS_36
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_36-1400537865621.csv . Number of rows extracted : 3
Table:APP_FLIGHTS__B__APP_FLIGHTS_37
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_37-1400537866233.csv . Number of rows extracted : 5
Table:APP_FLIGHTS__B__APP_FLIGHTS_34
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/PR-APP-FLIGHTS-_B__APP_FLIGHTS_34-1400537866329.csv . Number of rows extracted : 5
Table:APP_FLIGHTS__B__APP_FLIGHTS_35
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_35-1400537865998.csv . Number of rows extracted : 5
Table:APP_FLIGHTS__B__APP_FLIGHTS_38
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_38-1400537866224.csv . Number of rows extracted : 11
Table:APP_FLIGHTS__B__APP_FLIGHTS_39
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_39-1400537865896.csv . Number of rows extracted : 5
Table:APP_FLIGHTAVAILABILITY__B__APP_FLIGHTAVAILABILITY_28
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/PR-APP-FLIGHTAVAILABILITY-_B__APP_FLIGHTAVAILABILITY_28-1400537866855.csv . Number of rows extracted : 14
[...]
The DDL EXPORT INFORMATION shows the order in which the utility recommends replaying DDL files to restore the data dictionary. You can review the DDL files to ensure that the tables match your expected schema. Comments are inserted to call out replicated and partitioned tables, as well as table colocation.
This is followed by a list of CSV files that contain the data values to load into the tables. In the example above, you can see that FLIGHTS is a partitioned table, and a separate CSV file is generated per bucket of the table. The file summary shows the number of rows recovered for each bucket of the table.
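Because the summary reports rows per bucket, a per-table total has to be added up by hand. The sketch below totals the "Number of rows extracted" values for each table named in a Summary.txt file. It assumes the `Table:` and `Number of rows extracted : N` wording shown in the sample output, and that bucket entries are named `TABLE__B__...`. It splits the file into whitespace-separated tokens, so it does not depend on where the line breaks fall. The function name `rows_per_table` is hypothetical.

```shell
# Print "TABLE TOTAL" for every table listed in a Summary.txt file.
# ASSUMPTION: entries look like "Table:NAME ... Number of rows extracted : N".
# ASSUMPTION: bucket entries are named "TABLE__B__...", so that suffix is stripped.
rows_per_table() {
  # $1: path to Summary.txt
  tr -s '[:space:]' '\n' < "$1" |
    awk '
      # Remember the table that the following row counts belong to.
      /^Table:/ { t = $0; sub(/^Table:/, "", t); sub(/__B__.*$/, "", t) }
      # A number that follows the tokens "extracted" and ":" is a row count.
      prev2 == "extracted" && prev1 == ":" { rows[t] += $0 }
      { prev2 = prev1; prev1 = $0 }
      END { for (k in rows) print k, rows[k] }
    ' | sort
}

# Example usage:
#   rows_per_table EXTRACTED_FILES/Summary.txt
```

The totals count extracted rows only. They say nothing about whether those rows are complete or free of duplicates.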
Procedure for Loading Recovered Data into a New System
Follow these steps to load the SQL and CSV files that were recovered using dataextractor into a new GemFire XD system.
- Boot a new GemFire XD distributed system into which you will load the recovered data. Ensure that you define the necessary server groups, heap configuration, and disk resources needed to host the recovered data. Refer to the DDL EXPORT INFORMATION portion of the Summary.txt file to determine which server groups are expected when recreating the schema.
If you are continuing with the example cluster recovered in Procedure for Recovering Data from Disk Stores, then a single datastore is sufficient to reload the sample data. Create and start the new datastore directly in the recovery subdirectory:
$ cd ~/recovery-directory
$ mkdir recovery-server
$ gfxd server start -dir=./recovery-server/
Starting GemFireXD Server using multicast for peer discovery: 184.108.40.206
Starting network server for GemFireXD Server at address localhost/127.0.0.1
Logs generated in /Users/yozie/recovery-directory/./recovery-server/gfxdserver.log
GemFireXD Server pid: 4674 status: running
- Run the dataextractloader utility, specifying the Recommended.txt file that was created during recovery and the hostname and port number of a locator or server to use for connecting to the new GemFire XD system. For example:
$ dataextractloader host=localhost port=1527 recommended=./EXTRACTED_FILES/Recommended.txt
Loading .sql file: /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredlocator1/exported_ddl.sql
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_36-1400537865621.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_37-1400537866233.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/PR-APP-FLIGHTS-_B__APP_FLIGHTS_34-1400537866329.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_35-1400537865998.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_38-1400537866224.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
[...]
Note: See dataextractloader for a full description of additional command-line options.
Note: Any errors that occur while loading data from the CSV files are recorded in the output log file, which is stored in EXTRACTED_LOADER/extractor.log. Errors do not prevent the loader from attempting to load further data.
- Connect to the distributed system and verify that the recovered data was loaded. For example:
$ gfxd
gfxd version 1.4.0
gfxd> connect client 'localhost:1527';
gfxd> show tables;
TABLE_SCHEM         |TABLE_NAME                    |REMARKS
------------------------------------------------------------------------
SYS                 |ASYNCEVENTLISTENERS           |
SYS                 |GATEWAYRECEIVERS              |
SYS                 |GATEWAYSENDERS                |
SYS                 |SYSALIASES                    |
SYS                 |SYSCHECKS                     |
SYS                 |SYSCOLPERMS                   |
SYS                 |SYSCOLUMNS                    |
SYS                 |SYSCONGLOMERATES              |
SYS                 |SYSCONSTRAINTS                |
SYS                 |SYSDEPENDS                    |
SYS                 |SYSDISKSTORES                 |
SYS                 |SYSFILES                      |
SYS                 |SYSFOREIGNKEYS                |
SYS                 |SYSHDFSSTORES                 |
SYS                 |SYSKEYS                       |
SYS                 |SYSROLES                      |
SYS                 |SYSROUTINEPERMS               |
SYS                 |SYSSCHEMAS                    |
SYS                 |SYSSTATEMENTS                 |
SYS                 |SYSSTATISTICS                 |
SYS                 |SYSTABLEPERMS                 |
SYS                 |SYSTABLES                     |
SYS                 |SYSTRIGGERS                   |
SYS                 |SYSVIEWS                      |
SYSIBM              |SYSDUMMY1                     |
APP                 |AIRLINES                      |
APP                 |CITIES                        |
APP                 |COUNTRIES                     |
APP                 |FLIGHTAVAILABILITY            |
APP                 |FLIGHTS                       |
APP                 |FLIGHTS_HISTORY               |
APP                 |MAPS                          |
32 rows selected
The above output shows that tables in the APP schema were recreated during the recovery process. Further queries against the tables show that the example data was also loaded.
Troubleshooting Data Recovery Errors
This section describes some common errors that can occur while recovering data or loading recovered data into a new system.
|Errors indicate that a disk store was not recovered from a directory.||A common error during data extraction indicates that a named disk store was not recovered from a specific directory. This generally does not indicate an error in the extraction process. In order to avoid problems caused by corrupt directory mappings in oplog files, the utility looks for all disk store files in all directories listed for a GemFire XD member. While this ensures that the tool recovers as much data as possible, it also results in this error when a disk store's files do not appear in a specified directory.|
|dataextractor fails to recover any data.||The persistent data dictionary must be available in order to recover any data from the disk store files. See Requirements.|
|Out of Memory Exceptions during data recovery.||The dataextractor utility attempts to calculate the size of the target disk stores, and spawns multiple threads in order to extract data as fast as possible. The number of threads is determined by how much heap memory you provide to the utility. If you receive out of memory exceptions, increase the heap memory that you allocate to the utility with the -Xmx argument (see Requirements).|
|Out of disk space errors.||If you run out of disk space while executing dataextractor, the utility exits and all data that was recovered up to that point is available in the output directory. However, the Recommended.txt and Summary.txt files are not created. If this occurs, free additional disk space and then re-run the utility.|
|Data not recovered for a server, "This oplog is a pre 7.0 version" error, or other failures.||A corrupted disk store metadata file (.if extension) can result in a failure to extract data for a member, or can manifest itself in other ways, such as by reporting the "pre 7.0 version" error. In this case, data may not be recoverable unless you can restore a viable .if file from backup.|
|Errors while loading recovered data.||As described in Limitations for Data Recovery, the disk store recovery process cannot guarantee data consistency. Errors that occur while loading recovered data are common. However, errors that occur while executing the dataextractloader do not prevent the utility from attempting to load additional data. See the dataextractloader log file for a complete record of errors that occurred.|
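Two of the failures above trace back to files that the utility needs: the persistent data dictionary and the disk store metadata (.if) file. A quick presence check on each copied member directory can catch a missing file before a long extraction run. The sketch below is an illustrative pre-flight check only. It assumes that the data dictionary lives in a datadictionary subdirectory of the member working directory, as stated in the requirements, and it cannot detect a file that is present but corrupt. The function name `check_member_dir` is hypothetical.

```shell
# Report obviously missing inputs in a copied member working directory.
# Returns 0 if both checks pass, 1 otherwise.
# NOTE: this checks for presence only; it cannot detect corruption.
check_member_dir() {
  # $1: copied member working directory
  status=0
  if [ ! -d "$1/datadictionary" ]; then
    echo "missing datadictionary subdirectory: $1"
    status=1
  fi
  if [ -z "$(find "$1" -type f -name '*.if' | head -n 1)" ]; then
    echo "no .if metadata file found under: $1"
    status=1
  fi
  return "$status"
}

# Example usage:
#   for d in ~/recovery-directory/locator ~/recovery-directory/server1; do
#     check_member_dir "$d"
#   done
```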