The dataextractor utility operates against a set of available GemFire XD operational log files (disk stores) in order to extract data to CSV files and provide recommendations for how to best restore the data in a new distributed system. A dataextractloader utility takes the CSV file output along with a recommendations file, and uses those inputs to load the recovered data into a new GemFire XD system.
|Extract log||extract.log||Full log output for the data extraction process.|
|File summary||Summary.txt||Specifies a complete list of all SQL and CSV file names that were created during the extraction process.|
|Recovery recommendations||Recommended.txt||Specifies the absolute path of all SQL and CSV files that the utility recommends using for loading into a GemFire XD system, in the order that they should be loaded. This represents the "best guess" selection of content that will recover the most data from the available files.|
|DDL files||exported_ddl.sql||DDL files that can be replayed to create the recovered database. All DDL files are created in a subdirectory for each server being recovered.|
|Recovered data files||PR-APP-FLIGHTS-_B__APP_FLIGHTS_91-1400537866176.csv (example)||These files contain the recovered data that you can use with dataextractloader to restore the data in a new distributed system. CSV files are created per table. Partitioned table filenames begin with PR, and replicated table filenames begin with RR. A partitioned table generates one CSV file per bucket, and the bucket number is present in the filename. All CSV filenames include a timestamp. All CSV files are created in a subdirectory for each server being recovered.|
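The naming convention described above can be unpacked with a few lines of Python when you need to sort or group the extracted files. This is a minimal sketch and not part of the GemFire XD tooling: the function name `parse_csv_name` and the `ExtractedCsv` type are hypothetical, and the pattern is inferred only from the single example filename shown in the table, so replicated (RR) files or unusual table names may not match it.

```python
import re
from typing import NamedTuple, Optional


class ExtractedCsv(NamedTuple):
    kind: str              # "PR" (partitioned) or "RR" (replicated)
    bucket: Optional[int]  # bucket number for PR files, otherwise None
    timestamp: int         # trailing numeric timestamp from the filename


# Assumed layout: <PR|RR>-<name parts>-<timestamp>.csv, where the name part
# of a partitioned file ends in _<bucket>. Inferred from one example only.
_PATTERN = re.compile(r"^(PR|RR)-(.+)-(\d+)\.csv$")


def parse_csv_name(filename: str) -> ExtractedCsv:
    """Split an extracted CSV filename into kind, bucket, and timestamp."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized extracted CSV name: {filename}")
    kind, middle, timestamp = match.groups()
    bucket = None
    if kind == "PR":
        # The bucket number is the run of digits that ends the name part.
        bucket_match = re.search(r"_(\d+)$", middle)
        if bucket_match:
            bucket = int(bucket_match.group(1))
    return ExtractedCsv(kind, bucket, int(timestamp))
```

For the example filename in the table, `parse_csv_name("PR-APP-FLIGHTS-_B__APP_FLIGHTS_91-1400537866176.csv")` yields kind `PR`, bucket `91`, and timestamp `1400537866176`.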
After running dataextractor, you can choose to use the recommendations file as-is, or edit the file to recover only a portion of the available data. You then run dataextractloader with the recommendations file to load the data into a new GemFire XD system.
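If you want to recover only some tables, the recommendations file can be trimmed by hand or with a small script. The sketch below is one possible approach, under a loud assumption: it treats Recommended.txt as a list with one absolute file path per line, which this document does not confirm, so check your own file before relying on it. The helper name `filter_recommendations` is hypothetical. It keeps every non-CSV entry (such as the DDL files) and only those CSV entries whose filename contains `-<TABLE>-`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def filter_recommendations(lines: Iterable[str], tables: Iterable[str]) -> List[str]:
    """Keep non-CSV entries and CSV entries for the wanted tables, in order.

    Assumes one file path per line (an assumption about Recommended.txt).
    """
    wanted = [f"-{name.upper()}-" for name in tables]
    kept = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        name = PurePosixPath(entry).name.upper()
        is_csv = name.endswith(".CSV")
        if not is_csv or any(tag in name for tag in wanted):
            kept.append(entry)
    return kept
```

Write the returned lines to a copy of the recommendations file rather than overwriting the original, so that the full "best guess" list stays available.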
The dataextractor utility provides only a "best effort" attempt to recover disk store data. Keep these limitations in mind when you use the utilities.
The following procedures and resources are required in order to use the data recovery utilities.
Follow these steps to extract data from the available GemFire XD disk store files.
$ gfxd shut-down-all -locators=localhost
Connecting to distributed system: locators=localhost
Successfully shut down 2 members
$ gfxd locator stop -dir=$HOME/locator
The GemFireXD Locator has stopped.

$ mkdir ~/recovery-directory
$ cp -r ~/locator ~/recovery-directory
$ cp -r ~/server1 ~/recovery-directory
$ cp -r ~/server2 ~/recovery-directory

$ cd ~/recovery-directory
$ touch extractor.properties

recoveredlocator1=/Users/yozie/recovery-directory/locator
recoveredserver1=/Users/yozie/recovery-directory/server1
recoveredserver2=/Users/yozie/recovery-directory/server2
$ export JAVA_ARGS=-Xmx2G
$ dataextractor property-file=./extractor.properties
Reading the properties file : ./extractor.properties
Total size of data to be extracted : 14.4404296875MB
Disk space available in the output directory : 30423.44921875MB
Sufficient disk space to carry out data extraction
Extracting DDL for server : recoveredserver1
Extracting DDL for server : recoveredlocator1
Extracting DDL for server : recoveredserver2
Completed extraction of DDL's for server : recoveredlocator1
Completed extraction of DDL's for server : recoveredserver1
Completed extraction of DDL's for server : recoveredserver2
NULL ROW FORMATTER FOR:SYSIBMSYSDUMMY1
Maximum disk-store size on disk 5.057651519775391 MB
Available memory : 52.09442901611328 MB
Estimated memory needed per server : 11.12683334350586 MB
Recommended number of threads to extract server(s) in parallel : 4
Started data extraction for Server : recoveredlocator1
Started data extraction for Server : recoveredserver1
Started data extraction for Server : recoveredserver2
Extracting disk stores
Extracting disk stores
Server : recoveredlocator1 Attempting extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/locator
Extracting disk stores
Server : recoveredserver2 Attempting extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server2
Server : recoveredserver1 Attempting extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server1
Completed extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/locator
Completed extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server2
Completed extraction of diskstore:GFXD-DEFAULT-DISKSTORE from directory: /Users/yozie/recovery-directory/server1
Total Salvage Time : 15.851s
Writing out Summary and Recommendation...
Completed Summary and Recommendation

$ ls
EXTRACTED_FILES/ datadictionary/ extractor.properties locator/ server1/ server2/

$ cat EXTRACTED_FILES/Summary.txt
[DDL EXPORT INFORMATION]
1. recoveredlocator1 , file : /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredlocator1/exported_ddl.sql Number of ddl statements : 9
2. recoveredserver1 , file : /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/exported_ddl.sql Number of ddl statements : 9
3. recoveredserver2 , file : /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/exported_ddl.sql Number of ddl statements : 9
[EXPORT INFORMATION FOR TABLES]
Table:APP_FLIGHTS__B__APP_FLIGHTS_36
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_36-1400537865621.csv . Number of rows extracted : 3
Table:APP_FLIGHTS__B__APP_FLIGHTS_37
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_37-1400537866233.csv . Number of rows extracted : 5
Table:APP_FLIGHTS__B__APP_FLIGHTS_34
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/PR-APP-FLIGHTS-_B__APP_FLIGHTS_34-1400537866329.csv . Number of rows extracted : 5
Table:APP_FLIGHTS__B__APP_FLIGHTS_35
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_35-1400537865998.csv . Number of rows extracted : 5
Table:APP_FLIGHTS__B__APP_FLIGHTS_38
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_38-1400537866224.csv . Number of rows extracted : 11
Table:APP_FLIGHTS__B__APP_FLIGHTS_39
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_39-1400537865896.csv . Number of rows extracted : 5
Table:APP_FLIGHTAVAILABILITY__B__APP_FLIGHTAVAILABILITY_28
1. /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/PR-APP-FLIGHTAVAILABILITY-_B__APP_FLIGHTAVAILABILITY_28-1400537866855.csv . Number of rows extracted : 14
[...]
The DDL EXPORT INFORMATION section shows the order in which the utility recommends replaying DDL files to restore the data dictionary. You can review the DDL files to ensure that the tables match your expected schema. Comments are inserted to call out replicated and partitioned tables, as well as table colocation.
This is followed by a list of CSV files that contain the data values to load into the tables. In the example above, you can see that FLIGHTS is a partitioned table, and a separate CSV file is generated per bucket of the table. The file summary shows the number of rows recovered for each bucket of the table.
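Because the summary reports rows per bucket, it can be useful to total them per table before deciding what to load. The following is a rough sketch, not a feature of dataextractor: `rows_per_table` is a hypothetical helper, and it assumes the `<path>.csv . Number of rows extracted : <n>` wording shown in the example output, plus a `<PR|RR>-<schema>-<table>-...` filename layout in which schema and table names contain no hyphens.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches "<path>.csv . Number of rows extracted : <n>", regardless of how
# the summary text is broken into lines.
_ROW_COUNT = re.compile(r"(\S+\.csv)\s*\.\s*Number of rows extracted\s*:\s*(\d+)")
# Assumed filename layout: <PR|RR>-<schema>-<table>-...
_TABLE = re.compile(r"^(?:PR|RR)-([^-]+)-([^-]+)-")


def rows_per_table(summary_text: str) -> Dict[str, int]:
    """Sum the extracted row counts in Summary.txt by schema.table."""
    totals: Dict[str, int] = defaultdict(int)
    for path, count in _ROW_COUNT.findall(summary_text):
        file_name = path.rsplit("/", 1)[-1]
        match = _TABLE.match(file_name)
        # Fall back to the raw filename if the layout assumption fails.
        table = f"{match.group(1)}.{match.group(2)}" if match else file_name
        totals[table] += int(count)
    return dict(totals)
```

A typical use is `rows_per_table(open("EXTRACTED_FILES/Summary.txt").read())`, which returns a dictionary keyed by names such as `APP.FLIGHTS`.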
Follow these steps to load the SQL and CSV files that were recovered using dataextractor into a new GemFire XD system.
$ cd ~/recovery-directory
$ mkdir recovery-server
$ gfxd server start -dir=./recovery-server/
Starting GemFireXD Server using multicast for peer discovery: 126.96.36.199
Starting network server for GemFireXD Server at address localhost/127.0.0.1
Logs generated in /Users/yozie/recovery-directory/./recovery-server/gfxdserver.log
GemFireXD Server pid: 4674 status: running

$ dataextractloader host=localhost port=1527 recommended=./EXTRACTED_FILES/Recommended.txt
Loading .sql file: /Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredlocator1/exported_ddl.sql
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_36-1400537865621.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_37-1400537866233.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver2/PR-APP-FLIGHTS-_B__APP_FLIGHTS_34-1400537866329.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_35-1400537865998.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
Executing :CALL SYSCS_UTIL.IMPORT_TABLE_EX ('APP', 'FLIGHTS', '/Users/yozie/recovery-directory/EXTRACTED_FILES/recoveredserver1/PR-APP-FLIGHTS-_B__APP_FLIGHTS_38-1400537866224.csv' , ',', '"', null, 0, 0, 6, 0, null, null)
[...]

$ gfxd
gfxd version 1.4.0
gfxd> connect client 'localhost:1527';
gfxd> show tables;
TABLE_SCHEM |TABLE_NAME |REMARKS
------------------------------------------------------------------------
SYS |ASYNCEVENTLISTENERS |
SYS |GATEWAYRECEIVERS |
SYS |GATEWAYSENDERS |
SYS |SYSALIASES |
SYS |SYSCHECKS |
SYS |SYSCOLPERMS |
SYS |SYSCOLUMNS |
SYS |SYSCONGLOMERATES |
SYS |SYSCONSTRAINTS |
SYS |SYSDEPENDS |
SYS |SYSDISKSTORES |
SYS |SYSFILES |
SYS |SYSFOREIGNKEYS |
SYS |SYSHDFSSTORES |
SYS |SYSKEYS |
SYS |SYSROLES |
SYS |SYSROUTINEPERMS |
SYS |SYSSCHEMAS |
SYS |SYSSTATEMENTS |
SYS |SYSSTATISTICS |
SYS |SYSTABLEPERMS |
SYS |SYSTABLES |
SYS |SYSTRIGGERS |
SYS |SYSVIEWS |
SYSIBM |SYSDUMMY1 |
APP |AIRLINES |
APP |CITIES |
APP |COUNTRIES |
APP |FLIGHTAVAILABILITY |
APP |FLIGHTS |
APP |FLIGHTS_HISTORY |
APP |MAPS |
32 rows selected
The above output shows that tables in the APP schema were recreated during the recovery process. Further queries against the tables show that the example data was also loaded.
This section describes some common errors that can occur while recovering data or loading recovered data into a new system.
|Errors indicate that a disk store was not recovered from a directory.||A common error during data extraction indicates that a named disk store was not recovered from a specific directory. This generally does not indicate an error in the extraction process. In order to avoid problems caused by corrupt directory mappings in oplog files, the utility looks for all disk store files in all directories listed for a GemFire XD member. While this ensures that the tool recovers as much data as possible, it also results in this error when a disk store's files do not appear in a specified directory.|
|dataextractor fails to recover any data.||The persistent data dictionary must be available in order to recover any data from the disk store files. See Requirements.|
|Out of Memory Exceptions during data recovery.||
The dataextractor utility attempts to calculate the size of the target disk stores, and spawns multiple threads in order to extract data as fast as possible. The number of threads is determined by how much heap memory you provide to the utility. If you receive out of memory exceptions:
|Out of disk space errors.||If you run out of disk space while executing dataextractor, the utility exits and all data that was recovered up to that point is available in the output directory. However, the Recommended.txt and Summary.txt files are not created. If this occurs, free additional disk space and then re-run the utility.|
|Data not recovered for a server, "This oplog is a pre 7.0 version" error, or other failures.||A corrupted disk store metadata file (.if extension) can result in a failure to extract data for a member, or can manifest itself in other ways, such as by reporting the "pre 7.0 version" error. In this case, data may not be recoverable unless you can restore a viable .if file from backup.|
|Errors while loading recovered data.||As described in Limitations for Data Recovery, the disk store recovery process cannot guarantee data consistency. Errors that occur while loading recovered data are common. However, errors that occur while executing the dataextractloader do not prevent the utility from attempting to load additional data. See the dataextractloader log file for a complete record of errors that occurred.|
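For the out-of-disk-space case in the table above, a rough pre-check can be run before starting dataextractor. The sketch below is an illustration only and does not reproduce the utility's own estimate: the helper names `directory_size` and `has_enough_space`, and the `factor` safety multiplier, are hypothetical. It compares the total size of the copied member directories with the free space on the volume that holds the output directory.

```python
import os
import shutil
from typing import Iterable


def directory_size(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def has_enough_space(member_dirs: Iterable[str], output_dir: str,
                     factor: float = 1.0) -> bool:
    """Check whether output_dir's volume has room for the member data.

    factor is a hypothetical safety multiplier; extracted CSV output may be
    larger or smaller than the disk store files it comes from.
    """
    needed = sum(directory_size(d) for d in member_dirs) * factor
    free = shutil.disk_usage(output_dir).free
    return free >= needed
```

For the example layout used in this section, a call such as `has_enough_space(["locator", "server1", "server2"], ".", factor=2.0)` from the recovery directory gives a conservative answer before you launch the extraction.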