Overview of HDFS Persistence Models

GemFire XD provides two models for persisting GemFire XD table data to HDFS: HDFS Write-Only and HDFS Read/Write.

HDFS Write-Only

With HDFS write-only persistence, all of the DML operations that are applied to a table are persisted to HDFS log files. No compaction is performed on the persisted data, so the HDFS logs maintain a complete record of the table's operations over time. You can directly process the persisted HDFS data using Hadoop tools such as MapReduce. Or, you can use the GemFire XD PXF driver (installed with HAWQ) to access the persisted Hadoop data using HAWQ. However, you cannot access the persisted HDFS data from GemFire XD clients using SQL commands. Only the in-memory data for the table remains available for query processing. For HDFS write-only tables that are not persistent, you can use the EVICTION BY CRITERIA Clause to configure the total amount of data maintained in memory.

HDFS write-only persistence enables applications to collect unbounded volumes of data in parallel for future processing. Common applications include:
  • Collecting sensor data that is emitted at a constant, high rate, using multiple GemFire XD data stores. HDFS write-only persistence is combined with custom eviction criteria to address the memory constraints of individual GemFire XD members. Applications can only access and modify data that is currently in memory, or data that has been overflowed to local GemFire XD disk store operational log files. Only the HDFS store contains a history of the table operations that were evicted and destroyed.
  • Analyzing historical data over time. By streaming all of the operations applied to a table, you can use HAWQ or Hadoop tools to analyze historical information about data values. For example, you can determine how much a specific value increased or decreased over a range of time, using table data in the HDFS store.

In contrast to other forms of GemFire XD persistence configurations, the HDFS write-only model does not enable you to read the HDFS-persisted data back into a GemFire XD system for querying. This means that update and delete operations can only be applied to table data that is currently in memory or data stored in local GemFire XD disk stores.

The key considerations when using HDFS write-only persistence are:
  1. Managing the in-memory data footprint. Most streaming applications collect more data than can be accommodated in GemFire XD disk store files. See Managing In-Memory Data for HDFS Tables.
  2. Controlling when HDFS log files are closed. Log files are not available to Hadoop tools until GemFire XD closes the files. See How GemFire XD Manages HDFS Data.
  3. Accessing HDFS log file data. See Using MapReduce to Access HDFS Data and Using HAWQ to Access HDFS Data.

See also Configuring HDFS Write-Only Persistence for an example of the steps you take to configure an HDFS write-only table.

HDFS Read/Write

With HDFS read/write persistence, all of the DML operations that are applied to a table are persisted to HDFS log files. GemFire XD periodically compacts the log files into larger files in order to improve read performance. You can access table data in the HDFS log files using Hadoop tools, or using SQL queries from clients or peers in the GemFire XD distributed system. For HDFS read/write tables, you use the EVICTION BY CRITERIA Clause to define a working subset of table data to keep in-memory for high-performance queries. Data that does not fit into this "operational" set is evicted from GemFire XD memory, but remains in HDFS log files to satisfy queries and updates. Only the in-memory, operational data is indexed for high-performance querying.

HDFS read/write persistence enables you to manage very large tables directly in GemFire XD—beyond the scope of what is possible when overflowing to local GemFire XD disk stores. Overflow to GemFire XD disk stores evicts only non-primary key columns from memory; primary keys and indexes are always retained in memory, so tables can become large enough that this data cannot fit into available memory. HDFS read/write tables persist entire rows to operational log files.

The key considerations when using HDFS read/write are:
  1. Managing the in-memory data footprint. The operational data set that you define should support the queries used by your applications, while controlling the memory footprint of GemFire XD servers. See Managing In-Memory Data for HDFS Tables and Automatic Overflow for Operational Data.
  2. Managing compaction behavior for HDFS log files. Periodic compaction is necessary to improve read operations against HDFS data. See How GemFire XD Manages HDFS Data and Compaction for HDFS Log Files.
  3. Accessing table data from GemFire XD clients. Queries against HDFS read/write tables can either access just the operational (in-memory) table data, or they can access the complete table data (in-memory data and data stored in HDFS log files). By default, only the operational data is queried. DML statements always operate against the full data set, but GemFire XD provides additional DML statements to improve performance. See Querying and Updating HDFS Read/Write Tables.
  4. Accessing HDFS log file data outside of GemFire XD. See Using MapReduce to Access HDFS Data and Using HAWQ to Access HDFS Data.
Note: HDFS read/write tables that use eviction criteria cannot have foreign key constraints, because the table data needed to enforce such constraints would require scanning persisted HDFS data. To improve query performance for these tables, create indexes on the columns where you would normally assign foreign key constraints.

Prerequisites for Secure HDFS

GemFire XD supports the Pivotal HD Enterprise Hadoop implementation for persisting table data to HDFS. All GemFire XD members that persist data to HDFS must have permission to write to the configured HDFS store directory.

Authentication Requirements for Hadoop

If you have configured Hadoop with Kerberos for user authentication, then the following requirements apply to each node in the Hadoop cluster:
  1. The Kerberos service must be running on the HDFS node.
  2. The core-site.xml configuration file must include the hadoop.security.authentication=kerberos property definition.
  3. The hadoop-policy.xml configuration file must specify the name of the GemFire XD user in the security.client.protocol.acl list.
  4. The GemFire XD user must have read/write permission on the HomeDir directory of the GemFire XD HDFS store in Hadoop. One way to achieve this is to first authenticate to HDFS as a superuser and create the home directory for the GemFire XD user. Then, still as the superuser, change ownership of the GemFire XD home directory to the GemFire XD user.

See "Site XML Changes" in the Pivotal HD Enterprise Stack and Tool Reference Guide for more information.

Authentication Requirements for GemFire XD Members

All authentication and authorization for HDFS access is tied to the user that executes the GemFire XD member process. For example, when you create an HDFS store in GemFire XD, the default directory structure for the store is created under the /user/user-name directory in HDFS, where user-name is the process owner for GemFire XD. (You can override this default by specifying an absolute path beginning with the "/" character, or by setting an alternate HDFS root directory for servers using the hdfs-root-dir property.) The GemFire XD user must have read/write permission on the HDFS store directory on each Hadoop cluster node, as described in Authentication Requirements for Hadoop.

If the Hadoop cluster enables Kerberos for user authentication, then the following requirements apply to each GemFire XD member in the distributed system:
  1. Each GemFire XD node must have the principal and keytab file for the GemFire XD user. See Generating a Principal and keytab File. The principal that you define in the keytab file must be the same as the NameNode principal defined in the Hadoop hdfs-site.xml file.
  2. Any CREATE HDFSSTORE command that references a secure NameNode must include the ClientConfigFile option to point to an HDFS client configuration file that contains:
    • the NameNode principal
    • the GemFire XD user principal
    • the path to the GemFire XD principal keytab file.
      Note: As a best practice, create a separate keytab file for each client hostname, and specify a soft link to the correct keytab file in hdfs-site.xml. This enables you to use the same hdfs-site.xml on each GemFire XD node while still maintaining separate keytab files.
    The following listing shows the example contents of an hdfs-site.xml client configuration file:
    <configuration>
         <property>
                 <name>hadoop.security.authentication</name>
                 <value>kerberos</value>
         </property>
         <property>
                  <name>dfs.namenode.kerberos.principal</name>
                  <value>namenode-service-principal</value>
         </property>
         <property>
                  <name>gemfirexd.kerberos.principal</name>
                  <value>gemfire-xd-principal</value>
         </property>   
         <property>
                  <name>gemfirexd.kerberos.keytab.file</name>
                  <value>path-to-gemfire-xd-keytab-file-with-gemfire-xd-principal</value>
         </property>
    </configuration>
  3. Verify that you can access the secure Hadoop namenode by executing the following command on each GemFire XD member machine:
    hadoop fs -ls hdfs://name-node-address:name-node-port/

Generating a Principal and keytab File

All HDFS clients including GemFire XD members (datastores) must obtain a Kerberos Ticket Granting Ticket (TGT) in order to access Hadoop when Kerberos authentication is enabled. Begin by generating a principal and keytab file for GemFire XD:
  1. Create a GemFire XD principal using the kadmin utility command:
    kadmin:  addprinc -randkey gfxd-user/hostname@LOCAL.DOMAIN 
  2. Create the keytab file using the kadmin utility command:
    kadmin: xst -norandkey -k gfxd-user.keytab gfxd-user
  3. Change the ownership and privileges of the keytab file for the associated user:
    $ sudo chown gfxd-user gfxd-user.keytab
    $ sudo chmod 400 gfxd-user.keytab

On GemFire XD member machines, specify the keytab file in the gemfirexd.kerberos.keytab.file property in hdfs-site.xml, as described in Authentication Requirements for GemFire XD Members. As an alternative, you can use the kinit utility to authenticate to Kerberos before running GemFire XD. GemFire XD automatically uses the generated authentication token when a member boots. Keep in mind, however, that tokens generated with kinit expire after 10 hours.

A GemFire XD member that fails to authenticate to a secure Hadoop installation receives the following exception when it accesses the secure HDFS:
java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed

Managing In-Memory Data for HDFS Tables

Tables that use either HDFS write-only or HDFS read/write persistence store some portion of the table's data in memory as well as in HDFS log files. You must carefully configure the in-memory footprint for HDFS tables to balance in-memory data requirements of your applications with the available memory in GemFire XD members.

EVICTION BY CRITERIA for HDFS Tables

With HDFS read/write tables, and with HDFS write-only tables that do not configure local persistence to GemFire XD disk stores, you can use the EVICTION BY CRITERIA Clause with a SQL predicate to define the data that you want GemFire XD to remove from memory. The remaining data represents the table's operational data set, which is kept in memory for high-performance querying. Keep in mind that eviction never propagates a destroy action to values in the HDFS store. Eviction simply removes table values from the in-memory data set, while the values remain in the table's HDFS log files.

Note: HDFS write-only tables that are configured with the PERSISTENT clause cannot use the EVICTION BY CRITERIA clause. HDFS read/write tables can use this clause regardless of whether you configure them to use local persistence to GemFire XD disk stores.
The SQL predicate that you define in EVICTION BY CRITERIA can use table column values or functions to define data for eviction. For example, in the following clause, rows become eligible for eviction when their column value exceeds a specified value:
EVICTION BY CRITERIA ( column-name > column-value )
EVICTION BY CRITERIA provides two optional clauses to specify when GemFire XD actually evicts eligible data. The EVICTION FREQUENCY clause defines a regular interval after which GemFire XD goes through the in-memory data and evicts eligible rows:
[ EVICTION FREQUENCY integer-constant { SECONDS | HOURS | DAYS } [ START GMT-timestamp-constant ] ]
The START clause shown above specifies a fixed time to begin evicting data. You can also specify the START clause in the ALTER TABLE statement to force eviction on an existing table.
Note: If you use the START clause, you must specify the timestamp using the GMT (Greenwich Mean Time) time zone.

In contrast, the EVICT INCOMING clause evicts data immediately when it is inserted or modified.

You can include only one of EVICTION FREQUENCY or EVICT INCOMING. Use EVICT INCOMING only when both of the following are true:
  • The EVICTION BY CRITERIA predicate uses only column values to define the data to evict, and
  • Your applications never need high-speed access to the evicted data (the data need never be in-memory).

If you need to keep the evicted rows in memory for some period of time for query processing, then use EVICTION FREQUENCY.
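For example, the following sketch shows an EVICT INCOMING configuration for a hypothetical sensor table; the table, columns, and the myhdfsstore HDFS store are illustrative names only and are assumed to already exist or be created in your system:

CREATE TABLE sensor_readings
  ( sensor_id INT NOT NULL PRIMARY KEY,
    read_time TIMESTAMP NOT NULL,
    reading INT )
PARTITION BY PRIMARY KEY
EVICTION BY CRITERIA ( reading < 100 )
EVICT INCOMING
HDFSSTORE (myhdfsstore);

Because the predicate uses only a column value, each inserted or updated row that matches the criteria is evicted from memory immediately, while the operation is still persisted to the HDFS store.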

Note: Regardless of how you configure the operational data set, it is critical that you profile your application with representative data to ensure that operational data set does not consume excessive memory in GemFire XD members. Both the eviction criteria and the eviction frequency can help mitigate the memory required for an HDFS read/write table. However, there may still be cases when application usage patterns cause the memory footprint of operational data to expand beyond capacity. GemFire XD uses Automatic Overflow for Operational Data to ensure that spikes in operational data do not cause out of memory errors.
Note: HDFS read/write tables that use eviction criteria cannot have foreign key constraints, because the table data needed to enforce such constraints would require scanning persisted HDFS data. To improve query performance for these tables, create indexes on the columns where you would normally assign foreign key constraints.

EXPIRE Clause for HDFS Tables

In addition to setting custom eviction criteria with EVICTION BY CRITERIA, you can use the EXPIRE clause to destroy in-memory table data after a configured amount of time. The EXPIRE clause works for HDFS-persistent tables as well as non-HDFS tables. However, keep in mind that the DESTROY action triggered by an EXPIRE clause is not propagated to data stored in HDFS; only the in-memory table data is destroyed. Also, the local DESTROY action does not affect dependent tables (for example, child rows with foreign key relationships). See EXPIRE Clause.
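For example, the following sketch (using hypothetical table and HDFS store names) destroys in-memory entries ten minutes after they are created, while the corresponding operations remain in the table's HDFS log files:

CREATE TABLE recent_events
  ( event_id INT NOT NULL PRIMARY KEY,
    event_data VARCHAR(1000) )
PARTITION BY PRIMARY KEY
EXPIRE ENTRY WITH TIMETOLIVE 600 ACTION DESTROY
HDFSSTORE (myhdfsstore) WRITEONLY;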

Automatic Overflow for Operational Data

Eviction criteria by themselves cannot guarantee that the amount of operational data will never exceed available memory on the system. For this reason, GemFire XD automatically enables overflow for HDFS tables that use the EVICTION BY CRITERIA clause. Automatic overflow for HDFS tables occurs when the in-memory data reaches a critical percentage of the available heap space. You configure the critical heap percentage using the SYS.SET_EVICTION_HEAP_PERCENTAGE or SYS.SET_EVICTION_HEAP_PERCENTAGE_SG procedures, or by specifying the -critical-heap-percentage option when you start a GemFire XD member with gfxd. Also ensure that you start GemFire XD members using the -heap-size option, which enables the JVM resource manager.
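For example, the following sketch sets this heap threshold to 80 percent from an interactive gfxd session. The single percentage argument shown here is an assumption, and the value is only illustrative; check the procedure reference for the exact signature:

-- 80 is an illustrative value; the single-argument form is an assumption
call sys.set_eviction_heap_percentage(80);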

Automatic overflow works by evicting table values to a local GemFire XD disk store, until the heap size decreases below the configured threshold.

GemFire XD uses the first available disk store in this list to store overflow data for an HDFS table:
  1. If you have configured an HDFS read/write table to be persistent (persisted to local GemFire XD operational log files), then GemFire XD uses that persistence disk store for automatic overflow.
    Note: HDFS write-only tables cannot use the EVICTION BY CRITERIA clause if they are also configured to be persistent.
  2. If the HDFS queue was configured for persistence (using the QUEUEPERSISTENT clause in the CREATE HDFSSTORE statement), then GemFire XD uses that disk store for automatic overflow.
  3. If neither the table nor the HDFS queue is configured for persistence, then GemFire XD uses the default disk store for automatic overflow. The default disk store uses file names beginning with BACKUPSQLF-DEFAULT-DISKSTORE, which are stored in the root directory of the GemFire XD data stores that host the table.

Unlike eviction to HDFS (the EVICTION BY CRITERIA clause), data that is overflowed to local GemFire XD disk stores remains indexed for improved query performance. However, the query performance for overflow data is reduced compared to queries against in-memory data. See Evicting Table Data from Memory.

Combining HDFS Persistence with Local Disk Store Persistence

While data that is stored in HDFS log files remains in HDFS until those files are deleted, GemFire XD does not persist the in-memory table data by default. If you restart a cluster that contains tables that do not persist their in-memory data, the in-memory data in those tables is lost. You can optionally persist the in-memory data using local GemFire XD disk store files.

You configure local persistence for an HDFS read/write table's in-memory data using the PERSISTENT clause in the CREATE TABLE statement. This clause can be used with all types of GemFire XD tables, including those that do not persist data to HDFS. The PERSISTENT clause can be used alone, in which case the in-memory table data is persisted to the GemFire XD default disk store file. Or, you can specify a named disk store that you previously configured using the CREATE DISKSTORE statement.

Note: HDFS write-only tables cannot use the EVICTION BY CRITERIA clause if they are also configured to be persistent.

In either case, when you configure an HDFS read/write table with the PERSISTENT clause, the disk store used for persistence is also used for automatic overflow of the table's in-memory data.
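The following sketch illustrates this combination with hypothetical table, disk store, and HDFS store names; the exact clause ordering may vary, so check the CREATE TABLE reference:

CREATE DISKSTORE localoverflow;

CREATE TABLE orders_hot
  ( order_id INT NOT NULL PRIMARY KEY,
    order_data VARCHAR(1000) )
PARTITION BY PRIMARY KEY
PERSISTENT 'localoverflow'
EVICTION BY CRITERIA ( order_id < 500000 )
EVICTION FREQUENCY 180 SECONDS
HDFSSTORE (myhdfsstore);

Here the localoverflow disk store persists the table's in-memory data and also receives any automatic overflow of the table's operational data.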

Configuring HDFS Write-Only Persistence

Persisting data to HDFS in write-only mode enables you to stream large volumes of data for processing with Hadoop tools or HAWQ.

To configure HDFS write-only for a partitioned table, you first create an HDFS store configuration with properties to define the Hadoop NameNode and directory in which to store the data. You can optionally specify properties to configure the queue that GemFire XD uses to stream table data or the rollover frequency for HDFS log files. After creating the HDFS store, use CREATE TABLE with the HDFSSTORE and WRITEONLY clauses to create the partitioned table and stream its data to HDFS.
  1. Use CREATE HDFSSTORE to define the Hadoop connection and queue that GemFire XD uses to stream table data to Hadoop. For example:
    CREATE HDFSSTORE streamingstore
      NameNode 'hdfs://gfxd1:8020'
      HomeDir 'stream-tables' 
      BatchSize 10
      QueuePersistent true
      MaxWriteOnlyFileSize 200;
    The above command creates a new HDFS store configuration in the specified Hadoop NameNode and directory. The GemFire XD queue used to write to this HDFS store is persisted to a local GemFire XD disk store. (The default GemFire XD disk store is used, because the above command does not specify a named disk store). Data in the queue is written to the HDFS store in batches as large as 10 megabytes. GemFire XD streams all data to an HDFS log file until the file size reaches 200 megabytes.
  2. Use CREATE TABLE to create a partitioned table that persists to the HDFS store you created, and specify the WRITEONLY clause to use the HDFS write-only model. For example:
    CREATE TABLE monitor1
      ( mon_id INT NOT NULL PRIMARY KEY,
        mon_date DATE NOT NULL,
        mon_data VARCHAR(2000) )
    PARTITION BY PRIMARY KEY
    EXPIRE ENTRY WITH TIMETOLIVE 300 ACTION DESTROY
    HDFSSTORE (streamingstore) WRITEONLY;
    In addition to streaming table operations to the HDFS store, this CREATE TABLE statement ensures that table entries are evicted from memory (destroyed) after 5 minutes.

Configuring HDFS Read/Write Persistence

Persisting table data to HDFS in read/write mode provides the flexibility of managing very large tables in GemFire XD while supporting high-performance querying of operational, in-memory data.

To configure HDFS read/write persistence for a partitioned table, you first create an HDFS store configuration. The HDFS store properties define the basic Hadoop connection and the GemFire XD queue configuration, but can also define compaction behavior for the persisted files. After defining the read/write HDFS store, use CREATE TABLE to create the partitioned table that uses the store for persistence.
  1. Use CREATE HDFSSTORE to define the Hadoop connection and queue that GemFire XD uses to stream table data to Hadoop, specifying any custom compaction behavior for the persisted files. For example:
    CREATE HDFSSTORE readwritestore
      NameNode 'hdfs://gfxd1:8020'
      HomeDir 'gfxd-overflow-tables'
      BatchSize 10
      QueuePersistent true
      MinorCompact true
      MajorCompact true
      MaxInputFileSize 12
      MinInputFileCount 4
      MaxInputFileCount 8
      MinorCompactionThreads 3
      MajorCompactionThreads 3;
      
    The above command creates a new HDFS store configuration in the specified Hadoop NameNode and directory. The GemFire XD queue used to write to this HDFS store is persisted to a local GemFire XD disk store. The default GemFire XD disk store is used, because the above command does not specify a named disk store. Data in the queue is written to the HDFS store in batches as large as 10 megabytes.

    GemFire XD performs both minor and major compaction on the resulting HDFS store persistence files. Minor compaction is performed on files up to 12 MB in size, once at least 4 (and at most 8) eligible files exist for a bucket. Both the minor and major compaction cycles use a maximum of 3 threads.

  2. Use CREATE TABLE to create a partitioned table that persists to the HDFS store you created. Use EVICTION BY CRITERIA to define the operational data set, and omit the WRITEONLY clause to ensure that GemFire XD can read back the data it persists to HDFS, using full table scans. For example:
    CREATE TABLE bigtable1
      ( big_id INT NOT NULL PRIMARY KEY,
        big_date DATE NOT NULL,
        big_data VARCHAR(2000) )
    PARTITION BY PRIMARY KEY
    EVICTION BY CRITERIA ( big_id < 300000 )
    EVICTION FREQUENCY 180 SECONDS
    HDFSSTORE (readwritestore);
    In the above table, the EVICTION BY CRITERIA clause uses the value of the big_id column to define the table's operational data.

    Keep in mind that the EVICTION BY CRITERIA clause defines the data that you do not want to keep in the operational data set. Table data that matches the criteria is evicted, while the remaining data is kept in memory for high-performance querying.

    Note: HDFS read/write tables that use eviction criteria cannot have foreign key constraints, because the table data needed to enforce such constraints would require scanning persisted HDFS data. To improve query performance for these tables, create indexes on the columns where you would normally assign foreign key constraints.

Querying and Updating HDFS Read/Write Tables

By default, queries operate only against the in-memory, operational data set for a table. Clients that are interested in the full table dataset (both HDFS data and in-memory data) can use the query-HDFS connection property, or the queryHDFS query hint that applies only to a specific query.

To specify the queryHDFS property in a single query, include the --GEMFIREXD-PROPERTIES comment after specifying the table name. The following two example queries show the correct usage of queryHDFS:
select * from table-name --GEMFIREXD-PROPERTIES queryHDFS=true \n ;
select * from table-name --GEMFIREXD-PROPERTIES queryHDFS=true \n where column-name=column-value;

Notice the placement of queryHDFS in the second example. The property must always be defined immediately after the table name in the SQL statement (and before any additional SQL predicates). Also notice the \n characters shown in the examples. Because query hints are specified in SQL comments starting with --GEMFIREXD-PROPERTIES, you must include \n or a newline character before continuing the SQL command, or the remaining characters are ignored. See also Overriding Optimizer Choices.

To set HDFS query behavior for all SQL statements executed in a single connection, specify the query-HDFS connection property when you connect to the GemFire XD cluster. For example:
gfxd> connect client 'localhost:1527;query-HDFS=true';

Keep in mind that queries that access HDFS data do not use in-memory indexes. For performance reasons, any query that accesses HDFS-persistent data should reference a primary key. Queries that are not based on a primary key require a full table scan, which can lead to poor performance when accessing HDFS data.
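For example, the following query (using the bigtable1 table created earlier; the key value is illustrative) retrieves a single row from the complete data set by primary key:

select big_data from bigtable1 --GEMFIREXD-PROPERTIES queryHDFS=true \n where big_id = 12345;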

DML commands (update and delete operations) always operate against the full data set of the table, even if you do not specify the queryHDFS hint. You need not provide any query hint to ensure that a value is inserted, updated, or deleted.

Note: For HDFS Write-only tables, TRUNCATE TABLE removes only the in-memory data for the table, but leaves the HDFS log files available for later processing with MapReduce and HAWQ. This occurs regardless of the queryHDFS setting.

For HDFS Read-Write tables, TRUNCATE TABLE removes the in-memory data for the table and marks the table's HDFS log files for expiry. This means that GemFire XD queries, MapReduce jobs with CHECKPOINT mode enabled, and HAWQ external tables with CHECKPOINT mode enabled no longer return data for the truncated table. MapReduce jobs and HAWQ external tables that do not use CHECKPOINT mode will continue to return some table values until the log files expire.

GemFire XD provides a special PUT INTO command to speed update operations on HDFS read/write tables. PUT INTO uses a syntax similar to the INSERT statement:
PUT INTO table-name
{ VALUES (...) |
  SELECT ...
}

PUT INTO differs from standard INSERT or UPDATE statements in that GemFire XD does not check existing primary key values before executing the command. If a row with the same key exists in the table, PUT INTO simply overwrites the older row value. If no rows with the same primary key exist, PUT INTO operates like a standard INSERT. Removing the primary key check speeds execution when updating large numbers of rows in HDFS stores.
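For example, the following statement (using the bigtable1 table created earlier; the values are illustrative) either inserts the row or overwrites an existing row that has the same primary key value:

PUT INTO bigtable1 VALUES (12345, CURRENT_DATE, 'replacement row value');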

How GemFire XD Manages HDFS Data

When you configure a table for HDFS persistence, GemFire XD places all data to be persisted to HDFS in an asynchronous queue. Batches of updates from this queue are periodically written to HDFS operational log files. The behavior of the HDFS queue and the number and format of HDFS log files differ depending on whether you configure a table for HDFS write-only or HDFS read/write persistence.

HDFS Store Directory Structure

GemFire XD creates a Hadoop subdirectory hierarchy for each HDFS store that is configured in the system. The root directory of the hierarchy is specified by the hdfs-root-dir property. By default this directory is /user/user-name, where user-name is the user that started the GemFire XD JVM (for example, /user/gfxd).

Beneath the root directory, each HDFS store specifies a unique directory in which to store HDFS log files for tables that use the store. This is the HomeDir option provided in the CREATE HDFSSTORE statement.

In the HDFS store home directory, a separate directory is created for each table that uses the HDFS store, and each bucket in the table gets a further subdirectory.

HDFS Metadata Files

In addition to persisting table data to HDFS in operational log files, GemFire XD persists the table schema (the DML used to create the table) to HDFS in metadata files. These metadata files are used to recreate the table for MapReduce jobs or HAWQ operations that access the persisted data outside of a GemFire XD distributed system. Metadata files are stored in the .hopmeta subdirectory of the HDFS store, and have the .ddlhop extension.

HDFS Log Files

Each GemFire XD member that stores primary buckets for a partitioned table's data maintains an asynchronous queue of events for those buckets. As GemFire XD flushes the contents of the asynchronous queues, it creates new HDFS persistence files for those buckets. Each bucket has a dedicated subdirectory under the table directory where it stores all operational logs for that bucket's values. (Hadoop may also split HDFS persistence files as necessary to accommodate the configured HDFS block size.)

An operational log file uses the following format: bucketNumber-timeStamp-sequenceNumber.extension. The most common types of log file have the following extensions:
  • .hop is the extension used for files that store the raw, uncompacted table data that is flushed from an HDFS queue. Multiple .hop files are generated as necessary to store new data. New .hop files that are created for the same bucket increment the sequence-number in the filename. MapReduce jobs that process events over a period of time (event-mode jobs) access multiple .hop files in order to collect the required data.
  • .exp is a zero-length marker file that indicates when a .hop file expired.
  • .shop stores table data for a write-only HDFS table.
  • .ihop is an intermediate log file created by GemFire XD minor compaction. See Compaction for HDFS Log Files.
  • .chop is a snapshot log file created by GemFire XD major compaction. See Compaction for HDFS Log Files.

All HDFS persistence files are identified by a timestamp, which indicates that the file stores table operations that occurred earlier than the timestamp. File timestamps can be used in MapReduce jobs to process records for a specified interval, or for a particular point in time (a snapshot or checkpoint of the table data).

For tables that use HDFS write-only persistence, the HDFS queue on a primary member does not sort the events for the buckets it manages. Instead, the queue persists the unsorted operations to HDFS, and multiple batches are persisted to the same file in HDFS. Each file may include multiple entries for the same row, as updates and delete operations are appended. The same row value may appear in multiple persistence files, because GemFire XD does not automatically compact the files' contents.

For tables that use HDFS read/write persistence, the HDFS queue buffers and sorts all operations that occur for a given bucket before persisting queued events to HDFS. Each ordered batch of events creates a new file in the HDFS store. However, GemFire XD performs periodic compaction to merge smaller files into larger files to improve read performance. MapReduce jobs can use the timestamps of individual persistence files to perform incremental processing of values over a period of time. Also, because the events are ordered in the persistence files, individual values in a file can be accessed efficiently using primary key queries from GemFire XD clients.

Compaction for HDFS Log Files

Although HDFS read/write persistence initially creates multiple small persistence files in HDFS (one file for each batch of operations that is flushed from the HDFS queue), the primary GemFire XD member periodically compacts those persistence files into larger files. This compaction behavior helps to improve read performance by reducing the number of available files, and by removing older, intermediate row values from the persisted files. GemFire XD performs both minor and major compaction cycles:
  • Minor compaction combines multiple files into one larger file by discarding older row values. Reducing the number of files in HDFS helps to avoid performance degradation in HDFS and the GemFire XD distributed system. You configure minor compaction by setting the minimum and maximum number of files, per table bucket, that are eligible for compaction, as well as the maximum size of file to consider when performing minor compaction. You can also configure the number of threads used to perform minor compaction.
    Note: Use caution when increasing the MinInputFileCount or MaxInputFileSize values, as they apply to each bucket persisted by the HDFS store, rather than to the HDFS store as a whole. As more tables target the HDFS store, additional HDFS file handles are required to manage the number of open files. A large number of buckets combined with a high MinInputFileCount can cause too many files to be opened in HDFS, resulting in exceptions if open file resources are exhausted. Ensure that you have configured your operating system to support large numbers of file descriptors (see Supported Configurations and System Requirements.)

    After minor compaction completes, GemFire XD marks the smaller persistence files for deletion. GemFire XD no longer uses the files for accessing table data for client queries. However, you can still use MapReduce jobs to process incremental data in those files before they are deleted (after 12 hours).

    Note: Do not disable minor compaction unless you tune other HDFS parameters to avoid severe performance degradation. Turning off minor compaction can cause a very large number of HDFS log files to be created, which can potentially exhaust HDFS receiver threads and/or client sockets. To offset these problems, increase the BatchSize option to create a fewer number of HDFS log files. As a best practice, leave minor compaction enabled unless compaction causes excessive I/O overhead in HDFS that cannot be resolved by tuning compaction behavior.
  • Major compaction combines all of the row values for a given bucket into a single persistence file. After major compaction is performed only the latest row values are stored in the persistence files, and the file represents a snapshot of the bucket values. You can configure major compaction by specifying the interval of time after which GemFire XD performs automatic major compaction, as well as the number of threads used to perform compaction. You can also manually initiate major compaction using the SYS.HDFS_FORCE_COMPACTION procedure.

GemFire XD performs automatic compaction when you enable compaction in CREATE HDFSSTORE, and only for tables that perform HDFS read/write persistence.

Tables that are configured for HDFS write-only persistence do not read data back from HDFS, so GemFire XD does not compact the log files to improve read performance. Instead, all intermediate row values remain available in the raw HDFS log files. MapReduce jobs and HAWQ can specify which log files to use in order to process time series data.

Understanding When Data is Available in HDFS

When you insert or update data in a GemFire XD HDFS table, your changes are not immediately available in HDFS log files. Several factors determine the length of time required before table DML operations are available in HDFS log files for processing by Hadoop tools and HAWQ.

The first factor that affects the delay in writing DML to HDFS files is the HDFS store queue configuration (CREATE HDFSSTORE command). The BatchSize argument determines how many DML operations remain in the queue's memory before being written to an HDFS log file. Using a smaller batch size shortens the time before operations are available in HDFS log files. However, keep in mind that a larger batch size enables GemFire XD to write fewer files to HDFS, which increases performance.

You can also use the SYS.HDFS_FLUSH_QUEUE procedure to flush the contents of a table's HDFS queue.
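For example, a minimal sketch of flushing the queue for the bigtable1 table; the argument list shown here (a table name plus a maximum wait time in milliseconds) is an assumption, so check the procedure reference for the exact signature:

-- the argument list (table name, maximum wait time in milliseconds) is assumed
call sys.hdfs_flush_queue('APP.BIGTABLE1', 30000);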

Even after DML events are written to a raw HDFS log file, the log file is not available for processing by Hadoop tools or HAWQ until GemFire XD closes the log file. For HDFS write-only tables, the MaxWriteOnlyFileSize property determines how frequently GemFire XD closes a given log file (making it available for MapReduce and HAWQ) and begins writing to a new file.

Although HDFS read/write tables provide raw HDFS log file data for time series processing, by default MapReduce jobs and HAWQ external tables work only with snapshot or checkpoint HDFS log files. These files provide only the current row values for a table at a given point in time. When working with checkpoint data, certain DML operations may not be available in HDFS log files until GemFire XD completes a major compaction cycle, which introduces a new snapshot of bucket values. How quickly DML operations appear in snapshot or checkpoint files is therefore determined by how frequently major compaction runs, which you configure for the HDFS store. Again, keep in mind that performing more frequent compactions can negatively affect performance. You will need to balance your performance requirements with the needs of other applications that process the persisted data.

You can also use the SYS.HDFS_FORCE_COMPACTION procedure to initiate a major compaction cycle.
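For example, a minimal sketch of forcing major compaction for the bigtable1 table; the argument list shown here (a table name plus a maximum wait time) is an assumption, so check the procedure reference for the exact signature:

-- the argument list (table name, maximum wait time) is assumed; 0 requests compaction without waiting
call sys.hdfs_force_compaction('APP.BIGTABLE1', 0);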

Note: You can choose to work with raw HDFS log file data instead of checkpoint files by setting the CHECKPOINT property in the MapReduce RowInputFormat configuration or in the HAWQ external table definition. See Using MapReduce to Access HDFS Data and Using HAWQ to Access HDFS Data.

Using MapReduce to Access HDFS Data

GemFire XD stores all table data in HDFS in a native format, indexing the data as necessary for read/write access from GemFire XD clients. However, GemFire XD also extends the Hadoop InputFormat class to enable MapReduce jobs to access data in HDFS log files without having to start up or connect to a GemFire XD distributed system.

Note: GemFire XD provides a RowInputFormat and RowOutputFormat implementation for the older "mapred" API, as well as the new MapReduce API. You should consistently use the implementation that corresponds to the specific API version you are using for development.
Note: This section contains excerpts from the GemFire XD MapReduce sample applications. The full source code for the examples is installed in the /usr/lib/gphd/gfxd/examples directory.

Configuring Input to MapReduce Jobs

The GemFire XD RowInputFormat implementation configures the input data set and converts the raw data stored in HDFS operational logs into table rows for consumption in your custom MapReduce jobs. RowInputFormat properties configure input parameters such as the name of the table to use as input, the location of the table's HDFS log files, and the selection of log files to use for input. The following table describes each configuration property:

Table 1. RowInputFormat Configuration Properties
  • INPUT_TABLE (Write-Only, Read/Write): The name of the GemFire XD table, in the format schema_name.table_name.
  • HOME_DIR (Write-Only, Read/Write): The name of the HDFS directory that stores the table's log files. This corresponds to the HomeDir value used in the CREATE HDFSSTORE statement.
  • START_TIME_MILLIS (Write-Only, Read/Write): Identifies the earliest timestamp for table events to process in the MapReduce job. You can specify this property by itself to process all events that occurred after a specific time, or you can combine it with END_TIME_MILLIS to process a range of events. If you also configure the CHECKPOINT property, then the START_TIME_MILLIS value is ignored.
  • END_TIME_MILLIS (Write-Only, Read/Write): Identifies the latest timestamp for table events to process in the MapReduce job. You can specify this property by itself to process all events that occurred before a specific time, or you can combine it with START_TIME_MILLIS to process a range of events. If you also configure the CHECKPOINT property, then the END_TIME_MILLIS value is ignored.
  • CHECKPOINT (Read/Write): This property is only available for HDFS read/write tables. By default, MapReduce jobs process only the latest table row values, instead of table events that occurred over time. Set the CHECKPOINT property to false in order to process incremental table values persisted in the HDFS log files.

For HDFS write-only and HDFS read/write tables, the START_TIME_MILLIS and/or END_TIME_MILLIS properties specify a range of table events to process in the MapReduce job. When you specify a time range for table events using these properties, the input formatter uses only raw .hop files for input data. Omitting both properties (and disabling CHECKPOINT) uses all available .hop files.

For HDFS read/write tables, you can optionally set the CHECKPOINT property to false in order to process the incremental table values stored in raw HDFS .hop files. By default, MapReduce jobs use only compacted HDFS log files (.chop files) for the input data set. The .chop files are created during GemFire XD major compaction cycles, and contain only the latest row values for the table. The CHECKPOINT property is not applicable to HDFS write-only tables, because GemFire XD does not compact streamed HDFS log files. See How GemFire XD Manages HDFS Data for more information about HDFS log file types and compaction behavior.

You configure all properties for the input data set using a standard Hadoop JobConf object. For example, this code configures the input formatter to provide all raw .hop file data for the input table and HDFS home directory provided in the program's arguments (omitting both START_TIME_MILLIS and END_TIME_MILLIS returns all active .hop files):

    JobConf conf = new JobConf(getConf());
    conf.setJobName("Busy Airport Count");

    Path outputPath = new Path(args[0]);
    Path intermediateOutputPath = new Path(args[0] + "_int");
    String hdfsHomeDir = args[1];
    String tableName = args[2];

    outputPath.getFileSystem(conf).delete(outputPath, true);
    intermediateOutputPath.getFileSystem(conf).delete(intermediateOutputPath, true);

    conf.set(RowInputFormat.HOME_DIR, hdfsHomeDir);
    conf.set(RowInputFormat.INPUT_TABLE, tableName);
    conf.setBoolean(RowInputFormat.CHECKPOINT, false);

    conf.setInputFormat(RowInputFormat.class);
    conf.setMapperClass(SampleMapper.class);

Working With Table Data

Based on the configuration properties that you set, the GemFire XD RowInputFormat implementation identifies the .hop or .shop files that contain the necessary data, and creates splits over the HDFS blocks that store those files. Keep in mind that DML operations to GemFire XD tables are not immediately available in HDFS log files; see Understanding When Data is Available in HDFS for more information.

The implementation then provides table records to the map method in your Mapper implementation using key/value pairs. The KEYIN object that is provided to your Mapper implementation contains no relevant table data; it is provided only to conform to the standard Hadoop interface. All table data is encapsulated in a GemFire XD Row object. You can either work with a single row value, or obtain a ResultSet of multiple rows. This example Mapper implementation uses the row to obtain a result set:

  public static class SampleMapper extends MapReduceBase
      implements Mapper<Object, Row, Text, IntWritable> {

    private final static IntWritable countOne = new IntWritable(1);
    private final Text reusableText = new Text();

    @Override
    public void map(Object key, Row row,
        OutputCollector<Text, IntWritable> output,
        Reporter reporter) throws IOException {

      String origAirport;
      String destAirport;

      try {
        ResultSet rs = row.getRowAsResultSet();
        origAirport = rs.getString("ORIG_AIRPORT");
        destAirport = rs.getString("DEST_AIRPORT");
        reusableText.set(origAirport);
        output.collect(reusableText, countOne);
        reusableText.set(destAirport);
        output.collect(reusableText, countOne);
      } catch (SQLException e) {
        e.printStackTrace();
      }
    }
  }

As with standard MapReduce jobs, the map method can evaluate each table record against some condition, or accumulate values as in the above example. Hadoop sorts the mapper output before sending it to the reducer, which reduces the set of intermediate values and uses an OutputFormatter to persist values.

Sending MapReduce Output to GemFire XD Tables

Whereas standard MapReduce jobs typically write their output to a file system, GemFire XD provides an OutputFormatter implementation that simplifies the task of writing data back into a GemFire XD table. Writing to a GemFire XD table requires that you have a distributed system running, with the table already defined in the data dictionary.

As with the RowInputFormat implementation, you use the RowOutputFormat implementation to configure properties in a standard Hadoop JobConf object. The following table describes each configuration property:

Table 2. RowOutputFormat Configuration Properties
  • OUTPUT_TABLE: The name of the GemFire XD table to write to, in the format schema_name.table_name.
  • OUTPUT_URL: The JDBC thin client driver string used to connect to the GemFire XD distributed system (for example, jdbc:gemfirexd://hostname:port/). See Connect to a GemFire XD Server with the Thin Client JDBC Driver.
For example, the following sample code assumes that a GemFire XD locator is running locally, so it configures the JDBC connection string using the local host name. The output formatter is configured to write records to the APP.BUSY_AIRPORT table:
      JobConf topConf = new JobConf(getConf());
      topConf.setJobName("Top Busy Airport");

      String hdfsFS = topConf.get("fs.defaultFS");
      URI hdfsUri = URI.create(hdfsFS);
      hdfsUri.getHost();

      topConf.set(RowOutputFormat.OUTPUT_URL, "jdbc:gemfirexd://" + hdfsUri.getHost() + ":1527");
      topConf.set(RowOutputFormat.OUTPUT_TABLE, "APP.BUSY_AIRPORT");
...
      topConf.setReducerClass(TopBusyAirportReducer.class);
      topConf.setOutputKeyClass(Key.class);
      topConf.setOutputValueClass(BusyAirportModel.class);
      topConf.setOutputFormat(RowOutputFormat.class);

When you use the output formatter in your MapReduce job, all table writes insert data to the configured table data using the PUT INTO statement. PUT INTO uses a syntax similar to the INSERT statement, but GemFire XD does not check existing primary key values before executing the PUT INTO command. If a row with the same key exists in the table, PUT INTO simply overwrites the older row value.

Note: A PUT INTO operation does not invoke triggers, and you cannot use PUT INTO in the triggered SQL statement of a GemFire XD trigger.

As with standard Hadoop MapReduce jobs, the output is written using key/value pairs. As with the GemFire XD RowInputFormat implementation, the Key contains no relevant table information, but is only provided for compatibility with the standard Hadoop interface. All table data should be placed in the output value.

The following example writes key/value pairs using the RowOutputFormat configuration shown above. A new Key value is created just before the write, but contains no data. The output value is an instance of BusyAirportModel, which serializes itself as a JDBC row:
  public static class TopBusyAirportReducer extends MapReduceBase
      implements Reducer<Text, StringIntPair, Key, BusyAirportModel> {

    @Override
    public void reduce(Text token, Iterator<StringIntPair> values,
        OutputCollector<Key, BusyAirportModel> output, Reporter reporter)
        throws IOException {
      String topAirport = null;
      int max = 0;

      while (values.hasNext()) {
        StringIntPair v = values.next();
        if (v.getSecond() > max) {
          max = v.getSecond();
          topAirport = v.getFirst();
        }
      }
      BusyAirportModel busy = new BusyAirportModel(topAirport, max);
      output.collect(new Key(), busy);
    }
  }

Using HAWQ to Access HDFS Data

GemFire XD stores all table data in HDFS in a native format, indexing the data as necessary for read/write access from GemFire XD clients. In addition to the input/output system used to support standard Hadoop tools, GemFire XD supports a PXF Driver (installed with HAWQ) to enable Pivotal HAWQ to read HDFS table data. To access GemFire XD HDFS table data, you configure the table in HAWQ using the CREATE EXTERNAL TABLE command.

Prerequisites

In order to access GemFire XD HDFS table data, HAWQ must have access to the gemfirexd.jar library. If you installed GemFire XD on the same node as HAWQ and Pivotal HD, the gemfirexd.jar file should appear in the HADOOP_CLASSPATH environment variable.

If GemFire XD is not installed on the local HAWQ machine, copy the gemfirexd.jar file and add it to HADOOP_CLASSPATH. The file is installed in the /lib subdirectory of the GemFire XD installation directory:
  • For RPM installations, the file is installed to /opt/lib/gpdb/gfxd/lib/gemfirexd.jar.
  • For ZIP file installations, the file is installed to Pivotal_GemFireXD_10_bNNNN/lib/gemfirexd.jar in the directory where you unzipped the file.

See also "Installing PXF" in the Pivotal Extension Framework Installation and Administrator Guide for general requirements when using a PXF plug-in. For example, you must ensure that the Hadoop NameNode and all data nodes that store the HDFS table data use the REST service.

The PXF Driver does not support mapping GemFire XD columns of CHAR data type (PXF instead supports the BPCHAR type, which is incompatible with GemFire XD). For any tables that you intend to access using HAWQ, use columns of VARCHAR data type instead of CHAR.

Mapping HDFS Tables in HAWQ

To access an HDFS-persistent table with HAWQ, use the following CREATE EXTERNAL TABLE syntax from a HAWQ interactive command prompt such as psql:
CREATE EXTERNAL TABLE hawq_tablename ( column_list )
   LOCATION ('pxf://namenode_rest_host:port/hdfsstore_homedir/schema.table?PROFILE=GemFireXD[&attribute=value]* ')  
   FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
In this statement:
  • hawq_tablename is the name that you assign to the external table mapping. You will use this table name in HAWQ to query the table data.
  • column_list is the list of the column names and data types that you want to map from the GemFire XD table into HAWQ. Any columns that you specify must use the same column name that you specified in the GemFire XD CREATE TABLE statement, and a data type that is compatible with the PXF Driver. The PXF Driver supports only the datatype mappings described in this table:
    Table 3. PXF Driver Data Type Compatibility
    GemFire XD Data Type Matching PXF Driver Data Type
    SMALLINT SMALLINT
    INTEGER INTEGER
    BIGINT BIGINT
    REAL REAL
    DOUBLE FLOAT8
    VARCHAR VARCHAR or BPCHAR or TEXT
    BOOLEAN BOOLEAN
    NUMERIC NUMERIC
    TIMESTAMP TIMESTAMP
    BINARY BYTEA
    Note: You cannot map a GemFire XD table column that uses any other data type, including the CHAR data type. VARCHAR columns are supported.
    Note: The PXF external table definition does not support column constraint definitions such as NOT NULL or PRIMARY KEY. Do not include these constraints in the CREATE EXTERNAL TABLE command even if the original GemFire XD table specifies such constraints. Specify only the GemFire XD column name and PXF driver data type for each column definition.

    You can choose to map only a subset of the columns available in the GemFire XD table. As a best practice, map only the columns that you intend to use, rather than mapping all of the columns and then selecting a subset in your HAWQ queries. The list of columns in the table mapping determines the amount of data transmitted to HAWQ during a query, regardless of the subset of columns you specify in the HAWQ query.

    In the column list, you can add the special column name GFXD_PXF_TS with the timestamp type, to include the GemFire XD timestamps that are associated with each table entry.

  • namenode_rest_host and port specify the host and port number of the REST service running on the Hadoop NameNode that stores the HDFS table log files. The default REST port is 50070.
  • hdfsstore_homedir is the home directory of the GemFire XD HDFS store that holds the table's log files. This corresponds to the HomeDir value that you specified in the CREATE HDFSSTORE statement.
  • schema.table is the full schema and table name of the table as it exists in GemFire XD (the table name that you supplied in the CREATE TABLE statement).
  • [&attribute=value]* represents one or more optional attribute definitions used to configure which HDFS log files the PXF Driver accesses for queries against the external table. See Table 4. By default, HAWQ processes either the full set of raw HDFS log files for an HDFS write-only table, or processes only the most recent snapshot of table values for an HDFS read/write table.

Optional PXF Driver Attributes

The PXF:// URL that you specify can also include the following, optional attributes. Note that if you do not include any options, the default PXF table mapping is equivalent to setting &CHECKPOINT=true for HDFS read/write tables.

Table 4. PXF Configuration Attributes
  • STARTTIME=yyyy.mm.dd-hh:mm:ss | yyyy.mm.dd-hh:mm:ssz (Write-Only, Read/Write): Identifies the earliest timestamp for table events to process in the HAWQ table. You can specify this attribute by itself to process all events that occurred after a specific time, or you can combine it with ENDTIME to process a range of events. You can specify the timestamp in yyyy.mm.dd-hh:mm:ss or yyyy.mm.dd-hh:mm:ssz format (year.month.day-hour:minutes:seconds). The optional z specifies the timezone code to use; Greenwich Mean Time (GMT) is used by default if you do not specify a timezone. If you also configure the CHECKPOINT attribute, then the STARTTIME value is ignored.
  • ENDTIME=yyyy.mm.dd-hh:mm:ss | yyyy.mm.dd-hh:mm:ssz (Write-Only, Read/Write): Identifies the latest timestamp for table events to process in the HAWQ table. You can specify this attribute by itself to process all events that occurred before a specific time, or you can combine it with STARTTIME to process a range of events. ENDTIME uses the same timestamp format as the STARTTIME attribute described above. If you also configure the CHECKPOINT attribute, then the ENDTIME value is ignored.
  • CHECKPOINT=[ true | false ] (Read/Write): This attribute is only available for HDFS read/write tables. When you specify CHECKPOINT=true, the HAWQ table processes only the latest table row values, instead of table events that occurred over time. CHECKPOINT=true is used by default if you specify no other attributes. Always specify CHECKPOINT=false if you want to process table event data in the external table.

Keep in mind that DML operations to GemFire XD tables are not immediately available in HDFS log files, and are therefore not immediately visible in HAWQ external tables. See Understanding When Data is Available in HDFS for more information.

Example Table Mappings

Consider the following example HDFS table, created in GemFire XD using the CREATE HDFSSTORE and CREATE TABLE commands:
gfxd> CREATE HDFSSTORE mystreamingstore  NAMENODE 'hdfs://pivhdsne:8020'  HOMEDIR 'streamstore';

gfxd> CREATE TABLE flightstream (mon_id INT NOT NULL PRIMARY KEY, mon_time TIMESTAMP NOT NULL, mon_data VARCHAR(2000))
PARTITION BY COLUMN (mon_id) HDFSSTORE (mystreamingstore);
The following statement, executed in the HAWQ psql interactive prompt, creates an external table that uses all raw HDFS log files that were created after the specified STARTTIME. Note that it maps the special GFXD_PXF_TS column to include timestamp information for each table entry; GFXD_PXF_TS is not a column available in the original GemFire XD table. In GemFire XD, the table was created in the default APP schema, which is specified in the qualified table name after the HDFS store home directory:
gpadmin=# CREATE EXTERNAL TABLE hawq_flightstream ( mon_id INT,
          mon_time TIMESTAMP,
          mon_data VARCHAR(2000),
          GFXD_PXF_TS timestamp )
          LOCATION ('pxf://pivhdsne:50070/streamstore/app.flightstream?PROFILE=GemFireXD&STARTTIME=2013.11.01-12:00:00&CHECKPOINT=false')  
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
The following statement creates an external table that uses only the latest, compacted HDFS log files for an HDFS read/write table:
gpadmin=# CREATE EXTERNAL TABLE hawq_flightstream_current ( mon_id INT,
          mon_time TIMESTAMP,
          mon_data VARCHAR(2000) )
          LOCATION ('pxf://pivhdsne:50070/streamstore/app.flightstream?PROFILE=GemFireXD&CHECKPOINT=true')  
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
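The following statement is a hedged variation on the examples above, shown only to illustrate combining STARTTIME and ENDTIME; it reuses the same hypothetical cluster, HDFS store, and table names, and the external table name and time range are illustrative:
gpadmin=# CREATE EXTERNAL TABLE hawq_flightstream_nov ( mon_id INT,
          mon_time TIMESTAMP,
          mon_data VARCHAR(2000),
          GFXD_PXF_TS timestamp )
          LOCATION ('pxf://pivhdsne:50070/streamstore/app.flightstream?PROFILE=GemFireXD&STARTTIME=2013.11.01-00:00:00&ENDTIME=2013.11.30-23:59:59&CHECKPOINT=false')
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');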

Backing Up and Restoring HDFS Tables

Persistent HDFS tables use both local operational log files and HDFS log files to store data. To back up an HDFS table in its entirety, you must use separate commands to back up both the operational and the HDFS log files.

The gfxd backup command included with GemFire XD backs up only the local, operational log files that are created on GemFire XD members for persistent tables. Pivotal recommends using an HDFS tool such as distcp to back up the HDFS log files. Follow the steps below to ensure that you create a complete, valid backup of an HDFS table (a command sketch follows the steps):
  1. Ensure that the HDFS table is created using the PERSISTENT keyword. Non-persistent tables do not store their operational data in local log files, and cannot be backed up.
  2. Do not perform any DDL operations while performing the remaining steps.
  3. Do not perform any DML operations while performing the remaining steps.
    Note: Because you back up an HDFS table in two parts, DML operations that occur during the backup can cause HDFS log file backups to have more recent table data than the operational log file backups. This can lead to table inconsistencies when you restore the log files from backup.
  4. Use gfxd backup to back up the table's operational log files. See Backing Up and Restoring Disk Stores for more information.
  5. Use distcp or another utility to back up all of the table's HDFS log files.
  6. Make note of the time and location of both sets of backups. You will need to use the same pair of log backups to restore the table at a later time.
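The following sketch illustrates steps 4 and 5 for the example flightstream table. The backup directory, locator address, and HDFS paths are hypothetical, and the exact gfxd backup arguments may differ in your installation (see the gfxd command reference):
# Back up the operational log files on all members (step 4)
gfxd backup /backups/gfxd/2013-11-01 -locators=locator1[10101]
# Back up the HDFS log files by copying the HDFS store home directory (step 5)
hadoop distcp hdfs://pivhdsne:8020/streamstore hdfs://pivhdsne:8020/backups/streamstore-2013-11-01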

Restoring an HDFS Table from Backup

In order to restore an HDFS table from backup, you must use the operational log file backup and HDFS log file backup that were created during the same backup procedure. Never restore HDFS and operational log backups that were created during different periods of time. If the operational data contains a table schema that is different from that of the HDFS data, then tables can become unreadable. If the schema in both backups is the same but the data comes from different periods of time, then inconsistent results can be returned when using the query-HDFS property to query operational versus HDFS data.

Follow these steps to restore an HDFS table (a command sketch follows the steps):
  1. Shut down all members of the GemFire XD distributed system.
  2. Restore the HDFS log files to the same location, overwriting any existing files for the table. Refer to the documentation for your Hadoop backup utility for more information.
  3. Restore the local operational log files on GemFire XD members. See Backing Up and Restoring Disk Stores for more information.
  4. Restart the GemFire XD distributed system.
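The following sketch shows a possible restore sequence for the example flightstream table, assuming the backups created in the earlier sketch; the distcp -overwrite copy puts the HDFS log files back in place, while the operational log files must be restored according to the procedure in Backing Up and Restoring Disk Stores:
# Restore the HDFS log files to the store's home directory (step 2)
hadoop distcp -overwrite hdfs://pivhdsne:8020/backups/streamstore-2013-11-01 hdfs://pivhdsne:8020/streamstore
# Then restore each member's operational log files from the gfxd backup (step 3)
# and restart the distributed system (step 4).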

Best Practices for Achieving High Performance

When using GemFire XD HDFS-persistent tables, follow these recommendations to ensure the best possible performance of your system.

  • Colocate GemFire XD servers with HDFS data nodes. GemFire XD uses an asynchronous event queue to flush data from GemFire XD tables to HDFS. Colocating GemFire XD data stores with HDFS data nodes minimizes the network load and also enables certain optimizations when reading data from HDFS (for HDFS read/write tables).
  • Start GemFire XD servers with lock-memory=true to lock JVM heap and offheap pages into RAM. Locking memory prevents severe performance degradations caused by HDFS disk buffering. Drop operating system caches before starting members with locked memory. If you deploy more than one server per host, begin by starting each server sequentially with lock-memory=true. Starting servers sequentially avoids a race condition in the operating system that can cause failures (even machine crashes) if you accidentally over-allocate the available RAM. After you verify that the system configuration is stable, you can then start servers concurrently.
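    A minimal start-up sketch, assuming a Linux host and a hypothetical server working directory and locator address; confirm the exact gfxd server start option names against your command-line reference:
    # Drop operating system page caches before starting a member with locked memory (requires root)
    sync && echo 3 > /proc/sys/vm/drop_caches
    # Start the server with memory locking enabled
    gfxd server start -dir=./server1 -locators=locator1[10101] -lock-memory=true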
  • Tune HDFS queues to avoid overflowing to disk. Each GemFire XD server JVM uses a single dispatcher per asynchronous HDFS queue to process events. Applications with high throughput can overwhelm the queue, causing it to overflow events to disk and degrade performance. To avoid queue overflow:
    • Configure a maximum queue size (the MaxQueueMemory option) that is high enough to handle your application load. See CREATE HDFSSTORE and the sketch that follows this list.
    • If the capabilities of a single queue are insufficient, consider adding a second GemFire XD server per host, which adds a new queue and dispatcher thread for processing. If you configure multiple servers in this way, also remember to reconfigure the server heap space, offheap space, and so forth to account for the additional servers. Keep in mind, however, that the performance of any queries that do not scale may be negatively impacted by adding additional servers. Profile your application as a whole to determine whether the benefits of adding an additional server outweigh the tradeoff in query performance.
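    A hedged DDL sketch that raises the queue memory limit when the HDFS store is created; the store name, NameNode URL, and the 512 MB value are illustrative, and the MAXQUEUEMEMORY keyword should be confirmed against the CREATE HDFSSTORE reference:
    gfxd> CREATE HDFSSTORE mystreamingstore
          NAMENODE 'hdfs://pivhdsne:8020'
          HOMEDIR 'streamstore'
          MAXQUEUEMEMORY 512;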
  • Tune the critical heap percentage to avoid overflowing table data to disk. The GemFire XD resource manager automatically overflows table data to disk in order to prevent out of memory conditions. To avoid this, tune the JVM and the GemFire XD critical heap percentage so that tenured heap usage never exceeds the eviction heap percentage. See Heap and Off-Heap Eviction and Threshold Configuration Procedures.

    Keep in mind that as you increase the size of the HDFS asynchronous event queue, the queues are flushed less frequently and queue entries are more likely to be promoted to the tenured generation. To avoid crossing the eviction threshold due to HDFS queue-related garbage, tune the JVM to increase the young generation size and set the CMSInitiatingOccupancyFraction to keep any tenured garbage from accumulating for too long.
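    A hedged start-up sketch that combines these settings; the percentages, heap sizes, and GC flags shown are illustrative only, and the heap-percentage option names should be confirmed against your gfxd command-line reference:
    gfxd server start -dir=./server1 -locators=locator1[10101] \
      -critical-heap-percentage=90 -eviction-heap-percentage=80 \
      -J-Xmx8g -J-Xmn2g \
      -J-XX:+UseConcMarkSweepGC -J-XX:CMSInitiatingOccupancyFraction=70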

  • Use separate disk stores (on separate physical disks) for HDFS queue persistence and table persistence/overflow. If you do not specify a named disk store when creating an HDFS store or creating a persistent table, then GemFire XD uses the default disk store.
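    A hedged sketch of the disk-store side of this setup; the store names and directory paths are illustrative. After creating the stores, reference hdfsqueue_store from CREATE HDFSSTORE and table_store from the table's persistence clause, using the disk-store options documented for those commands:
    gfxd> CREATE DISKSTORE hdfsqueue_store ('/disk1/gfxd/hdfsqueue');
    gfxd> CREATE DISKSTORE table_store ('/disk2/gfxd/tables');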
  • Use 10-gigabit Ethernet or better, and consider using channel bonding to increase network bandwidth. GemFire XD deployments that use partitioned tables, redundancy, and HDFS persistence may write to several different hosts with each update. Reads can also easily saturate a 1-gigabit NIC.
  • Run with compression where possible to reduce network traffic and improve throughput.
  • Increase member-timeout to prevent forced disconnects. If you run multiple servers per host, conditions such as slow networks, high-throughput applications, heavy CPU load, or network congestion can delay GemFire XD peer messages and cause servers to time out and disconnect from the distributed system. If you experience this problem, increase the member-timeout value.
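    A minimal sketch, assuming the property is set in each member's gemfirexd.properties file; the value is illustrative (member-timeout is specified in milliseconds):
    # gemfirexd.properties
    member-timeout=10000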
  • For HDFS read/write tables, avoid queries that do not use a primary key value. Such queries require a full table scan. See Querying and Updating HDFS Read/Write Tables.
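    For the example flightstream table, the first query below uses a primary key value, while the second does not and therefore requires a full table scan; the predicate values are illustrative:
    gfxd> SELECT * FROM flightstream WHERE mon_id = 1042;
    gfxd> SELECT * FROM flightstream WHERE mon_data LIKE '%ALERT%';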