Overview of HDFS Persistence Models

GemFire XD provides two models for persisting GemFire XD table data to HDFS: HDFS Write-Only and HDFS Read/Write.

HDFS Write-Only

With HDFS write-only persistence, all of the DML operations that are applied to a table are persisted to HDFS log files. No compaction is performed on the persisted data, so the HDFS logs maintain a complete record of the table's operations over time. You can directly process the persisted HDFS data using Hadoop tools such as MapReduce. Or, you can use the GemFire XD PXF driver (installed with HAWQ) to access the persisted Hadoop data using HAWQ. However, you cannot access the persisted HDFS data from GemFire XD clients using SQL commands. Only the in-memory data for the table remains available for query processing. HDFS write-only tables can use the EVICTION BY CRITERIA Clause to configure the total amount of data maintained in memory.

HDFS write-only persistence enables applications to collect unbounded volumes of data in parallel for future processing. Common applications include:
  • Collecting (streaming) sensor data that is emitted at a constant, high rate, using multiple GemFire XD data stores. HDFS write-only persistence is combined with custom eviction criteria to address the memory constraints of individual GemFire XD members. Applications can only access and modify data that is currently in memory, or data that has been overflowed to local GemFire XD disk store operational log files. This operational data set typically contains a "window" of recent data that is of interest for high-performance querying. The HDFS store, however, contains a full record of all sensor data that was emitted.
  • Analyzing historical data for a table over time. In this type of application, the full data set of the table is stored in-memory and in local GemFire XD data stores (no HDFS eviction criteria are specified) while all changes to the table are captured and stored in HDFS tables. Here, the full table data is available for high-performance querying. Also, because all operations against the table are persisted to HDFS in time order, you can use HAWQ or Hadoop tools to analyze historical information about data values. For example, you can determine how much a specific value increased or decreased over a range of time, using event data in the HDFS store.

In contrast to other forms of GemFire XD persistence configurations, the HDFS write-only model does not enable you to read the HDFS-persisted data back into a GemFire XD system for querying. This means that update and delete operations can only be applied to table data that is currently in memory or stored in local GemFire XD disk stores.

The key considerations when using HDFS write-only persistence are:
  1. Managing the in-memory data footprint. Most streaming applications collect more data than can be accommodated in GemFire XD disk store files. See Managing the In-Memory Data for HDFS Tables.
  2. Controlling when HDFS log files are closed. Log files are not available to Hadoop tools until GemFire XD closes the files. See How GemFire XD Manages HDFS Data.
  3. Accessing HDFS log file data. See Using MapReduce to Access HDFS Table Data and Using HAWQ to Access HDFS Table Data.

See also Configuring HDFS Write-Only Persistence for an example of the steps you take to configure an HDFS write-only table.

HDFS Read/Write

With HDFS read/write persistence, all of the DML operations that are applied to a table are persisted to HDFS log files. GemFire XD periodically compacts the log files into larger files in order to improve read performance. You can access table data in the HDFS log files using Hadoop tools, or using SQL queries from clients or peers in the GemFire XD distributed system. For HDFS read/write tables, you use the EVICTION BY CRITERIA Clause to define a working subset of table data to keep in-memory for high-performance queries. Data that does not fit into this "operational" set is evicted from GemFire XD memory, but remains in HDFS log files to satisfy queries and updates. Only the in-memory, operational data is indexed for high-performance querying.

HDFS read/write persistence enables you to manage very large tables directly in GemFire XD—beyond the scope of what is possible when overflowing to local GemFire XD disk stores. Overflow to GemFire XD disk stores evicts only non-primary key columns from memory; primary keys and indexes are always retained in memory, so tables can become large enough that this data cannot fit into available memory. HDFS read/write tables persist entire rows to operational log files.

The key considerations when using HDFS read/write are:
  1. Managing the in-memory data footprint. The operational data set that you define should support the queries used by your applications, while controlling the memory footprint of GemFire XD servers. See Managing the In-Memory Data for HDFS Tables and Automatic Overflow for Operational Data.
  2. Managing compaction behavior for HDFS log files. Periodic compaction is necessary to improve read operations against HDFS data. See How GemFire XD Manages HDFS Data and Compaction for HDFS Log Files.
  3. Accessing table data from GemFire XD clients. Queries against HDFS read/write tables can either access just the operational (in-memory) table data, or they can access the complete table data (in-memory data and data stored in HDFS log files). By default, only the operational data is queried. DML statements always operate against the full data set, but GemFire XD provides additional DML statements to improve performance. See Querying and Updating HDFS Read/Write Tables.
  4. Accessing HDFS log file data outside of GemFire XD. See Using MapReduce to Access HDFS Table Data and Using HAWQ to Access HDFS Table Data.
Note: HDFS read/write tables that use eviction criteria cannot have foreign key constraints, because the table data needed to enforce such constraints would require scanning persisted HDFS data. To improve query performance for these tables, create indexes on the columns where you would normally assign foreign key constraints.

Configuration Requirements for Secure HDFS

GemFire XD supports the Pivotal HD Enterprise Hadoop implementation for persisting table data to HDFS. If you have configured Pivotal HD with Kerberos for secure HDFS, then you must perform additional steps to ensure that each GemFire XD member can authenticate with Kerberos, and has permission to write table data to the HDFS store. The specific configuration requirements for secure HDFS differ depending on whether you installed GemFire XD as a component of Pivotal HD (using the Pivotal Command Center CLI) or you installed GemFire XD as a standalone product outside of Pivotal HD (using either the RPM or ZIP file distribution).

Steps for Configuring GemFire XD for Standalone Installations

If you installed GemFire XD as a standalone product using the RPM or ZIP file distribution, then follow these steps to configure secure HDFS:
  1. Choose the GemFire XD service user name, and configure this user in Pivotal HD.

    Decide which user will execute the GemFire XD member processes. For example, on Windows platforms you might execute GemFire XD members from the command line as the logged-in user, or you may execute GemFire XD as a service using specific credentials. Whichever method you choose, the same user must execute the GemFire XD process and authenticate to Kerberos on each member machine.

    The example instructions in this procedure use "gfxd" as the service user name.

    After you determine the GemFire XD username, perform these steps on a Pivotal HD machine:
    1. Edit the /usr/lib/gphd/hadoop/etc/hadoop/hadoop-policy.xml configuration file to add the name of the GemFire XD user to the security.client.protocol.acl list. See Security in the Pivotal HD Enterprise Stack and Tool Reference Guide for more information.
    2. Configure HDFS read/write permission for the GemFire XD user. One way to achieve this is to first authenticate to HDFS as a superuser and create a home directory for the GemFire XD user in HDFS. Then, still as the superuser, change ownership of the home directory to the GemFire XD user.
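      For example, a minimal sketch of this approach, assuming that hdfs is the HDFS superuser (already authenticated to Kerberos) and gfxd is the GemFire XD user; your superuser account and paths may differ:
      $ sudo -u hdfs hadoop fs -mkdir /user/gfxd
      $ sudo -u hdfs hadoop fs -chown gfxd /user/gfxd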
  2. Generate a Kerberos service principal for each GemFire XD member in the distributed system.
    The service principal should be of the form name/role@REALM where:
    • name is the GemFire XD service user name (configured in the previous step, such as gfxd). Use the same name for each GemFire XD datastore member that writes data to HDFS.
    • role is the DNS resolvable, fully-qualified hostname of the GemFire XD member machine (output of hostname -f command).
    • REALM is the Kerberos Key Distribution Center (KDC) realm used in the Pivotal HD cluster (for example, LOCAL.DOMAIN).
    You can generate all principals from the same machine. For example, to use kadmin.local to add service principals for three GemFire XD datastores on the hosts gfxd1.example.com, gfxd2.example.com, and gfxd3.example.com, enter:
    kadmin.local:  addprinc -randkey gfxd/gfxd1.example.com@LOCAL.DOMAIN 
    kadmin.local:  addprinc -randkey gfxd/gfxd2.example.com@LOCAL.DOMAIN 
    kadmin.local:  addprinc -randkey gfxd/gfxd3.example.com@LOCAL.DOMAIN 

    Repeat this step for each GemFire XD datastore member machine. Use the same principal name and realm (gfxd and LOCAL.DOMAIN in the above example), but substitute the fully-qualified hostname of each machine.

  3. Generate a keytab file for each GemFire XD service principal.

    For each service principal that you created in the previous step, generate a dedicated keytab file. The files can be stored in any convenient location. The example commands in this procedure use the directory /etc/security/gfxd/keytab to store all keytab files, which will be deployed to GemFire XD member machines in a later step. If you store the keytab files in a single directory, ensure that you specify a different name for each file.

    For example, to create keytab files for the principals that were created in the previous step, enter these commands in kadmin.local:
    kadmin.local: xst -norandkey -k /etc/security/gfxd/keytab/gfxd1.keytab gfxd/gfxd1.example.com@LOCAL.DOMAIN
    kadmin.local: xst -norandkey -k /etc/security/gfxd/keytab/gfxd2.keytab gfxd/gfxd2.example.com@LOCAL.DOMAIN
    kadmin.local: xst -norandkey -k /etc/security/gfxd/keytab/gfxd3.keytab gfxd/gfxd3.example.com@LOCAL.DOMAIN

    Repeat this step to create a keytab file for each GemFire XD member principal.

  4. Generate a client configuration file for GemFire XD members.
    Each GemFire XD member that writes data to secure HDFS must have an XML-formatted HDFS client configuration file that:
    • Enables Kerberos for Hadoop authentication
    • Identifies the Hadoop NameNode principal
    • Identifies the service principal to use for that GemFire XD member (created in Step 2 above)
    • Identifies the keytab file associated with the GemFire XD service principal (created in Step 3 above)
    • Maps the Kerberos service principal name to a local operating system user.
    As a best practice, Pivotal recommends that you use the same client configuration file for each GemFire XD datastore that writes data to HDFS. In order to use a single file:
    • Use the special _HOST string instead of the fully-qualified host name, when specifying the GemFire XD service principal. A GemFire XD member automatically replaces _HOST with the hostname before authenticating.
    • Use a fixed path and filename for the keytab file in the client configuration file. Because each GemFire XD member principal requires a unique keytab file, you can either rename the keytab file on each member to match the path specified in the client configuration file, or you can use operating system commands to create a hard link to the keytab file that matches that path. See the next step in this procedure for details.
    The following listing shows the example contents of a client configuration file named gfxd-client.xml:
    <configuration>
         <property>
                 <name>hadoop.security.authentication</name>
                 <value>kerberos</value>
         </property>
         <property>
                  <name>dfs.namenode.kerberos.principal</name>
                  <value>namenode-service-principal</value>
         </property>
         <property>
                  <name>gemfirexd.kerberos.principal</name>
                  <value>gfxd/_HOST@LOCAL.DOMAIN</value>
         </property>   
         <property>
                  <name>gemfirexd.kerberos.keytab.file</name>
                  <value>/etc/security/gfxd/keytab/gfxd.service.keytab</value>
         </property>
         <property>
                  <name>hadoop.security.auth_to_local</name>
                   <value>principal-to-os-user-mapping</value>
         </property>
    </configuration>
    In the above client configuration file, the _HOST string is used to define the GemFire XD service principal, so that the same file can be used for all GemFire XD service principals. The configuration file requires that a valid keytab file be available on each member at /etc/security/gfxd/keytab/gfxd.service.keytab, as described in the next step.
    Note: For the dfs.namenode.kerberos.principal property, use the same value that is specified in the /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml file for your Pivotal HD installation.

    For the hadoop.security.auth_to_local property, use the same mapping rule that is specified in the /etc/gphd/hadoop/conf/core-site.xml file for your Pivotal HD installation.
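    As an illustration only, a hadoop.security.auth_to_local rule that maps the example gfxd/hostname@LOCAL.DOMAIN principals to the local gfxd user might look like the following; the actual rule must match the mapping configured in your Pivotal HD cluster:
     <property>
          <name>hadoop.security.auth_to_local</name>
          <value>
          RULE:[2:$1@$0](gfxd@LOCAL\.DOMAIN)s/.*/gfxd/
          DEFAULT
          </value>
     </property>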

  5. Deploy the client configuration file and keytab files to GemFire XD member machines.
    Use operating system commands to copy the client configuration file and keytab files to each GemFire XD member machine:
    1. Store the client configuration file using the same directory and filename on each GemFire XD member machine. You can accomplish this either by creating the same directory and file on each system, or by creating a hard link to the same path using operating system tools. For example, using the suggested paths from the previous steps, you could store the client configuration file in the common keytab directory:
      $ cp ~/Downloads/gfxd-client.xml /etc/security/gfxd/keytab/gfxd-client.xml
      Note: You must specify the full path to the client configuration in the CREATE HDFSSTORE command, and that path must be valid for all GemFire XD members that use the HDFS store. This applies whether you use the same client configuration file for all GemFire XD members, or if you use different file contents on each member; in either case, the same path must be used.
    2. You can copy all of the generated keytab files to each GemFire XD member. However, each member must use the correct keytab file for its associated service principal. For example, the member on host gfxd1.example.com must use the keytab file that was generated for the gfxd/gfxd1.example.com@LOCAL.DOMAIN principal. Store the correct keytab file for the local GemFire XD member machine in the directory specified by the client configuration file.
      If you use the same configuration file contents for each GemFire XD member, then you will need to either rename the correct keytab file to match the file path specified in the configuration file, or create a hard link that links the configuration file path to the correct file. For example, on host gfxd1.example.com rename the correct file using:
      $ cp ~/Downloads/gfxd1.keytab /etc/security/gfxd/keytab/gfxd.service.keytab
      Or, to create a hard link:
      $ ln ~/Downloads/gfxd1.keytab /etc/security/gfxd/keytab/gfxd.service.keytab

      Either method references the correct keytab file in the location specified by the example client configuration file (/etc/security/gfxd/keytab/gfxd.service.keytab).

  6. Verify that each GemFire XD member can authenticate to Pivotal HD.
    On each GemFire XD member machine, use command-line utilities to verify that the principal can authenticate with Kerberos and access HDFS. First, use kinit to obtain a Kerberos authorization ticket:
    $ kinit -k -t /etc/security/gfxd/keytab/gfxd.service.keytab
    Substitute the path to the keytab file for the local GemFire XD member if necessary.
    If authentication is successful, verify that the member can access the secure Hadoop namenode:
    $ hadoop fs -ls hdfs://name-node-address:name-node-port/
    After validating HDFS access, destroy the Kerberos authorization ticket:
    $ kdestroy -A

    Repeat this step to validate each GemFire XD member that will write data to HDFS.

  7. Specify the HDFS client configuration file each time you execute the CREATE HDFSSTORE command.
    You must specify the deployed client configuration file any time you execute the CREATE HDFSSTORE command in the distributed system. Remember that each GemFire XD member machine must have a client configuration file at the exact path specified in the command (regardless of whether the contents of the file are the same on each machine). For example:
    gfxd> CREATE HDFSSTORE streamingstore
      NameNode 'hdfs://gfxd1:8020'
      HomeDir 'stream-tables' 
      BatchSize 10
      BatchTimeInterval 2000 milliseconds
      QueuePersistent true
      MaxWriteOnlyFileSize 200
      WriteOnlyFileRolloverInterval 1 minute
      ClientConfigFile '/etc/security/gfxd/keytab/gfxd-client.xml';

Steps for Configuring GemFire XD for Pivotal Command Center CLI Installations

If you used the Pivotal Command Center CLI to install GemFire XD as a component of Pivotal HD, then the installation process automatically generates the GemFire XD service principal (gfxd) and the keytab files necessary for Kerberos authentication. Follow these steps to complete the secure HDFS configuration for GemFire XD after a CLI install:
  1. Verify that each GemFire XD member can authenticate to Pivotal HD.
    On each GemFire XD member machine, use command-line utilities to verify that the principal can authenticate with Kerberos and access HDFS. First, use kinit to obtain a Kerberos authorization ticket:
    $ sudo -u gfxd kinit -k -t /etc/security/gfxd/keytab/gfxd.keytab
    Substitute the path to the keytab file for the local GemFire XD member if necessary.
    If authentication is successful, verify that the member can access the secure Hadoop namenode:
    $ hadoop fs -ls hdfs://name-node-address:name-node-port/
    Note: Because Pivotal Command Center installs namenode HA by default, you can specify the logical name for the namenode (specified in hdfs-site.xml) instead of the namenode URL.
    After validating HDFS access, destroy the Kerberos authorization ticket:
    $ kdestroy -A

    Repeat this step to validate each GemFire XD member that will write data to HDFS.

  2. Specifying ClientConfigFile is optional when you execute the CREATE HDFSSTORE command.
    Pivotal HD automatically adds the /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml configuration file to the classpath, so you do not have to manually specify a client configuration file when you create an HDFS store. If you do include the option, be sure to specify the configuration file where you identified the GemFire XD service principal and keytab file location. For example:
    gfxd> CREATE HDFSSTORE streamingstore
      NameNode 'hdfs://gfxd1:8020'
      HomeDir 'stream-tables' 
      BatchSize 10
      BatchTimeInterval 2000 milliseconds
      QueuePersistent true
      MaxWriteOnlyFileSize 200
      WriteOnlyFileRolloverInterval 1 minute
      ClientConfigFile '/usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml';
    Note: Because Pivotal Command Center installs namenode HA by default, you can specify the logical name for the namenode (specified in hdfs-site.xml) instead of the namenode URL.

Managing the In-Memory Data for HDFS Tables

Tables that use either HDFS write-only or HDFS read/write persistence store some portion of the table's data in memory as well as in HDFS log files. You must carefully configure the in-memory footprint for HDFS tables to balance the in-memory data requirements of your applications with the available memory in GemFire XD members.

Note: Indexes are only created and maintained on the in-memory, operational data set for an HDFS table. Because queries that access HDFS data do not use in-memory indexes, these queries should reference a primary key for performance reasons. Non primary key-based queries require a full table scan, which can lead to poor performance when accessing HDFS data.

EVICTION BY CRITERIA for HDFS Tables

With HDFS read/write tables or HDFS write-only tables, you can use the EVICTION BY CRITERIA Clause with a SQL predicate to define the data that you want GemFire XD to remove from memory. The remaining data represents the table's operational data set, which is kept in memory for high-performance querying. Keep in mind that the eviction action never propagates the destroy action to values in the HDFS store. Eviction simply removes table values from the in-memory data set, while the values remain in the table's HDFS log files.

The SQL predicate that you define in EVICTION BY CRITERIA can use table column values or functions to define data for eviction. For example, in the following clause, rows become eligible for eviction when their column value exceeds a specified value:
EVICTION BY CRITERIA ( column-name > column-value )
EVICTION BY CRITERIA provides two optional clauses to specify when GemFire XD actually evicts eligible data. The EVICTION FREQUENCY clause defines a regular interval after which GemFire XD goes through the in-memory data and evicts eligible rows:
[ EVICTION FREQUENCY integer-constant { SECONDS | HOURS | DAYS } [ START GMT-timestamp-constant ] ]
The START clause shown above specifies a fixed time to begin evicting data. You can also specify the START clause in the ALTER TABLE statement to force eviction on an existing table.
Note: If you use the START clause, you must specify the timestamp using the GMT (Greenwich Mean Time) time zone.

In contrast, the EVICT INCOMING clause evicts data immediately when it is inserted or modified.

You can include only one of EVICTION FREQUENCY or EVICT INCOMING. Use EVICT INCOMING only when both of the following are true:
  • The EVICTION BY CRITERIA predicate uses only column values to define the data to evict, and
  • Your applications never need high-speed access to the evicted data (the data need never be in-memory).

If you need to keep the evicted rows in memory for some period of time for query processing, then use EVICTION FREQUENCY.
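For example, a minimal sketch that uses EVICT INCOMING with a purely column-based predicate (the sensor_readings table is hypothetical, and streamingstore refers to the HDFS store created later in this topic):
CREATE TABLE sensor_readings
  ( reading_id INT NOT NULL PRIMARY KEY,
    reading_time TIMESTAMP NOT NULL,
    reading_value DOUBLE NOT NULL )
PARTITION BY PRIMARY KEY
EVICTION BY CRITERIA ( reading_value > 100 )
EVICT INCOMING
HDFSSTORE (streamingstore) WRITEONLY;
Rows whose reading_value exceeds 100 are evicted from memory as soon as they are inserted, while the full event history remains available in the HDFS store.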

Note: Regardless of how you configure the operational data set, it is critical that you profile your application with representative data to ensure that the operational data set does not consume excessive memory in GemFire XD members. Both the eviction criteria and the eviction frequency can help mitigate the memory required for an HDFS read/write table. However, there may still be cases when application usage patterns cause the memory footprint of operational data to expand beyond capacity. GemFire XD uses Automatic Overflow for Operational Data to ensure that spikes in operational data do not cause out-of-memory errors.
Note: HDFS read/write tables that use eviction criteria cannot have foreign key constraints, because the table data needed to enforce such constraints would require scanning persisted HDFS data. To improve query performance for these tables, create indexes on the columns where you would normally assign foreign key constraints.

EXPIRE Clause for HDFS Tables

In addition to setting custom eviction criteria with EVICTION BY CRITERIA, you can use the EXPIRE clause to destroy in-memory table data after a configured amount of time. The EXPIRE clause works for HDFS-persistent tables as well as non-HDFS tables. However, keep in mind that the DESTROY action triggered by an EXPIRE clause is not propagated to data stored in HDFS; only the in-memory table data is destroyed. Also, the local DESTROY action does not affect dependent tables (for example, child rows with foreign key relationships). See EXPIRE Clause.
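For example, a minimal sketch (using a hypothetical monitor_events table and the streamingstore HDFS store created later in this topic) that destroys in-memory rows 10 minutes after they are inserted, while the HDFS log files retain the full history:
CREATE TABLE monitor_events
  ( event_id INT NOT NULL PRIMARY KEY,
    event_data VARCHAR(1000) )
PARTITION BY PRIMARY KEY
EXPIRE ENTRY WITH TIMETOLIVE 600 ACTION DESTROY
HDFSSTORE (streamingstore) WRITEONLY;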

Automatic Overflow for Operational Data

Eviction criteria by themselves cannot guarantee that the amount of operational data will never exceed available memory on the system. For this reason, GemFire XD automatically enables overflow for HDFS tables that use the EVICTION BY CRITERIA clause. Automatic overflow for HDFS tables occurs when the in-memory data reaches a critical percentage of the available heap space. You configure the critical heap percentage using the SYS.SET_EVICTION_HEAP_PERCENTAGE or SYS.SET_EVICTION_HEAP_PERCENTAGE_SG procedures, or by specifying the -critical-heap-percentage option when you start a GemFire XD member with gfxd. Also ensure that you start GemFire XD members using the -heap-size option, which enables the JVM resource manager.
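For example, a sketch of both approaches, assuming that SYS.SET_EVICTION_HEAP_PERCENTAGE accepts the percentage as its single argument and that the member is started with an explicit heap size (the directory and percentage values are illustrative):
gfxd> call SYS.SET_EVICTION_HEAP_PERCENTAGE(80);
or, when starting the member:
$ gfxd server start -dir=./server1 -heap-size=4g -critical-heap-percentage=80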

Automatic overflow works by evicting table values to a local GemFire XD disk store, until the heap size decreases below the configured threshold.

GemFire XD uses the first available disk store in this list to store overflow data for an HDFS table:
  1. If you have configured the HDFS read/write table to persist its in-memory data to a local GemFire XD disk store, then GemFire XD uses that same disk store for automatic overflow.
  2. If the HDFS queue was configured for persistence (using the QueuePersistent option in the CREATE HDFSSTORE statement), then GemFire XD uses the queue's disk store for automatic overflow.
  3. If neither the table nor the HDFS queue is configured for persistence, then GemFire XD uses the default disk store for automatic overflow. The default disk store uses file names beginning with BACKUPSQLF-DEFAULT-DISKSTORE, and the files are stored in the root directory of the GemFire XD data stores that host the table.

Unlike eviction to HDFS (the EVICTION BY CRITERIA clause), data that is overflowed to local GemFire XD disk stores remains indexed for improved query performance. However, the query performance for overflow data is reduced compared to queries against in-memory data. See Evicting Table Data from Memory.

Using Local Disk Store Persistence to Preserve the In-Memory Operational Data

While data stored in HDFS log files remains in HDFS until those files are deleted, GemFire XD does not persist the in-memory table data by default. If you restart a cluster that has tables that do not persist in-memory data, then the in-memory data in those tables is lost. Accessing the HDFS data after a restart (using the queryHDFS query hint) does not place the data back into operational memory for later queries; operational data can only be re-created using insert statements, subject to the EVICTION BY CRITERIA clause or EXPIRE clause configuration.

You can optionally persist the in-memory data using local GemFire XD disk store files. When local disk store persistence is configured, GemFire XD restores the operational data set in-memory through GemFire XD member restarts.

You configure local persistence for an HDFS read/write table's in-memory data using the PERSISTENT clause in the CREATE TABLE statement. This clause can be used with all types of GemFire XD tables, including those that do not persist data to HDFS. The PERSISTENT clause can be used alone, in which case the in-memory table data is persisted to the GemFire XD default disk store. Or, you can specify a named disk store that you previously configured using the CREATE DISKSTORE statement.

In either case, when you configure an HDFS read/write table with the PERSISTENT clause, the disk store used for persistence is also used for automatic overflow of the table's in-memory data.
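For example, a sketch that combines local persistence with HDFS read/write persistence, assuming a previously created disk store named opsdiskstore and the readwritestore HDFS store defined later in this topic (see the CREATE TABLE reference for the exact clause ordering):
CREATE TABLE ops_table
  ( ops_id INT NOT NULL PRIMARY KEY,
    ops_data VARCHAR(2000) )
PARTITION BY PRIMARY KEY
EVICTION BY CRITERIA ( ops_id < 300000 )
EVICTION FREQUENCY 180 SECONDS
PERSISTENT 'opsdiskstore'
HDFSSTORE (readwritestore);
Because the table specifies both PERSISTENT and an HDFS store with eviction criteria, the opsdiskstore disk store is used both to persist the operational data set and for automatic overflow.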

Configuring HDFS Write-Only Persistence

Persisting data to HDFS in write-only mode enables you to stream large volumes of data for processing with Hadoop tools or HAWQ.

To configure HDFS write-only for a partitioned table, you first create an HDFS store configuration with properties to define the Hadoop NameNode and directory in which to store the data. You can optionally specify properties to configure the queue that GemFire XD uses to stream table data or the rollover frequency for HDFS log files. After creating the HDFS store, use CREATE TABLE with the HDFSSTORE and WRITEONLY clauses to create the partitioned table and stream its data to HDFS.
Note: All authentication and authorization for HDFS access is tied to the user that executes the GemFire XD member process. For example, when you create an HDFS store in GemFire XD, the default directory structure for the store is created under the /user/user-name directory in HDFS, where user-name is the process owner for GemFire XD. (You can override this default by specifying an absolute path beginning with the "/" character, or by setting an alternate HDFS root directory for servers using the hdfs-root-dir property.) The GemFire XD user must be able to authenticate with Kerberos, and must have read/write permission on the HDFS store directory on each Hadoop cluster node. See Configuration Requirements for Secure HDFS.
  1. Use CREATE HDFSSTORE to define the Hadoop connection and queue that GemFire XD uses to stream table data to Hadoop. For example:
    CREATE HDFSSTORE streamingstore
      NameNode 'hdfs://gfxd1:8020'
      HomeDir 'stream-tables' 
      BatchSize 10
      BatchTimeInterval 2000 milliseconds
      QueuePersistent true
      MaxWriteOnlyFileSize 200
      WriteOnlyFileRolloverInterval 1 minute;
    Note: Because Pivotal Command Center installs namenode HA by default, you can specify the logical name for the namenode (specified in hdfs-site.xml) instead of the namenode URL.
    The above command creates a new HDFS store configuration in the specified Hadoop NameNode and directory. The GemFire XD queue used to write to this HDFS store is persisted to a local GemFire XD disk store. (The default GemFire XD disk store is used, because the above command does not specify a named disk store). Data in the queue is written to the HDFS store in batches as large as 10 megabytes, and the queue is flushed to the store at least once every 2000 milliseconds. GemFire XD streams all data to an HDFS log file until the file size reaches 200 megabytes, or until the file has been opened for 1 minute.
  2. Use CREATE TABLE to create a partitioned table that persists to the HDFS store you created, and specify the WRITEONLY clause to use the HDFS write-only model. For example:
    CREATE TABLE monitor1
      ( mon_id INT NOT NULL PRIMARY KEY,
        mon_date DATE NOT NULL,
        mon_data VARCHAR(2000) )
    PARTITION BY PRIMARY KEY
    EXPIRE ENTRY WITH TIMETOLIVE 300 ACTION DESTROY
    HDFSSTORE (streamingstore) WRITEONLY;
    In addition to streaming table operations to the HDFS store, this CREATE TABLE statement ensures that table entries are evicted from memory (destroyed) after 5 minutes.

Configuring HDFS Read/Write Persistence

Persisting table data to HDFS in read/write mode provides the flexibility of managing very large tables in GemFire XD while supporting high-performance querying of operational, in-memory data.

To configure HDFS read/write persistence for a partitioned table, you first create an HDFS store configuration. The HDFS store properties define the basic Hadoop connection and the GemFire XD queue configuration, and can also define compaction behavior for the persisted files. After defining the read/write HDFS store, use CREATE TABLE to create the partitioned table that uses the store for persistence.
Note: All authentication and authorization for HDFS access is tied to the user that executes the GemFire XD member process. For example, when you create an HDFS store in GemFire XD, the default directory structure for the store is created under the /user/user-name directory in HDFS, where user-name is the process owner for GemFire XD. (You can override this default by specifying an absolute path beginning with the "/" character, or by setting an alternate HDFS root directory for servers using the hdfs-root-dir property.) The GemFire XD user must be able to authenticate with Kerberos, and must have read/write permission on the HDFS store directory on each Hadoop cluster node. See Configuration Requirements for Secure HDFS.
  1. Use CREATE HDFSSTORE to define the Hadoop connection and queue that GemFire XD uses to stream table data to Hadoop, specifying any custom compaction behavior for the persisted files. For example:
    CREATE HDFSSTORE readwritestore
      NameNode 'hdfs://gfxd1:8020'
      HomeDir 'gfxd-overflow-tables'
      BatchSize 10
      BatchTimeInterval 2000 milliseconds
      QueuePersistent true
      MinorCompact true
      MajorCompact true
      MaxInputFileSize 12
      MinInputFileCount 4
      MaxInputFileCount 8
      MinorCompactionThreads 3
      MajorCompactionInterval 10 minutes
      MajorCompactionThreads 3;
      
    Note: Because Pivotal Command Center installs namenode HA by default, you can specify the logical name for the namenode (specified in hdfs-site.xml) instead of the namenode URL.
    The above command creates a new HDFS store configuration in the specified Hadoop NameNode and directory. The GemFire XD queue used to write to this HDFS store is persisted to a local GemFire XD disk store. The default GemFire XD disk store is used, because the above command does not specify a named disk store. Data in the queue is written to the HDFS store in batches as large as 10 megabytes, and the queue is flushed to the store at least once every 2000 milliseconds.

    GemFire XD performs both minor and major compaction on the resulting HDFS store persistence files. Minor compaction is performed on files up to 12 MB in size after at least 4 files are created. The major compaction cycle occurs every 10 minutes, using a maximum of 3 threads.

  2. Use CREATE TABLE to create a partitioned table that persists to the HDFS store you created. Use EVICTION BY CRITERIA to define the operational data set, and omit the WRITEONLY clause to ensure that GemFire XD can read back the data it persists to HDFS, using full table scans. For example:
    CREATE TABLE bigtable1
      ( big_id INT NOT NULL PRIMARY KEY,
        big_date DATE NOT NULL,
        big_data VARCHAR(2000) )
    PARTITION BY PRIMARY KEY
    EVICTION BY CRITERIA ( big_id < 300000 )
    EVICTION FREQUENCY 180 SECONDS
    HDFSSTORE (readwritestore);
    In the above table, the EVICTION BY CRITERIA clause uses the value of the big_id column to define the table's operational data.

    Keep in mind that the EVICTION BY CRITERIA clause defines the data that you do not want to keep in the operational data set. Table data that matches the criteria is evicted, while the remaining data is kept in memory for high-performance querying.

    Note: HDFS read/write tables that use eviction criteria cannot have foreign key constraints, because the table data needed to enforce such constraints would require scanning persisted HDFS data. To improve query performance for these tables, create indexes on the columns where you would normally assign foreign key constraints.

Querying and Updating HDFS Read/Write Tables

By default, queries operate only against the in-memory, operational data set for a table. Clients that are interested in the full table dataset (both HDFS data and in-memory data) can use the query-HDFS connection property, or the queryHDFS query hint that applies only to a specific query.

To specify the queryHDFS property in a single query, include the --GEMFIREXD-PROPERTIES comment after specifying the table name. The following two example queries show the correct usage of queryHDFS:
select * from table-name --GEMFIREXD-PROPERTIES queryHDFS=true \n ;
select * from table-name --GEMFIREXD-PROPERTIES queryHDFS=true \n where column-name=column-value;

Notice the placement of queryHDFS in the second example. The property must always be defined immediately after the table names in the SQL statement (and before any additional SQL predicates). Also notice the \n characters shown in the examples. Because query hints are specified in SQL comments starting with --GEMFIREXD-PROPERTIES, you must include \n or a newline character before continuing the SQL command, or the remaining characters are ignored. See also Overriding Optimizer Choices.

To set HDFS query behavior for all SQL statements executed in a single connection, specify the query-HDFS connection property when you connect to the GemFire XD cluster. For example:
gfxd> connect client 'localhost:1527;query-HDFS=true';

Keep in mind that queries that access HDFS data do not use in-memory indexes. Any queries that access HDFS-persistent data should reference a primary key, for performance reasons. Non primary key-based queries require a full table scan, which can lead to poor performance when accessing HDFS data.

Note: Querying data using queryHDFS=true does not place the data back into operational memory for later queries; operational data can only be re-created using insert statements (subject to the EVICTION BY CRITERIA clause or EXPIRE clause of the table) or by automatic recovery of the operational data from local persistence files. See Using Local Disk Store Persistence to Preserve the In-Memory Operational Data.

GemFire XD provides a convenience function, COUNT_ESTIMATE, to estimate the total number of rows in an HDFS read/write table. Use COUNT_ESTIMATE instead of using COUNT(*) with queryHDFS=true, which iterates over all of the table's data.
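For example, one way to invoke the function from an interactive gfxd session, assuming that COUNT_ESTIMATE accepts the fully qualified table name as a string (check the function reference for the exact signature):
gfxd> VALUES COUNT_ESTIMATE('APP.BIGTABLE1');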

DML commands (insert, update, and delete operations) always operate against the full data set of the table, even if you do not specify the queryHDFS hint. You need not provide any query hint to ensure that a value is inserted, updated, or deleted.

Note: For HDFS Write-only tables, TRUNCATE TABLE removes only the in-memory data for the table, but leaves the HDFS log files available for later processing with MapReduce and HAWQ. This occurs regardless of the queryHDFS setting.

For HDFS Read-Write tables, TRUNCATE TABLE removes the in-memory data for the table and marks the table's HDFS log files for expiry. This means that GemFire XD queries, MapReduce jobs with CHECKPOINT mode enabled, and HAWQ external tables with CHECKPOINT mode enabled no longer return data for the truncated table. MapReduce jobs and HAWQ external tables that do not use CHECKPOINT mode will continue to return some table values until the log files expire.

GemFire XD provides a special PUT INTO command to speed update operations on HDFS read/write tables. PUT INTO uses a syntax similar to the INSERT statement:
PUT INTO table-name
{ VALUES (...) |
  SELECT ...
}

PUT INTO differs from standard INSERT or UPDATE statements in that GemFire XD does not check existing primary key values before executing the command. If a row with the same key exists in the table, PUT INTO simply overwrites the older row value. If no rows with the same primary key exist, PUT INTO operates like a standard INSERT. Removing the primary key check speeds execution when updating large numbers of rows in HDFS stores.
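For example, the following statement upserts a single row into the bigtable1 table defined earlier in this topic; if a row with the same primary key already exists, it is overwritten, otherwise a new row is inserted:
PUT INTO bigtable1
  VALUES (42, CURRENT_DATE, 'updated monitoring data');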

How GemFire XD Manages HDFS Data

When you configure a table for HDFS persistence, GemFire XD places all data to be persisted to HDFS in an asynchronous queue. A single HDFS queue is created for each set of colocated tables that use the queue. A configurable number of dispatcher threads periodically write batches of updates from the queue to HDFS operational log files. The behavior of the HDFS queue and the number and format of HDFS log files differ depending on whether you configure a table for HDFS write-only or HDFS read/write persistence.

HDFS Store Directory Structure

GemFire XD creates a Hadoop subdirectory hierarchy for each HDFS store that is configured in the system. The root directory of the hierarchy is specified by the hdfs-root-dir property. By default this directory is /user/user-name, where user-name is the user that started the GemFire XD JVM (for example, /user/gfxd).

Beneath the root directory, each HDFS store specifies a unique directory in which to store HDFS log files for tables that use the store. This is the HomeDir option provided in the CREATE HDFSSTORE statement.

In the HDFS store home directory, a separate directory is created for each table that uses the HDFS store, and each bucket in the table gets a further subdirectory.

HDFS Metadata Files

In addition to persisting table data to HDFS in operational log files, GemFire XD persists the table schema (the DDL used to create the table) to HDFS in metadata files. These metadata files are used to recreate the table for MapReduce jobs or HAWQ operations that access the persisted data outside of a GemFire XD distributed system. Metadata files are stored in the .hopmeta subdirectory of the HDFS store, and have the .ddlhop extension.
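To illustrate the layout, the following sketch shows how the directories for the bigtable1 example table (created earlier in this topic) might be organized under the readwritestore HDFS store; the directory and file names shown are illustrative only:
/user/gfxd/                          (hdfs-root-dir for process owner gfxd)
  gfxd-overflow-tables/              (HomeDir from CREATE HDFSSTORE)
    .hopmeta/                        (table schema metadata, *.ddlhop files)
    APP.BIGTABLE1/                   (one directory per table that uses the store)
      0/                             (one subdirectory per bucket)
        0-1415912345000-1.hop
      1/
        1-1415912399000-1.hop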

HDFS Log Files

Each GemFire XD member that stores primary buckets for a partitioned table's data maintains an asynchronous queue of events for those buckets. As GemFire XD flushes the contents of the asynchronous queues, it creates new HDFS persistence files for those buckets. Each bucket has a dedicated subdirectory under the table directory where it stores all operational logs for that bucket's values. (Hadoop may also split HDFS persistence files as necessary to accommodate the configured HDFS block size.)

An operational log file uses the following format: bucketNumber-timeStamp-sequenceNumber.extension. The most common types of log file have the following extensions:
  • .hop is the extension used for files that store the raw, uncompacted table data that is flushed from an HDFS queue. Multiple .hop files are generated as necessary to store new data. New .hop files that are created for the same bucket increment the sequence-number in the filename. MapReduce jobs that process events over a period of time (event-mode jobs) access multiple .hop files in order to collect the required data.
  • .exp is a zero-length marker file that indicates when a .hop file expired.
  • .shop stores table data for a write-only HDFS table.
  • .ihop is an intermediate log file created by GemFire XD minor compaction. See Compaction for HDFS Log Files.
  • .chop is a snapshot log file created by GemFire XD major compaction. See Compaction for HDFS Log Files.

All HDFS persistence files are identified by a timestamp, which indicates that the file stores table operations that occurred earlier than the timestamp. File timestamps can be used in MapReduce jobs to process records for a specified interval, or for a particular point in time (a snapshot or checkpoint of the table data).

For tables that use HDFS write-only persistence, the HDFS queue on a primary member does not sort the events for the buckets it manages. Instead, the queue persists the unsorted operations to HDFS, and multiple batches are persisted to the same file in HDFS. Each file may include multiple entries for the same row, as updates and delete operations are appended. The same row value may appear in multiple persistence files, because GemFire XD does not automatically compact the files' contents.

For tables that use HDFS read/write persistence, the HDFS queue buffers and sorts all operations that occur for a given bucket before persisting queued events to HDFS. Each ordered batch of events creates a new file in the HDFS store. However, GemFire XD performs periodic compaction to merge smaller files into larger files to improve read performance. MapReduce jobs can use the timestamps of individual persistence files to perform incremental processing of values over a period of time. Also, because the events are ordered in the persistence files, individual values in a file can be accessed efficiently using primary key queries from GemFire XD clients.

Compaction for HDFS Log Files

Although HDFS read/write persistence initially creates multiple small persistence files in HDFS (one file for each batch of operations that is flushed from the HDFS queue), the primary GemFire XD member periodically compacts those persistence files into larger files. This compaction behavior helps to improve read performance by reducing the number of available files, and by removing older, intermediate row values from the persisted files. GemFire XD performs both minor and major compaction cycles:
  • Minor compaction combines multiple files into one larger file by discarding older row values. Reducing the number of files in HDFS helps to avoid performance degradation in HDFS and the GemFire XD distributed system. You configure minor compaction by setting the minimum and maximum number of files, per table bucket, that are eligible for compaction, as well as the maximum size of file to consider when performing minor compaction. You can also configure the number of threads used to perform minor compaction.
    Note: Use caution when increasing the MinInputFileCount or MaxInputFileSize values, as they apply to each bucket persisted by the HDFS store, rather than to the HDFS store as a whole. As more tables target the HDFS store, additional HDFS file handles are required to manage the number of open files. A large number of buckets combined with a high MinInputFileCount can cause too many files to be opened in HDFS, resulting in exceptions if open file resources are exhausted. Ensure that you have configured your operating system to support large numbers of file descriptors (see Supported Configurations and System Requirements).

    After minor compaction completes, GemFire XD marks the smaller persistence files for deletion. GemFire XD no longer uses the files for accessing table data for client queries. However, you can still use MapReduce jobs to process incremental data in those files. You can configure the amount of time GemFire XD keeps the expired files available in HDFS by specifying the PurgeInterval option in the CREATE HDFSSTORE command.

    Note: Do not disable minor compaction unless you tune other HDFS parameters to avoid severe performance degradation. Turning off minor compaction can cause a very large number of HDFS log files to be created, which can potentially exhaust HDFS receiver threads and/or client sockets. To offset these problems, increase the BatchTimeInterval and BatchSize options to create a fewer number of HDFS log files. As a best practice, leave minor compaction enabled unless compaction causes excessive I/O overhead in HDFS that cannot be resolved by tuning compaction behavior.
  • Major compaction combines all of the row values for a given bucket into a single persistence file. After major compaction is performed only the latest row values are stored in the persistence files, and the file represents a snapshot of the bucket values. You can configure major compaction by specifying the interval of time after which GemFire XD performs automatic major compaction, as well as the number of threads used to perform compaction. You can also manually initiate major compaction using the SYS.HDFS_FORCE_COMPACTION procedure. The SYS.HDFS_LAST_MAJOR_COMPACTION function returns the timestamp of the last completed major compaction cycle.

GemFire XD performs automatic compaction when you enable compaction in CREATE HDFSSTORE, and only for tables that perform HDFS read/write persistence.

Tables that are configured for HDFS write-only persistence do not read data back from HDFS, so GemFire XD does not compact the log files to improve read performance. Instead, all intermediate row values remain available in the raw HDFS log files. MapReduce jobs and HAWQ can specify which log files to use in order to process time series data.

Understanding When Data is Available in HDFS

When you insert or update data in a GemFire XD HDFS table, your changes are not immediately available in HDFS log files. Several factors determine the length of time required before table DML operations are available in HDFS log files for processing by Hadoop tools and HAWQ.

The first factor that affects the delay in writing DML to HDFS files is the HDFS store queue configuration (CREATE HDFSSTORE command). Both the BatchSize and BatchTimeInterval arguments determine how long DML operations remain in the queue's memory before being written to an HDFS log file. Using smaller batch sizes and time intervals shortens the time before operations are available in HDFS log files. However, keep in mind that large batch sizes and intervals enable GemFire XD to write fewer files to HDFS, which increases performance.

You can also use the SYS.HDFS_FLUSH_QUEUE procedure to flush the contents of a table's HDFS queue.
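For example, a sketch of flushing the queue from a gfxd session, assuming that SYS.HDFS_FLUSH_QUEUE takes the fully qualified table name and a maximum wait time in milliseconds (check the procedure reference for the exact arguments):
gfxd> call SYS.HDFS_FLUSH_QUEUE('APP.MONITOR1', 30000);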

Note: The DispatcherThreads option determines how many threads are used to process batches from the HDFS queue. By default, GemFire XD uses 5 dispatcher threads to write batches from the queue to HDFS log files. If you have numerous clients that execute operations on HDFS tables, the number of dispatcher threads can become a bottleneck. If you notice that HDFS queues frequently overflow to GemFire XD operational log files, consider increasing the number of threads that are used to process batches in the queue.

Even after DML events are written to a raw HDFS log file, the log file is not available for processing by Hadoop tools or HAWQ until GemFire XD closes the log file. For HDFS write-only tables, the MaxWriteOnlyFileSize and WriteOnlyFileRolloverInterval properties determine how frequently GemFire XD closes a given log file (making it available for MapReduce and HAWQ) and begins writing to a new file.

Although HDFS read/write tables provide raw HDFS log file data for time series processing, by default MapReduce jobs and HAWQ external tables work only with snapshot or checkpoint HDFS log files. These files provide only the current row values for a table at a given point in time. When working with checkpoint data, certain DML operations may not be available in HDFS log files until GemFire XD completes a major compaction cycle, which introduces a new snapshot of bucket values. The frequency with which DML operations appear in snapshot or checkpoint mode is determined by the MajorCompactionInterval and MajorCompactionThreads properties. Again, keep in mind that performing more frequent compactions can negatively affect performance. You will need to balance your performance requirements with the needs of other applications that process the persisted data.

You can also use the SYS.HDFS_FORCE_COMPACTION procedure to initiate a major compaction cycle.
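For example, a sketch that forces a major compaction and then checks when the last major compaction completed, assuming that SYS.HDFS_FORCE_COMPACTION takes the fully qualified table name and a maximum wait time, and that SYS.HDFS_LAST_MAJOR_COMPACTION takes the table name (check the procedure reference for the exact arguments):
gfxd> call SYS.HDFS_FORCE_COMPACTION('APP.BIGTABLE1', 0);
gfxd> VALUES SYS.HDFS_LAST_MAJOR_COMPACTION('APP.BIGTABLE1');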

Note: You can choose to work with raw HDFS log file data instead of checkpoint files by setting the CHECKPOINT property in the MapReduce RowInputFormat configuration or in the HAWQ external table definition. See Using MapReduce to Access HDFS Table Data and Using HAWQ to Access HDFS Table Data.

Using MapReduce to Access HDFS Table Data

GemFire XD stores all table data in HDFS in a format that is native to the GemFire XD product, in order to provide read/write access from GemFire XD clients. GemFire XD also extends the Hadoop RowInputFormat class to enable MapReduce jobs to access data in HDFS log files without having to start up or connect to a GemFire XD distributed system.

Note: GemFire XD provides a RowInputFormat and RowOutputFormat implementation for the older "mapred" API, as well as the new MapReduce API. You should consistently use the implementation that corresponds to the specific API version you are using for development.
Note: This section contains excerpts from the GemFire XD MapReduce sample applications. The full source code for the examples is installed in the /usr/lib/gphd/gfxd/examples directory.

Configuring Input to MapReduce Jobs

The GemFire XD RowInputFormat implementation configures the input data set and converts the raw data stored in HDFS operational logs into table rows for consumption in your custom MapReduce jobs. RowInputFormat properties configure input parameters such as the name of the table to use as input, the location of the table's HDFS log files, and the selection of log files to use for input. The following table describes each configuration property:

Table 1. RowInputFormat Configuration Properties
  • INPUT_TABLE (compatible tables: Write-Only, Read/Write): The name of the GemFire XD table, in the format schema_name.table_name.
  • HOME_DIR (compatible tables: Write-Only, Read/Write): The name of the HDFS directory that stores the table's log files. This corresponds to the HomeDir value used in the CREATE HDFSSTORE statement.
  • START_TIME_MILLIS (compatible tables: Write-Only, Read/Write): Identifies the earliest timestamp for table events to process in the MapReduce job. You can specify this property by itself to process all events that occurred after a specific time, or you can combine it with END_TIME_MILLIS to process a range of events. If you also configure the CHECKPOINT property, then the START_TIME_MILLIS value is ignored.
  • END_TIME_MILLIS (compatible tables: Write-Only, Read/Write): Identifies the latest timestamp for table events to process in the MapReduce job. You can specify this property by itself to process all events that occurred before a specific time, or you can combine it with START_TIME_MILLIS to process a range of events. If you also configure the CHECKPOINT property, then the END_TIME_MILLIS value is ignored.
  • CHECKPOINT (compatible tables: Read/Write): This property is only available for HDFS read/write tables. By default, MapReduce jobs process only the latest table row values, instead of table events that occurred over time. Set the CHECKPOINT property to false in order to process incremental table values persisted in the HDFS log files.

For HDFS write-only and HDFS read/write tables, the START_TIME_MILLIS and/or END_TIME_MILLIS properties specify a range of table events to process in the MapReduce job. When you specify a time range for table events using these properties, the input formatter uses only raw .hop files for input data. Omitting both properties (and disabling CHECKPOINT) uses all available .hop files.

For HDFS read/write tables, you can optionally set the CHECKPOINT property to false in order to process the incremental table values stored in raw HDFS .hop files. By default, MapReduce jobs use only compacted HDFS log files (.chop files) for the input data set. The .chop files are created during GemFire XD major compaction cycles, and contain only the latest row values for the table. The CHECKPOINT property is not applicable to HDFS write-only tables, because GemFire XD does not compact streamed HDFS log files. See How GemFire XD Manages HDFS Data for more information about HDFS log file types and compaction behavior.

You configure all properties for the input data set using a standard Hadoop JobConf object. For example, this code configures the input formatter to provide all raw .hop file data for the input table and HDFS home directory provided in the program's arguments (omitting both START_TIME_MILLIS and END_TIME_MILLIS returns all active .hop files):

    JobConf conf = new JobConf(getConf());
    conf.setJobName("Busy Airport Count");

    Path outputPath = new Path(args[0]);
    Path intermediateOutputPath = new Path(args[0] + "_int");
    String hdfsHomeDir = args[1];
    String tableName = args[2];

    outputPath.getFileSystem(conf).delete(outputPath, true);
    intermediateOutputPath.getFileSystem(conf).delete(intermediateOutputPath, true);

    conf.set(RowInputFormat.HOME_DIR, hdfsHomeDir);
    conf.set(RowInputFormat.INPUT_TABLE, tableName);
    conf.setBoolean(RowInputFormat.CHECKPOINT, false);

    conf.setInputFormat(RowInputFormat.class);
    conf.setMapperClass(SampleMapper.class);

Working With Table Data

Based on the configuration properties that you set, the GemFire XD RowInputFormat implementation identifies the .hop or .shop files that contain the necessary data, and creates splits over the HDFS blocks that store those files. Keep in mind that DML operations to GemFire XD tables are not immediately available in HDFS log files; see Understanding When Data is Available in HDFS for more information.

The implementation then provides table records to the map method in your Mapper implementation using key/value pairs. The KEYIN object that is provided to your Mapper implementation contains no relevant table data; it is provided only to conform to the standard Hadoop interface. All table data is encapsulated in a GemFire XD Row object. You can either work with a single row value, or you can obtain a ResultSet of multiple rows. This example Mapper implementation uses the row to obtain a result set:

  public static class SampleMapper extends MapReduceBase
      implements Mapper<Object, Row, Text, IntWritable> {

    private final static IntWritable countOne = new IntWritable(1);
    private final Text reusableText = new Text();

    @Override
    public void map(Object key, Row row,
        OutputCollector<Text, IntWritable> output,
        Reporter reporter) throws IOException {

      String origAirport;
      String destAirport;

      try {
        ResultSet rs = row.getRowAsResultSet();
        origAirport = rs.getString("ORIG_AIRPORT");
        destAirport = rs.getString("DEST_AIRPORT");
        reusableText.set(origAirport);
        output.collect(reusableText, countOne);
        reusableText.set(destAirport);
        output.collect(reusableText, countOne);
      } catch (SQLException e) {
        e.printStackTrace();
      }
    }
  }

As with standard MapReduce jobs, the map method can evaluate each table record against a condition, or accumulate values as in the above example. Hadoop sorts the mapper output before sending it to the reducer, which reduces the set of intermediate values and uses an output formatter to persist the results.

Sending MapReduce Output to GemFire XD Tables

Whereas standard MapReduce jobs typically write their output to a file system, GemFire XD provides an OutputFormatter implementation that simplifies the task of writing data back into a GemFire XD table. Writing to a GemFire XD table requires that you have a distributed system running, with the table already defined in the data dictionary.

As with the RowInputFormat implementation, you use the RowOutputFormat implementation to configure properties in a standard Hadoop JobConf object. The following table describes each configuration property:

Table 2. RowOutputFormat Configuration Properties
Property Description
OUTPUT_TABLE The name of the GemFire XD table to write to, in the format: schema_name.table_name
OUTPUT_URL The JDBC thin client driver string used to connect to the GemFire XD distributed system (for example, jdbc:gemfirexd://hostname:port/). See Connect to a GemFire XD Server with the Thin Client JDBC Driver.
For example, the following sample code assumes that a GemFire XD locator is running locally, so the code constructs the JDBC connection string using the local host name. The output formatter is configured to write records to the APP.BUSY_AIRPORT table:
      JobConf topConf = new JobConf(getConf());
      topConf.setJobName("Top Busy Airport");

      String hdfsFS = topConf.get("fs.defaultFS");
      URI hdfsUri = URI.create(hdfsFS);
      String hdfsHost = hdfsUri.getHost();

      topConf.set(RowOutputFormat.OUTPUT_URL, "jdbc:gemfirexd://" + hdfsHost + ":1527");
      topConf.set(RowOutputFormat.OUTPUT_TABLE, "APP.BUSY_AIRPORT");
...
      topConf.setReducerClass(TopBusyAirportReducer.class);
      topConf.setOutputKeyClass(Key.class);
      topConf.setOutputValueClass(BusyAirportModel.class);
      topConf.setOutputFormat(RowOutputFormat.class);

When you use the output formatter in your MapReduce job, all table writes insert data into the configured table using the PUT INTO statement. PUT INTO uses a syntax similar to the INSERT statement, but GemFire XD does not check existing primary key values before executing the PUT INTO command. If a row with the same key exists in the table, PUT INTO simply overwrites the older row value.

Note: A PUT INTO operation does not invoke triggers, and you cannot use PUT INTO in the triggered SQL statement of a GemFire XD trigger.
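
To illustrate the PUT INTO semantics outside of MapReduce, the following client-side sketch executes the statement over the thin client JDBC driver (requires the java.sql imports). The host, port, and column names (AIRPORT, FLIGHTS) are hypothetical examples rather than part of the sample application:

      // Connect with the GemFire XD thin client JDBC driver.
      try (Connection conn = DriverManager.getConnection("jdbc:gemfirexd://localhost:1527/");
           Statement stmt = conn.createStatement()) {
        // INSERT would fail if a row with this primary key already exists;
        // PUT INTO simply overwrites the older row value for the same key.
        stmt.execute("PUT INTO APP.BUSY_AIRPORT (AIRPORT, FLIGHTS) VALUES ('SFO', 42)");
      }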

As with standard Hadoop MapReduce jobs, the output is written using key/value pairs. As with the GemFire XD RowInputFormat implementation, the Key contains no relevant table information, but is only provided for compatibility with the standard Hadoop interface. All table data should be placed in the output value.

The following example writes key/value pairs using the RowOutputFormat configuration shown above. A new Key value is created just before the write, but contains no data. The output value is an instance of BusyAirportModel, which serializes itself as a JDBC row:
  public static class TopBusyAirportReducer extends MapReduceBase
      implements Reducer<Text, StringIntPair, Key, BusyAirportModel> {

    @Override
    public void reduce(Text token, Iterator<StringIntPair> values,
        OutputCollector<Key, BusyAirportModel> output, Reporter reporter)
        throws IOException {
      String topAirport = null;
      int max = 0;

      while (values.hasNext()) {
        StringIntPair v = values.next();
        if (v.getSecond() > max) {
          max = v.getSecond();
          topAirport = v.getFirst();
        }
      }
      BusyAirportModel busy = new BusyAirportModel(topAirport, max);
      output.collect(new Key(), busy);
    }
  }
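
After the input format, mapper, reducer, and output format are configured, the job is submitted in the usual way for the org.apache.hadoop.mapred API. For example:

      // Submit the configured job and block until it completes.
      JobClient.runJob(topConf);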

Using HAWQ to Access HDFS Table Data

GemFire XD stores all table data in HDFS in a format that is native to the GemFire XD product, in order to provide read/write access from GemFire XD clients. In addition to the input/output system used to support standard Hadoop MapReduce, GemFire XD supports a PXF Driver (installed with HAWQ) to enable Pivotal HAWQ to read HDFS table data. To access GemFire XD HDFS table data in HAWQ, you use the HAWQ CREATE EXTERNAL TABLE command to map the columns that you want to query.

Note: The PXF Driver does not support mapping GemFire XD columns of CHAR data type (PXF instead supports the BPCHAR type, which is incompatible with GemFire XD). For any tables having character data that you intend to access using HAWQ, use columns of VARCHAR data type instead of CHAR.

Prerequisites

In order to access GemFire XD HDFS table data, the Pivotal Extension Framework (PXF) must be installed with HAWQ, and PXF must add the gemfirexd.jar library to its dependencies:
  1. On the Pivotal HD Admin node, execute the commands:
    $ icm_client fetch-configuration -l mycluster -o conf
    $ icm_client stop -l mycluster
    Substitute mycluster with the name of your cluster, and conf with the name of the destination directory (for example, /etc/gphd/pxf/conf).
  2. If GemFire XD is not installed on the local HAWQ/PXF machine, copy the gemfirexd.jar file from a GemFire XD installation to the HAWQ/PXF machine.
  3. On the HAWQ/PXF machine, add the full path to gemfirexd.jar to the /etc/gphd/pxf/conf/pxf-public.classpath file. gemfirexd.jar is installed to the /lib subdirectory of the GemFire XD installation directory:
    • For RPM installations, the file is installed to /usr/lib/gphd/gfxd/lib/gemfirexd.jar.
    • For ZIP file installations, the file is installed to Pivotal_GemFireXD_13_bNNNN_platform/lib/gemfirexd.jar in the directory where you unzipped the file.
    If you maintain that same directory structure on the PXF machine, then edit the PXF configuration file /etc/gphd/pxf/conf/pxf-public.classpath to add the line:
    /usr/lib/gphd/gfxd/lib/gemfirexd.jar
  4. On the Pivotal HD Admin node, execute the commands:
    $ icm_client reconfigure -p -s -l mycluster -c conf/
    $ icm_client start -l mycluster
    Again, substitute mycluster with the name of your cluster, and conf with the name of the destination directory (for example, /etc/gphd/pxf/conf).
  5. See also "Installing PXF" in the Pivotal Extension Framework Installation and Administrator Guide for general requirements when using a PXF plug-in. For example, you must ensure that the Hadoop NameNode and all data nodes that store the HDFS table data use the REST service.

Mapping HDFS Tables in HAWQ

To access an HDFS-persistent table with HAWQ, use the following CREATE EXTERNAL TABLE syntax from a HAWQ interactive command prompt such as psql:
CREATE EXTERNAL TABLE hawq_tablename ( column_list )
   LOCATION ('pxf://namenode_rest_host:port/hdfsstore_homedir/schema.table?PROFILE=GemFireXD[&attribute=value]* ')  
   FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
In this statement:
  • hawq_tablename is the name that you assign to the external table mapping. You will use this table name in HAWQ to query the table data.
  • column_list is the list of the column names and data types that you want to map from the GemFire XD table into HAWQ. Any columns that you specify must use the same column name that you specified in the GemFire XD CREATE TABLE statement, and a data type that is compatible with the PXF Driver. The PXF Driver supports only the datatype mappings described in this table:
    Table 3. PXF Driver Data Type Compatibility
    GemFire XD Data Type Matching PXF Driver Data Type
    SMALLINT SMALLINT
    INTEGER INTEGER
    BIGINT BIGINT
    REAL REAL
    DOUBLE FLOAT8
    VARCHAR VARCHAR or BPCHAR or TEXT
    BOOLEAN BOOLEAN
    NUMERIC NUMERIC
    TIMESTAMP TIMESTAMP
    BINARY BYTEA
    Note: You cannot map a GemFire XD table column that uses any other data type, including the CHAR data type. VARCHAR columns are supported.
    Note: The PXF external table definition does not support column constraint definitions such as NOT NULL or PRIMARY KEY. Do not include these constraints in the CREATE EXTERNAL TABLE command even if the original GemFire XD table specifies such constraints. Specify only the GemFire XD column name and PXF driver data type for each column definition.

    You can choose to map only a subset of the columns available in the GemFire XD table. As a best practice, map only the columns that you intend to use, rather than mapping all columns and then selecting a subset in your HAWQ queries. The list of columns in the table mapping determines the amount of data transmitted to HAWQ during a query, regardless of the subset of columns that you specify in the HAWQ query.

    In the column list, you can add the special column name GFXD_PXF_TS with the timestamp type, to include the GemFire XD timestamps that are associated with each table entry.

  • namenode_rest_host and port specify the host and port number of the REST service running on the Hadoop NameNode that stores the HDFS table log files. The default REST port is 50070.
  • hdfsstore_homedir is the home directory of the GemFire XD HDFS store that holds the table's log files. This corresponds to the HomeDir value that you specified in the CREATE HDFSSTORE statement.
  • schema.table is the full schema and table name of the table as it exists in GemFire XD (the table name that you supplied in the CREATE TABLE statement).
  • [&attribute=value]* represents one or more optional attribute definitions used to configure which HDFS log files the PXF Driver accesses for queries against the external table. See Table 4. By default, HAWQ processes either the full set of raw HDFS log files for an HDFS write-only table, or processes only the most recent snapshot of table values for an HDFS read/write table.

Optional PXF Driver Attributes

The PXF:// URL that you specify can also include the following, optional attributes. Note that if you do not include any options, the default PXF table mapping is equivalent to setting &CHECKPOINT=true for HDFS read/write tables.

Table 4. PXF Configuration Attributes
Attribute Compatible GemFire XD Tables Description
STARTTIME=yyyy.mm.dd-hh:mm:ss | yyyy.mm.dd-hh:mm:ssz Write-Only, Read/Write Identifies the earliest timestamp for table events to process in the HAWQ table. You can specify this attribute by itself to process all events that occurred after a specific time, or you can combine it with ENDTIME to process a range of events.

You can specify the timestamp in yyyy.mm.dd-hh:mm:ss or yyyy.mm.dd-hh:mm:ssz format (year.month.day-hour:minutes:seconds). The optional z specifies the timezone code to use. Greenwich Mean Time (GMT) is used by default if you do not specify a timezone.

If you also configure the CHECKPOINT attribute, then the STARTTIME value is ignored.

ENDTIME=yyyy.mm.dd-hh:mm:ss | yyyy.mm.dd-hh:mm:ssz Write-Only, Read/Write Identifies the latest timestamp for table events to process in the HAWQ table. You can specify this attribute by itself to process all events that occurred before a specific time, or you can combine it with STARTTIME to process a range of events.

ENDTIME uses the same timestamp format as the STARTTIME attribute described above.

If you also configure the CHECKPOINT attribute, then the ENDTIME value is ignored.

CHECKPOINT=[ true | false ] Read/Write This property is only available for HDFS read/write tables. When you specify the CHECKPOINT=true property, the HAWQ table only processes the latest table row values, instead of table events that occurred over time.

CHECKPOINT=true is used by default if you specify no other attributes. Always specify CHECKPOINT=false if you want to process table event data in the external table.

Keep in mind that DML operations to GemFire XD tables are not immediately available in HDFS log files, and are therefore not immediately visible in HAWQ external tables. See Understanding When Data is Available in HDFS for more information.

Example Table Mappings

Consider the following example HDFS table, created in GemFire XD using the CREATE HDFSSTORE and CREATE TABLE commands:
gfxd> CREATE HDFSSTORE mystreamingstore  NAMENODE 'hdfs://pivhdsne:8020'  HOMEDIR 'streamstore';

gfxd> CREATE TABLE flightstream (mon_id INT NOT NULL PRIMARY KEY, mon_time TIMESTAMP NOT NULL, mon_data VARCHAR(2000))
PARTITION BY COLUMN (mon_id) HDFSSTORE (mystreamingstore);
The following statement, executed in the HAWQ psql interactive prompt, creates an external table that uses all raw HDFS log files that were created after the specified STARTTIME. Note that it maps the special GFXD_PXF_TS column to include timestamp information for each table entry; GFXD_PXF_TS is not a column available in the original GemFire XD table. In GemFire XD, the table was created in the default APP schema, which is specified in the qualified table name after the HDFS store home directory:
gpadmin=# CREATE EXTERNAL TABLE hawq_flightstream ( mon_id INT,
          mon_time TIMESTAMP,
          mon_data VARCHAR(2000),
          GFXD_PXF_TS timestamp )
          LOCATION ('pxf://pivhdsne:50070/streamstore/app.flightstream?PROFILE=GemFireXD&STARTTIME=2013.11.01-12:00:00&CHECKPOINT=false')  
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
The following statement creates an external table that uses only the latest, compacted HDFS log files for an HDFS read/write table:
gpadmin=# CREATE EXTERNAL TABLE hawq_flightstream_current ( mon_id INT,
          mon_time TIMESTAMP,
          mon_data VARCHAR(2000) )
          LOCATION ('pxf://pivhdsne:50070/streamstore/app.flightstream?PROFILE=GemFireXD&CHECKPOINT=true')  
          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

Using Hive to Access HDFS Table Data

Hive uses a SQL-like language, called HiveQL, to query and manipulate HDFS data. GemFire XD provides a Hive InputFormat implementation, which enables HiveQL to scan HDFS table data using a Hive external table.

Requirements and Limitations

  • The GFXDHiveInputFormat implementation is supported only with the version of Hive included in Pivotal HD.
  • In order to access GemFire XD HDFS table data, Hive must include the gemfirexd.jar library in its classpath. If you did not install GemFire XD and Hive on the same machine, copy gemfirexd.jar to your Hive machine. This file is installed in the /lib subdirectory of the GemFire XD installation directory:
    • For RPM installations, the file is installed to /usr/lib/gphd/gfxd/lib/gemfirexd.jar.
    • For ZIP file installations, the file is installed to Pivotal_GemFireXD_13_bNNNN_platform/lib/gemfirexd.jar in the directory where you unzipped the file.
  • You cannot execute a join between two or more Hive external tables that point to GemFire XD table data.
  • GemFire XD does not provide a Hive Serializer/Deserializer (SerDe) implementation. You must create and configure a custom SerDe in order to convert GemFire XD data types to Hive rows as necessary.
  • GemFire XD does not provide a Hive OutputFormatter implementation, and it is not possible to write records back into GemFire XD using Hive.
  • The GFXDHiveInputFormat implementation does not read Hive properties that are supplied in the Hive CREATE EXTERNAL TABLE statement. In order to specify the location of the table data and the table name, you must set the GFXDHiveInputFormat configuration properties before you execute a query. See Querying Hive Tables.

Creating a Custom SerDe

The GFXDHiveInputFormat implementation converts GemFire XD HDFS table data into key, result set pairs. You must create a custom SerDe in order to deserialize each result into a Hive row object.

Keep in mind that there are two special, hidden columns in GemFire XD HDFS table rows: timestamp and eventtype. The timestamp column value specifies the time that the event occurred. The eventtype column specifies the type of DML operation, and can be used to filter out deleted records. You can add these columns as Hive data in the SerDe as necessary, and then map the columns to a Hive external table for querying.

The following SerDe implementation is intended to work with a GemFire XD HDFS table that was created using DDL similar to:
gfxd> CREATE TABLE stock_events (symbol VARCHAR(10), priceindollars INT, dowjones INT) PERSISTENT HDFSSTORE (myhdfs);
This excerpt from the SerDe code fetches the column names and types from the SerDe properties, and creates Hive row TypeInfo and ObjectInspector objects. The final line fetches an example SerDe property to convert from dollar values to rupees. You can define the SerDe properties when you configure the external table with the SerDe implementation (see Mapping HDFS Tables in Hive for an example):
  @Override
  public void initialize(Configuration conf, Properties tbl)
      throws SerDeException {

    // Get column names and types
    String colNamesList = tbl.getProperty(Constants.LIST_COLUMNS);
    String colTypeList = tbl.getProperty(Constants.LIST_COLUMN_TYPES);

    // all table column names
    if (colNamesList.length() == 0) {
      columnNames = new ArrayList<String>();
    } else {
      columnNames = Arrays.asList(colNamesList.split(","));
    }

    // all column types
    if (colTypeList.length() == 0) {
      columnTypes = new ArrayList<TypeInfo>();
    } else {
      columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(colTypeList);
    }
    assert (columnNames.size() == columnTypes.size());

    // Create row-related objects
    hiveRowTI = (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(columnNames, columnTypes);
    hiveRowOI = (StructObjectInspector) TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(hiveRowTI);

    // Fetch the SerDe property for dollar-to-rupee conversion
    dollarToRupee = Integer.parseInt(tbl.getProperty("dollar.to.rupee"));
  }
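
For reference, the SerDe excerpts in this section assume instance fields along the following lines. This is a sketch only; the exact declarations in the original sample may differ:

  // Column metadata parsed from the SerDe properties in initialize().
  private List<String> columnNames;
  private List<TypeInfo> columnTypes;

  // Hive row type information and object inspector built from the column metadata.
  private StructTypeInfo hiveRowTI;
  private StructObjectInspector hiveRowOI;

  // Example SerDe property value used to convert dollar prices to rupees.
  private int dollarToRupee;

  // Reusable Hive row returned by deserialize().
  private final List<Object> hiveRow = new ArrayList<Object>();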
The following SerDe function converts a GemFire XD row to a Hive row. Note that this example shows the handling of the hidden timestamp and eventtype columns, as well as the table columns specified in the DDL statement:
  @Override
  public Object deserialize(Writable writable) throws SerDeException {
    try {
      hiveRow.clear();
      Row gfxdRow = (Row) writable;
      ResultSet gfxdResultSet = gfxdRow.getRowAsResultSet();

      // Iterate over the GemFire XD row and read the columns into the Hive row.
      //
      // Timestamp is a special field in the GemFire XD row that can be used by Hive
      // to query records created within a time range. EventType is another special
      // field that can be used by Hive to filter out deleted records.
      // These special fields need to be part of the Hive schema.
      for (int i = 0; i < columnNames.size(); i++) {

        // read the timestamp and the event type from gfxdRow
        if (columnNames.get(i).equals("timestamp")) {
          hiveRow.add(gfxdRow.getTimestamp());
        }
        else if (columnNames.get(i).equals("eventtype")) {
          hiveRow.add(gfxdRow.getEventType().ordinal());
        }
        else if (columnNames.get(i).equals("priceinrupees")) {
          // map the priceindollars column to a price in rupees
          hiveRow.add(gfxdResultSet.getInt("priceindollars") * dollarToRupee);
        }
        else if (columnTypes.get(i).getTypeName().equals(Constants.INT_TYPE_NAME)) {
          hiveRow.add(gfxdResultSet.getInt(columnNames.get(i)));
        }
        else if (columnTypes.get(i).getTypeName().equals(Constants.STRING_TYPE_NAME)) {
          hiveRow.add(gfxdResultSet.getString(columnNames.get(i)));
        }
        else {
          throw new SerDeException("Only integer and string column types are supported by this example SerDe");
        }
      }
    } catch (SQLException e) {
      throw new SerDeException("Failed while reading from GemFire XD row ", e);
    } catch (IOException e) {
      throw new SerDeException("Failed while reading from GemFire XD row ", e);
    }
    return hiveRow;
  }
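
The remaining methods of the Hive SerDe interface are straightforward for this read-only integration, because GemFire XD provides no Hive output formatter. The following is a sketch; getSerDeStats is required only by newer Hive versions:

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    // Return the object inspector built in initialize().
    return hiveRowOI;
  }

  @Override
  public Class<? extends Writable> getSerializedClass() {
    // Not used for reads; writing back to GemFire XD through Hive is not supported.
    return Text.class;
  }

  @Override
  public Writable serialize(Object obj, ObjectInspector objInspector) throws SerDeException {
    throw new SerDeException("Writing to GemFire XD tables through Hive is not supported");
  }

  @Override
  public SerDeStats getSerDeStats() {
    return null;
  }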

Configuring Hive

You must include the gemfirexd.jar file in the Hive classpath, along with the JAR that contains your compiled, custom SerDe implementation. You can add these JARs to the classpath either at the command line when you start Hive, or in the hive-site.xml file.

In addition to adding the required JAR files, you must specify HiveInputFormat as the default input format configuration. If you do not include this configuration then Hive uses CombineHiveInputFormat by default, which does not perform splits on the results provided by GFXDHiveInputFormat.

To configure Hive at the command line, specify the required JARs and the input format using these command-line options:
$ hive -hiveconf hive.aux.jars.path=/.../gemfirexd.jar,/.../customserde.jar \
-hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
To add the configuration to hive-site.xml, include these property definitions:
 <property>
   <name>hive.aux.jars.path</name>
   <value>/.../gemfirexd.jar,/.../customserde.jar</value>
   <description>A comma separated list (with no spaces) of the jar files</description>
 </property>
 <property>
   <name>hive.input.format</name>
   <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
 </property>

Mapping HDFS Tables in Hive

After you define the mapping between GemFire XD table columns and Hive columns in your SerDe implementation, you can create the Hive external table to use for querying the HDFS table data. In addition to defining one or more columns of the GemFire XD table, the external table definition specifies the GFXDHiveInputFormat implementation and your custom SerDe implementation. You can optionally specify SerDe properties to map column values to new values as necessary.

Hive external table columns can reference either a subset of the GemFire XD table columns that are needed for querying, or the full column list, optionally including the special timestamp and eventtype columns.

To define an external table, use the syntax:
hive> CREATE EXTERNAL TABLE hive-tablename ( column-list )
         ROW FORMAT SERDE 'custom-serde-class'  [ WITH SERDEPROPERTIES ( 'property-name' = 'property-value' ) ]
         STORED AS INPUTFORMAT 'com.pivotal.gemfirexd.hadoop.mapred.hive.GFXDHiveInputFormat'
         OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
         LOCATION 'gfxd-table-hdfs-path';
Note: Hive requires that you specify an OUTPUTFORMAT, even though GemFire XD does not provide an output formatter for writing values back to GemFire XD. Specify a dummy value for the OUTPUTFORMAT option.
Note: Hive requires that you specify LOCATION, but the GFXDHiveInputFormat does not read this value. You must specify the actual GemFire XD HDFS table location using configuration properties before querying the table. See Querying Hive Tables.

For example, the following external table definition maps a selection of columns from the example GemFire XD table definition. A SerDe property is provided to specify a value for mapping the value of dollars to rupees (see the example SerDe implementation in Creating a Custom SerDe):

hive> CREATE EXTERNAL TABLE stock_events_us (symbol STRING, priceinrupees INT, eventtype INT, timestamp BIGINT)
         ROW FORMAT SERDE 'com.pivotal.gemfirexd.hadoop.mapreduce.GFXDSerDe'  WITH SERDEPROPERTIES ('dollar.to.rupee' = '60')
         STORED AS INPUTFORMAT 'com.pivotal.gemfirexd.hadoop.mapred.hive.GFXDHiveInputFormat'
         OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
         LOCATION '/user/hemantb/myhdfs';

Querying Hive Tables

Each time you execute a query against a Hive external table, Hive launches a MapReduce job using the GemFire XD input formatter and the SerDe that you provided in the external table definition. The GFXDHiveInputFormat implementation also requires the name of the GemFire XD table to query and the location of the table's HDFS persistence files, which you must provide as configuration properties before executing the query. Other configuration properties may also be provided to define the set of HDFS persistence files that are used to satisfy queries against the table. The list of required and optional configuration properties is provided below.
Table 5. GFXDHiveInputFormat Configuration Properties
Property Compatible GemFire XD Tables Description
gfxd.input.tablename=table-name Required for all table types. Specifies the name of the GemFire XD table to query.
gfxd.input.homedir=gfxd-table-hdfs-path Required for all table types. Specifies the HDFS path to the table's HDFS persistence files. See CREATE HDFSSTORE.
gfxd.input.starttimemillis=time Write-Only, Read/Write Identifies the earliest time for table events to process in the Hive table. You can specify this attribute by itself to process all events that occurred after a specific time, or you can combine it with gfxd.input.endtimemillis to process a range of events.

If you also configure the gfxd.input.checkpointmode property, then the gfxd.input.starttimemillis value is ignored.

gfxd.input.endtimemillis=time Write-Only, Read/Write Identifies the latest timestamp for table events to process in the Hive table. You can specify this attribute by itself to process all events that occurred before a specific time, or you can combine it with gfxd.input.starttimemillis to process a range of events.

If you also configure the gfxd.input.checkpointmode property, then the gfxd.input.endtimemillis value is ignored.

gfxd.input.checkpointmode=[ true | false ] Read/Write This property is only available for HDFS read/write tables. When you specify the gfxd.input.checkpointmode=true property, the Hive table only processes the latest table row values, instead of table events that occurred over time.

gfxd.input.checkpointmode=true is used by default if you specify no other properties. Always specify gfxd.input.checkpointmode=false if you want to process table event data in the external table.

Note: You can specify only a single home directory and a single table name for a given query. For this reason, you cannot create a join query between two or more Hive external tables that point to GemFire XD table data.
The following example shows how to set these configuration properties before running a Hive query. The commands reference the example external table definition and SerDe described in the previous sections:
-- Set required properties to identify the input table.
hive> SET gfxd.input.homedir=/user/hemantb/myhdfs;
hive> SET gfxd.input.tablename=STOCK_EVENTS;

-- Set the GFXDHiveInputFormat property to fetch available event data.
hive> SET gfxd.input.checkpointmode=false;

-- Process only events that were created in a particular time range.
hive> SET gfxd.input.starttimemillis=1399367980616;
hive> SET gfxd.input.endtimemillis=1399367981616;

-- Return the events that happened at a specific point in time.
hive> SELECT symbol, priceinrupees FROM stock_events_us WHERE timestamp = 1399367981220;

-- Find the average stock price of each company during this range of time.
hive> SELECT symbol, avg(priceinrupees) FROM stock_events_us GROUP BY symbol;

Backing Up and Restoring HDFS Tables

Persistent HDFS tables use both local operational log files and HDFS log files to store data. To back up an HDFS table in its entirety, you must use separate commands to back up both the operational and HDFS log files.

The backup command included with GemFire XD backs up only the local, operational log files that are created on GemFire XD members for persistent tables. Pivotal recommends using an HDFS tool such as distcp to back up the HDFS log files. Follow the steps below to ensure that you create a complete, valid backup of an HDFS table:
  1. Ensure that the HDFS table is created using the PERSISTENT keyword. Non-persistent tables do not store their operational data in local log files, and cannot be backed up.
  2. Do not perform any DDL operations while performing the remaining steps.
  3. Do not perform any DML operations while performing the remaining steps.
    Note: Because you back up an HDFS table in two parts, DML operations that occur during the backup can cause HDFS log file backups to have more recent table data than the operational log file backups. This can lead to table inconsistencies when you restore the log files from backup.
  4. Use backup to back up the table's operational log files. See Backing Up and Restoring Disk Stores for more information.
  5. Use distcp or another utility to back up all of the table's HDFS log files.
  6. Make note of the time and location of both sets of backups. You will need to use the same pair of log backups to restore the table at a later time.

Restoring an HDFS Table from Backup

In order to restore an HDFS table from backup, you must use the operational log file backup and HDFS log file backup that were created during the same backup procedure. Never restore HDFS and operational log backups that were created during different periods of time. If the operational data contains a table schema that is different from that of the HDFS data, then tables can become unreadable. If the schema in both backups is the same but the data comes from different periods of time, then inconsistent results can be returned when using the query-HDFS property to query operational versus HDFS data.

Follow these steps to restore an HDFS table:
  1. Shut down all members of the GemFire XD distributed system.
  2. Restore the HDFS log files to the same location, overwriting any existing files for the table. Refer to the documentation for your Hadoop backup utility for more information.
  3. Restore the local operational log files on GemFire XD members. See Backing Up and Restoring Disk Stores for more information.
  4. Restart the GemFire XD distributed system.

Best Practices for Achieving High Performance

When using GemFire XD HDFS-persistent tables, follow these recommendations to ensure the best possible performance of your system.

  • Colocate GemFire XD servers with HDFS data nodes. GemFire XD uses an asynchronous event queue to flush data from GemFire XD tables to HDFS. Colocating GemFire XD data stores with HDFS data nodes minimizes the network load and also enables certain optimizations when reading data from HDFS (for HDFS read/write tables).
  • Start GemFire XD servers with lock-memory=true to lock JVM heap and offheap pages into RAM. Locking memory prevents severe performance degradation caused by HDFS disk buffering. Drop operating system caches before starting members with locked memory. If you deploy more than one server per host, begin by starting each server sequentially with lock-memory=true. Starting servers sequentially avoids a race condition in the operating system that can cause failures (even machine crashes) if you accidentally over-allocate the available RAM. After you verify that the system configuration is stable, you can then start servers concurrently.
  • Tune HDFS queues to avoid overflowing to disk. Each set of colocated HDFS tables uses a single asynchronous HDFS queue to process events. Applications with high throughput can overwhelm the queue, causing it to overflow events to disk and degrade performance. To avoid queue overflow:
    • Use the DispatcherThreads option to increase the number of threads used to write batches from the queue to HDFS log files. See CREATE HDFSSTORE.
    • Configure a maximum queue size (MaxQueueMemory option) that is high enough to handle your application load. See CREATE HDFSSTORE.
    • If the capabilities of a single queue are insufficient, consider adding a second GemFire XD server per host, which adds a new queue and dispatcher thread for processing. If you configure multiple servers in this way, also remember to reconfigure the server heap space, offheap space, and so forth to account for the additional servers. Keep in mind, however, that the performance of any queries that do not scale may be negatively impacted by adding servers. Profile your application as a whole to determine whether the benefits of adding a server outweigh the tradeoff in query performance.
  • Tune the critical heap percentage to avoid overflowing table data to disk. The GemFire XD resource manager automatically overflows table data to disk in order to prevent out of memory conditions. To avoid this, tune the JVM and the GemFire XD critical heap percentage so that the eviction heap percentage is never exceeded by the tenured heap usage. See Procedures.

    Keep in mind that as you increase the size of the HDFS asynchronous event queue, the queues are flushed less frequently and queue entries are more likely to be promoted to the tenured generation. To avoid crossing the eviction threshold due to HDFS queue-related garbage, tune the JVM to increase the young generation size and set the CMSInitiatingOccupancyFraction to keep any tenured garbage from accumulating for too long.

  • Use separate disk stores (on separate physical disks) for HDFS queue persistence and table persistence/overflow. If you do not specify a named disk store when creating an HDFS store or creating a persistent table, then GemFire XD uses the default disk store.
  • Use 10-gigabit Ethernet or better, and consider using channel bonding to increase network bandwidth. GemFire XD deployments that use partitioned tables, redundancy, and HDFS persistence may write to several different hosts with each update. Reads can also easily saturate a 1-gigabit NIC.
  • Run with compression where possible to reduce network traffic and improve throughput.
  • Increase member-timeout to prevent forced disconnects. If you run multiple servers per host, conditions such as slow networks, high-throughput applications, heavy CPU load, or network congestion can delay GemFire XD peer messages and cause servers to time out and disconnect from the distributed system. If you experience this problem, increase the member-timeout value.
  • For HDFS read/write tables, avoid queries that do not use a primary key value. Such queries require a full table scan. See Querying and Updating HDFS Read/Write Tables.