CREATE HDFSSTORE

Creates a connection to a Hadoop NameNode in order to persist one or more tables to HDFS. Each connection defines the HDFS NameNode and directory to use for persisting data, as well as GemFire XD-specific options that configure the queue used to persist table events, enable persistence for that queue, control compaction of the HDFS operational logs, and so forth.

Syntax

CREATE HDFSSTORE store-name
  NAMENODE 'url'
  [ HomeDir 'directory-name' ]

  [ BatchSize integer-constant ]
  [ MaxQueueMemory integer-constant ]
 
  [ QueuePersistent boolean-constant ] 
  [ DiskSynchronous boolean-constant ]
  [ DiskStoreName store-name ]

  [ MinorCompact boolean-constant ]
  [ MaxInputFileSize integer-constant ]
  [ MinInputFileCount integer-constant ]
  [ MaxInputFileCount integer-constant ]
  [ MinorCompactionThreads integer-constant ]
 
  [ MajorCompact boolean-constant ]
  [ MajorCompactionThreads integer-constant ]

  [ MaxWriteOnlyFileSize integer-constant ]

  [ ClientConfigFile 'file-name' ]

Note: An HDFSSTORE can persist table data using either the HDFS write-only model or the HDFS read/write model; you specify the HDFS persistence model using the CREATE TABLE statement. Although multiple tables can use the same HDFSSTORE for persistence, queue and compaction settings belong to the store, so you generally need a separate HDFSSTORE configuration for each table that requires different queue or compaction behavior.
store-name
(Required.) A unique identifier for the HDFS store configuration.
NAMENODE
(Required.) The URL of the Hadoop NameNode for your Pivotal HD cluster (for example, hdfs://server-name:8020).
HomeDir
The Hadoop directory in which GemFire XD stores the table persistence files for this store. The value must not contain the Hadoop NameNode URL. The owner of the GemFire XD JVM process must have read and write access to this directory in Hadoop.

If you provide a directory name or relative path for HomeDir, then GemFire XD creates the directory in HDFS relative to the Hadoop root directory specified by the hdfs-root-dir property. (If you do not specify a value for hdfs-root-dir, then the Hadoop root directory is /user/user_name, where user_name is the process owner for GemFire XD.)

If you omit the HomeDir option from the CREATE HDFSSTORE statement, then by default GemFire XD creates a directory of the same name as the HDFS store (store-name) in the HDFS root directory. If you omit HomeDir and use the default hdfs-root-dir property, this corresponds to /user/user_name/store-name.

As a best practice, always create HDFS store directories relative to a single HDFS root directory. As an alternative, you can specify an absolute path beginning with the "/" character to override the default root location.
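
The two HomeDir forms described above can be sketched as follows. The store names relstore and absstore and both directory names are hypothetical, and the resolved path in the first comment assumes the default hdfs-root-dir:

```sql
-- Relative path: resolved against the HDFS root directory.
-- With the default hdfs-root-dir this corresponds to /user/user_name/sensor-data.
CREATE HDFSSTORE relstore
  NAMENODE 'hdfs://gfxd1:8020'
  HOMEDIR 'sensor-data';

-- Absolute path: the leading "/" overrides the default root location.
CREATE HDFSSTORE absstore
  NAMENODE 'hdfs://gfxd1:8020'
  HOMEDIR '/gfxd/sensor-archive';
```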

BatchSize
The maximum size (in megabytes) of each batch that is written to the Hadoop directory. The default size is 32 MB.
MaxQueueMemory
The maximum amount of memory in megabytes that the queue can consume before overflowing to disk. The default is 100 MB.
QueuePersistent
Include this option to persist the event queue that GemFire XD uses to send table data to HDFS. (The queue is persisted to a local GemFire XD disk store.) By default an HDFS store queue is not persistent.
DiskSynchronous
If you enable persistence with QueuePersistent, you can include the DiskSynchronous option to control whether writes to the local GemFire XD disk store are synchronous. By default (TRUE), a persistent event queue performs synchronous writes to the local GemFire XD disk store. Specify FALSE to enable asynchronous writes to the disk store.
DiskStoreName
The named disk store to use for storing the queue overflow, or for persisting the queue (if QueuePersistent is specified). If you specify a value, the named disk store must exist. If you specify a null value or you omit this option, GemFire XD uses the default disk store for overflow and queue persistence.
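
The queue options above might be combined as in the following minimal sketch. The names queuedisk and queuestore are hypothetical, the 200 MB value is illustrative only, and the CREATE DISKSTORE statement is assumed to rely on default options simply so that the named disk store exists before the HDFS store refers to it:

```sql
-- The named disk store must exist before the HDFS store references it.
CREATE DISKSTORE queuedisk;

-- Queue overflows to disk after 200 MB (the default is 100 MB).
-- The queue is persisted with asynchronous writes, and queuedisk is used
-- for both overflow and queue persistence.
CREATE HDFSSTORE queuestore
  NAMENODE 'hdfs://gfxd1:8020'
  MAXQUEUEMEMORY 200
  QUEUEPERSISTENT true
  DISKSYNCHRONOUS false
  DISKSTORENAME queuedisk;
```
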
MinorCompact
Specify TRUE to enable automatic minor compaction for the HDFS read/write log files. Minor compaction reduces the number of files in HDFS in order to avoid performance degradation in HDFS and the GemFire XD cluster.
Note: Do not disable minor compaction unless you tune other HDFS parameters to avoid severe performance degradation. Turning off minor compaction can cause a very large number of HDFS log files to be created, which can potentially exhaust HDFS receiver threads and/or client sockets. To offset these problems, increase the BatchSize option to create fewer HDFS log files. As a best practice, leave minor compaction enabled unless compaction causes excessive I/O overhead in HDFS that cannot be resolved by tuning compaction behavior.
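
If minor compaction must be disabled, the note above suggests a larger BatchSize to reduce the number of log files. A minimal sketch, assuming a hypothetical store named nocompactstore and an illustrative batch size of 64 MB (twice the default):

```sql
-- Minor compaction is disabled; larger batches produce fewer HDFS log files.
-- The value 64 is illustrative, not a recommended setting.
CREATE HDFSSTORE nocompactstore
  NAMENODE 'hdfs://gfxd1:8020'
  BATCHSIZE 64
  MINORCOMPACT false;
```
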
MaxInputFileSize
The maximum size of a file (in megabytes) that GemFire XD will consider for minor compaction cycles. Files larger than this value are only affected during major compaction. The default is 512 MB.
MinInputFileCount
The minimum number of input files per bucket that can be created before GemFire XD begins to automatically compact HDFS log files. GemFire XD performs no minor compaction until this number of files has been created for a given bucket, after which files that are smaller than MaxInputFileSize may be compacted. The default is 4.
Note: Use caution when increasing the MinInputFileCount value, as it applies to each bucket persisted by the HDFS store, rather than to the HDFS store as a whole. As more tables target the HDFS store, additional HDFS file handles are required to manage the number of open files. A large number of buckets combined with a high MinInputFileCount can result in thousands of files opened in HDFS. Ensure that you have configured your operating system to support large numbers of file descriptors, as described in Supported Configurations and System Requirements.
MaxInputFileCount
The maximum number of input files per bucket to include in a minor compaction cycle. The default is 10.
MinorCompactionThreads
The maximum number of threads that GemFire XD uses to perform minor compaction in this HDFS store. Within a given bucket, only one compaction cycle (minor or major) can run at a given time. You can increase the number of threads used for compactions on different buckets as necessary in order to fully utilize the performance of your HDFS cluster and its disks. By default GemFire XD uses 10 threads for minor compaction and 2 threads for major compaction.
MajorCompact
Specify TRUE to enable automatic major compaction for the HDFS read/write log files. Major compaction removes deleted events from the HDFS log files, which can save space in HDFS and improve performance when reading from HDFS log files. GemFire XD performs major compaction by default. Because major compaction can be long-running and I/O-intensive, tune its performance using MajorCompactionThreads.
MajorCompactionThreads
The maximum number of threads that GemFire XD uses to perform major compaction in this HDFS store. Within a given bucket, only one compaction cycle (minor or major) can run at a given time. You can increase the number of threads used for compactions on different buckets as necessary in order to fully utilize the performance of your HDFS cluster and its disks. By default GemFire XD uses 10 threads for minor compaction and 2 threads for major compaction.
MaxWriteOnlyFileSize
For HDFS write-only tables, this defines the maximum size (in megabytes) that an HDFS log file can reach before GemFire XD closes the file and begins writing to a new file. This clause is ignored for HDFS read/write tables. Keep in mind that an operational log file is not available for MapReduce processing until the file is closed. The default is 256 MB.
ClientConfigFile
The full path to the HDFS client configuration file that the store uses.

Example

Create a persistent connection to a Hadoop directory, storing HDFS log files in the hdfsstore1 subdirectory of the root directory defined by hdfs-root-dir. The HDFS event queue is also persisted using the default GemFire XD disk store:

CREATE HDFSSTORE hdfsstore1
  NAMENODE 'hdfs://gfxd1:8020'
  QUEUEPERSISTENT true;

Store HDFS log files in the stream-tables directory under the HDFS root directory defined by hdfs-root-dir. This command configures an HDFS queue where data is written to the HDFS store in batches as large as 10 megabytes:

CREATE HDFSSTORE streamingstore
  NAMENODE 'hdfs://gfxd1:8020'
  HOMEDIR 'stream-tables' 
  BATCHSIZE 10
  QUEUEPERSISTENT true;

Configure an HDFSSTORE with compaction settings for HDFS read/write persistence. Here, GemFire XD performs both minor and major compaction on the resulting HDFS store persistence files. Minor compaction is performed on files up to 12 MB in size, and can involve as many as 8 files at a time. Any files larger than 12 MB are compacted only during the major compaction cycle. Each type of compaction uses a maximum of 3 threads:

CREATE HDFSSTORE readwritestore
  NAMENODE 'hdfs://gfxd1:8020'
  BATCHSIZE 10
  QUEUEPERSISTENT true
  MINORCOMPACT true
  MAJORCOMPACT true
  MAXINPUTFILESIZE 12
  MININPUTFILECOUNT 1
  MAXINPUTFILECOUNT 8
  MINORCOMPACTIONTHREADS 3
  MAJORCOMPACTIONTHREADS 3;
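
For a store intended for HDFS write-only tables, MaxWriteOnlyFileSize determines how large a log file grows before it is closed and becomes available to MapReduce. The following is a minimal sketch, assuming a hypothetical store named writeonlystore, an illustrative file size of 128 MB, and a placeholder client configuration path:

```sql
-- Close each HDFS log file at 128 MB instead of the default 256 MB.
-- The CLIENTCONFIGFILE path is a placeholder, not a default location.
CREATE HDFSSTORE writeonlystore
  NAMENODE 'hdfs://gfxd1:8020'
  QUEUEPERSISTENT true
  MAXWRITEONLYFILESIZE 128
  CLIENTCONFIGFILE '/path/to/hdfs-client.xml';
```

The write-only model itself is selected in the CREATE TABLE statement for each table that uses the store, not in CREATE HDFSSTORE.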