Creates a connection to a Hadoop name node in order to persist one or more tables to
HDFS. Each connection defines the HDFS NameNode and directory to use for persisting data, as
well as GemFire XD-specific options to configure the queue used to persist table events,
enable persistence for the connection, compact the HDFS operational logs, and so on.
CREATE HDFSSTORE store-name
NameNode 'namenode-url'
[ HomeDir 'directory-name' ]
[ BatchSize integer-constant ]
[ MaxQueueMemory integer-constant ]
[ QueuePersistent boolean-constant ]
[ DiskSynchronous boolean-constant ]
[ DiskStoreName store-name ]
[ MinorCompact boolean-constant ]
[ MaxInputFileSize integer-constant ]
[ MinInputFileCount integer-constant ]
[ MaxInputFileCount integer-constant ]
[ MinorCompactionThreads integer-constant ]
[ MajorCompact boolean-constant ]
[ MajorCompactionThreads integer-constant ]
[ MaxWriteOnlyFileSize integer-constant ]
[ ClientConfigFile 'file-name' ]
An HDFSSTORE can persist table data using either the HDFS write-only model or the HDFS
read/write model; you specify the HDFS persistence model using the CREATE TABLE
statement. Although multiple tables can use the same HDFSSTORE for persistence, you
will generally need to create multiple HDFSSTORE configurations to modify the queue
and compaction behavior for each table.
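As a rough sketch of how a table might select its persistence model, the statement below attaches a partitioned table to an existing HDFS store in write-only mode. The table name sensor_events, its columns, and the store name streamingstore are hypothetical, and the HDFSSTORE (store-name) WRITEONLY clause is an assumption about the CREATE TABLE syntax; check the CREATE TABLE reference before relying on it.

```sql
-- Hypothetical table. The HDFSSTORE clause shape is assumed, not confirmed here.
-- Omitting WRITEONLY is assumed to select the HDFS read/write model.
CREATE TABLE sensor_events (
  event_id INT NOT NULL PRIMARY KEY,
  reading DOUBLE
)
PARTITION BY PRIMARY KEY
HDFSSTORE (streamingstore) WRITEONLY;
```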
- store-name: (Required.) A unique identifier for the HDFS store configuration.
- NameNode: (Required.) The URL of the Hadoop NameNode for your Pivotal HD cluster (for example,
- HomeDir: The Hadoop directory in which GemFire XD stores table persistence files for this
store. The value must not contain the Hadoop NameNode URL. The owner of the
GemFire XD JVM process must have read and write access to this directory in
HDFS.
If you provide a directory name or relative path for HomeDir, then GemFire XD
creates the directory in HDFS relative to the Hadoop root directory specified
by the hdfs-root-dir property. (If you do not specify a value for
hdfs-root-dir, then the Hadoop root directory is /user/user-name, where
user-name is the process owner for GemFire XD.)
If you omit the HomeDir option from the CREATE HDFSSTORE statement, then by
default GemFire XD creates a directory of the same name as the HDFS store
(store-name) in the HDFS root directory. If you omit HomeDir and use the
default hdfs-root-dir property, this corresponds to
/user/user-name/store-name.
As a best practice, always create HDFS store directories relative to a
single HDFS root directory. As an alternative, you can specify an absolute
path beginning with the "/" character to override the default root directory.
- BatchSize: The maximum size (in megabytes) of each batch that is written to the Hadoop directory. The
default size is 32 MB.
- MaxQueueMemory: The maximum amount of memory in megabytes that the queue can consume before overflowing to
disk. The default is 100 MB.
- QueuePersistent: Include this option to persist the event queue that GemFire XD uses to send table data to
HDFS. (The queue is persisted to a local GemFire XD disk store.) By default,
an HDFS store queue is not persistent.
- DiskSynchronous: If you enable persistence with QueuePersistent, you can include the
DiskSynchronous option to enable or disable
asynchronous writes to the local GemFire XD disk store. Specify FALSE to
enable asynchronous writes to the disk store. By default (TRUE), a persistent
event queue performs synchronous writes to the local GemFire XD disk store.
- DiskStoreName: The named disk store to use for storing the queue overflow, or for persisting the queue (if
QueuePersistent is specified). If you specify a value,
the named disk store must exist. If you specify a null value or you omit
this option, GemFire XD uses the default disk store for overflow and queue
persistence.
- MinorCompact: Specify TRUE to enable automatic minor compaction for the HDFS read/write log files. Minor
compaction reduces the number of files in HDFS in order to avoid performance
degradation in HDFS and the GemFire XD cluster.
Note: Do not disable minor
compaction unless you tune other HDFS parameters to avoid severe
performance degradation. Turning off minor compaction can cause a very
large number of HDFS log files to be created, which can potentially
exhaust HDFS receiver threads and/or client sockets. To offset these
problems, increase the BatchSize option so that GemFire XD creates
fewer HDFS log files. As a best practice, leave minor
compaction enabled unless compaction causes excessive I/O overhead in
HDFS that cannot be resolved by tuning compaction behavior.
- MaxInputFileSize: The maximum size of a file (in megabytes) that GemFire XD will consider for minor
compaction cycles. Files larger than this value are only affected during
major compaction. The default is 512 MB.
- MinInputFileCount: The minimum number of input files per bucket that can be created before GemFire XD begins
to automatically compact HDFS log files. GemFire XD performs no minor
compaction until this number of files has been created for a given bucket,
after which files that are smaller than MaxInputFileSize may be compacted.
The default is 4.
Use caution when increasing the
value, as it applies to each
bucket persisted by the HDFS store, rather than to the HDFS store as a
whole. As more tables target the HDFS store, additional HDFS file
handles are required to manage the number of open files. A large number
of buckets combined with a high MinInputFileCount
can result in thousands of files opened in HDFS. Ensure that you have
configured your operating system to support large numbers of file
descriptors, as described in Supported Configurations and System Requirements.
- MaxInputFileCount: The maximum number of input files per bucket to include in a minor compaction cycle. The
default is 10.
- MinorCompactionThreads: The maximum number of threads that GemFire XD uses to perform minor compaction in this HDFS
store. Within a given bucket, only one compaction cycle (minor or major) can
run at a given time. You can increase the number of threads used for
compactions on different buckets as necessary in order to fully utilize the
performance of your HDFS cluster and its disks. By default GemFire XD uses
10 threads for minor compaction and 2 threads for major compaction.
- MajorCompact: Specify TRUE to enable automatic major compaction for the HDFS read/write log files. Major
compaction removes deleted events from the HDFS log files, which can save
space in HDFS and improve performance when reading from HDFS log files.
GemFire XD performs major compaction by default. Because the major compaction
process can be long-running and I/O-intensive, tune the performance of major
compaction using MajorCompactionThreads.
- MajorCompactionThreads: The maximum number of threads that GemFire XD uses to perform major
compaction in this HDFS store. Within a given bucket, only one compaction
cycle (minor or major) can run at a given time. You can increase the number
of threads used for compactions on different buckets as necessary in order
to fully utilize the performance of your HDFS cluster and its disks. By
default GemFire XD uses 10 threads for minor compaction and 2 threads for
major compaction.
- MaxWriteOnlyFileSize: For HDFS write-only tables, this defines the maximum size (in megabytes) that an HDFS log
file can reach before GemFire XD closes the file and begins writing to a new
file. This clause is ignored for HDFS read/write tables. Keep in mind that
an operational log file is not available for MapReduce processing until
the file is closed. The default is 256 MB.
- ClientConfigFile: The full path to the HDFS client configuration file that the store uses.
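As a sketch of how the queue-related options described above might combine, the statements below create a local disk store and then an HDFS store whose event queue is persisted to that disk store with asynchronous disk writes. The names queuestore and persistentstore, the absolute path /gfxd/persistentstore, the NameNode clause and its URL, and the MaxQueueMemory value are all hypothetical placeholders, not values taken from this reference.

```sql
-- Hypothetical local disk store; it must exist before the HDFS store names it.
CREATE DISKSTORE queuestore;

-- Assumed NameNode clause and URL; replace with the address of your own cluster.
-- The absolute HomeDir path is not resolved against hdfs-root-dir.
-- DiskSynchronous false selects asynchronous writes to the local disk store.
CREATE HDFSSTORE persistentstore
  NameNode 'hdfs://namenode.example.com:8020'
  HomeDir '/gfxd/persistentstore'
  MaxQueueMemory 200
  QueuePersistent true
  DiskSynchronous false
  DiskStoreName queuestore;
```

Omitting DiskStoreName in this sketch would leave the queue on the default disk store, as the option list describes.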
Create a persistent connection to a Hadoop directory, storing HDFS log files in the
hdfsstore1 subdirectory of the root directory defined by
hdfs-root-dir. The HDFS event queue is also persisted using the
default GemFire XD disk store:
CREATE HDFSSTORE hdfsstore1
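A minimal sketch of what the complete statement for this example could look like. The NameNode clause and its URL are assumptions standing in for the required NameNode setting of your cluster. HomeDir and DiskStoreName are omitted so that the defaults described above apply.

```sql
-- Assumed NameNode clause and URL (placeholder host and port).
-- No HomeDir: files go to a directory named hdfsstore1 under the HDFS root directory.
-- No DiskStoreName: the persistent queue uses the default GemFire XD disk store.
CREATE HDFSSTORE hdfsstore1
  NameNode 'hdfs://namenode.example.com:8020'
  QueuePersistent true;
```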
Store HDFS log files in the stream-tables directory under the HDFS root directory
defined by hdfs-root-dir. This command configures an HDFS queue where data
is written to the HDFS store in batches as large as 10 MB.
CREATE HDFSSTORE streamingstore
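A minimal sketch of a statement matching this description, covering only the settings the text names (the relative home directory and the batch size). The NameNode clause and its URL are assumed placeholders.

```sql
-- Assumed NameNode clause and URL (placeholder host and port).
-- The relative HomeDir is resolved under the hdfs-root-dir root directory.
-- BatchSize is expressed in megabytes.
CREATE HDFSSTORE streamingstore
  NameNode 'hdfs://namenode.example.com:8020'
  HomeDir 'stream-tables'
  BatchSize 10;
```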
Configure an HDFSSTORE with compaction settings for HDFS read/write persistence. Here, GemFire
XD performs both minor and major compaction on the resulting HDFS store persistence
files. Minor compaction is performed on files up to 12 MB in size, and can involve
as many as 8 files at a time. Any files larger than 12 MB are compacted during the
major compaction cycle. A maximum of 3 threads are used in either compaction cycle.
CREATE HDFSSTORE readwritestore
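A sketch of how the compaction settings described in this example could be written out. The NameNode clause and its URL are assumed placeholders. The option values follow the description: files up to 12 MB, at most 8 files per minor compaction cycle, and 3 threads for each compaction type.

```sql
-- Assumed NameNode clause and URL (placeholder host and port).
-- Files larger than MaxInputFileSize (in MB) are left for major compaction.
CREATE HDFSSTORE readwritestore
  NameNode 'hdfs://namenode.example.com:8020'
  MinorCompact true
  MaxInputFileSize 12
  MaxInputFileCount 8
  MinorCompactionThreads 3
  MajorCompact true
  MajorCompactionThreads 3;
```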