ALTER HDFSSTORE

Changes the configuration of an existing HDFS store. You can use this command to modify properties of the HDFS store queue such as the frequency of writing batches from the queue, log file compaction settings, and log file rollover behavior.

Note: You cannot use this command to modify certain immutable properties of the HDFS store, such as the Hadoop NameNode and HDFS directory, queue persistence settings, or the amount of memory or number of threads used to process events from the queue. These properties can be defined only when you first create the HDFS store. See CREATE HDFSSTORE.

Syntax

ALTER HDFSSTORE store-name
  [ BatchSize integer-constant ]
  [ BatchTimeInterval integer-constant { MILLISECONDS | SECONDS | MINUTES | HOURS | DAYS } ]
 
  [ DiskStoreName store-name ]

  [ MinorCompact boolean-constant ]
  [ MaxInputFileSize integer-constant ]
  [ MinInputFileCount integer-constant ]
  [ MaxInputFileCount integer-constant ]
  [ MinorCompactionThreads integer-constant ]
 
  [ MajorCompact boolean-constant ]
  [ MajorCompactionInterval integer-constant { MILLISECONDS | SECONDS | MINUTES | HOURS | DAYS } ]
  [ MajorCompactionThreads integer-constant ]

  [ MaxWriteOnlyFileSize integer-constant ]
  [ WriteOnlyFileRolloverInterval integer-constant { MILLISECONDS | SECONDS | MINUTES | HOURS | DAYS } ]

  [ PurgeInterval integer-constant { MILLISECONDS | SECONDS | MINUTES | HOURS | DAYS } ]
store-name
(Required.) The unique identifier for the existing HDFS store configuration that you are modifying.
BatchSize
The maximum size (in megabytes) of each batch that is written to the Hadoop directory. The default size is 32 MB.
BatchTimeInterval
The maximum time that can elapse between writing batches to HDFS. The default is 60000 milliseconds (1 minute).
DiskStoreName
The named disk store to use for storing the queue overflow, or for persisting the queue (if QueuePersistent is specified). If you specify a value, the named disk store must exist. If you specify a null value or you omit this option, GemFire XD uses the default disk store for overflow and queue persistence.
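As an illustrative sketch, the following statement points the queue of the hdfsstore1 store (the store name used in the Example section) at a named disk store. The disk store name queuestore1 is hypothetical and is assumed to exist already:

```sql
-- Assumes a disk store named queuestore1 (hypothetical name) was created beforehand.
ALTER HDFSSTORE hdfsstore1
  DiskStoreName queuestore1;
```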
MinorCompact
Specify TRUE to enable automatic minor compaction for the HDFS read/write log files. Minor compaction reduces the number of files in HDFS in order to avoid performance degradation in HDFS and the GemFire XD cluster.
Note: Do not disable minor compaction unless you tune other HDFS parameters to avoid severe performance degradation. Turning off minor compaction can cause a very large number of HDFS log files to be created, which can exhaust HDFS receiver threads, client sockets, or both. To offset these problems, increase the BatchTimeInterval and BatchSize options so that fewer HDFS log files are created. As a best practice, leave minor compaction enabled unless compaction causes excessive I/O overhead in HDFS that cannot be resolved by tuning compaction behavior.
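If you do decide to disable minor compaction, the note above suggests raising both batch options in the same change. A minimal sketch follows; the values 128 (megabytes) and 5 minutes are illustrative assumptions, not recommendations, and should be sized for your own workload:

```sql
-- Larger, less frequent batches produce fewer HDFS log files,
-- which partly offsets the absence of minor compaction.
-- The values below are placeholders chosen only for illustration.
ALTER HDFSSTORE hdfsstore1
  BatchSize 128
  BatchTimeInterval 5 minutes
  MinorCompact false;
```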
MaxInputFileSize
The maximum size of a file (in megabytes) that GemFire XD will consider for minor compaction cycles. Files larger than this value are only affected during major compaction. The default is 512 MB.
MinInputFileCount
The minimum number of input files per bucket that must exist before GemFire XD begins to automatically compact HDFS log files. GemFire XD performs no minor compaction until this number of files has been created for a given bucket, after which files that are smaller than MaxInputFileSize may be compacted. The default is 4.
Note: Use caution when increasing the MinInputFileCount value, as it applies to each bucket persisted by the HDFS store, rather than to the HDFS store as a whole. As more tables target the HDFS store, additional HDFS file handles are required to manage the number of open files. A large number of buckets combined with a high MinInputFileCount can result in thousands of files opened in HDFS. Ensure that you have configured your operating system to support large numbers of file descriptors, as described in Supported Configurations and System Requirements.
MaxInputFileCount
The maximum number of input files per bucket to include in a minor compaction cycle. The default is 10.
MinorCompactionThreads
The maximum number of threads that GemFire XD uses to perform minor compaction in this HDFS store. Within a given bucket, only one compaction cycle (minor or major) can run at a given time. You can increase the number of threads used for compactions on different buckets as necessary in order to fully utilize the performance of your HDFS cluster and its disks. By default GemFire XD uses 10 threads for minor compaction and 2 threads for major compaction.
MajorCompact
Specify TRUE to enable automatic major compaction for the HDFS read/write log files. Major compaction removes deleted events from the HDFS log files, which can save space in HDFS and improve performance when reading from HDFS log files. GemFire XD performs major compaction by default. Because the major compaction process can be long-running and I/O-intensive, tune its performance using MajorCompactionInterval and MajorCompactionThreads.
MajorCompactionInterval
The amount of time after which GemFire XD performs the next major compaction cycle. The default is 720 minutes (12 hours). The minimum compaction interval is 1 minute. GemFire XD converts and stores the specified MajorCompactionInterval in minutes.
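Because the interval is stored in minutes, you can specify it in whichever unit is most convenient. As a sketch with an arbitrarily chosen value, the following statement requests a major compaction cycle every 6 hours, which corresponds to 6 × 60 = 360 minutes:

```sql
-- 6 hours is an illustrative value; it is equivalent to 360 minutes.
ALTER HDFSSTORE hdfsstore1
  MajorCompactionInterval 6 hours;
```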
MajorCompactionThreads
The maximum number of threads that GemFire XD uses to perform major compaction in this HDFS store. Within a given bucket, only one compaction cycle (minor or major) can run at a given time. You can increase the number of threads used for compactions on different buckets as necessary in order to fully utilize the performance of your HDFS cluster and its disks. By default GemFire XD uses 10 threads for minor compaction and 2 threads for major compaction.
MaxWriteOnlyFileSize
For HDFS write-only tables, this defines the maximum size (in megabytes) that an HDFS log file can reach before GemFire XD closes the file and begins writing to a new file. This clause is ignored for HDFS read/write tables. Keep in mind that an operational log file is not available for MapReduce processing until the file is closed; you can also set WriteOnlyFileRolloverInterval to specify the maximum amount of time an HDFS log file remains open. The default is 256 MB.
WriteOnlyFileRolloverInterval
For HDFS write-only tables, this defines the maximum time that can elapse before GemFire XD closes an HDFS log file and begins writing to a new file. This clause is ignored for HDFS read/write tables. The default is 3600 seconds (1 hour). The minimum value is 1 second.
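The two rollover options can be combined to make write-only data available to MapReduce jobs sooner. The following sketch uses illustrative values (64 MB and 10 minutes) that are assumptions, not defaults or recommendations. Each option sets an upper bound, so a log file would presumably be closed as soon as either limit is reached:

```sql
-- Applies only to HDFS write-only tables; ignored for read/write tables.
-- Smaller and shorter-lived files are closed, and so become readable, earlier.
ALTER HDFSSTORE hdfsstore1
  MaxWriteOnlyFileSize 64
  WriteOnlyFileRolloverInterval 10 minutes;
```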
PurgeInterval
Defines the amount of time that GemFire XD allows expired HDFS log files to remain available for MapReduce jobs. After this interval has passed, GemFire XD deletes the expired files. The default is 720 minutes (12 hours). The minimum purge interval is 1 minute. GemFire XD converts and stores the specified PurgeInterval in minutes.
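As a sketch with an arbitrarily chosen value, the following statement keeps expired HDFS log files available for 24 hours before they are deleted. Given the conversion described above, this value would be stored as 24 × 60 = 1440 minutes:

```sql
-- 24 hours is an illustrative value; it is equivalent to 1440 minutes.
ALTER HDFSSTORE hdfsstore1
  PurgeInterval 24 hours;
```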

Example

Create a persistent connection to a Hadoop directory, storing HDFS log files in the hdfsstore1 subdirectory of the root directory defined by hdfs-root-dir. The HDFS event queue is also persisted using the default GemFire XD disk store:

CREATE HDFSSTORE hdfsstore1
  NAMENODE 'hdfs://gfxd1:8020'
  QueuePersistent true;

Reconfigure the HDFS queue to write batches as large as 10 megabytes, and to flush the queue to HDFS at least once every 2000 milliseconds:

ALTER HDFSSTORE hdfsstore1
  BatchSize 10
  BatchTimeInterval 2000 milliseconds;

This example reconfigures both minor and major compaction for the HDFS store. Minor compaction is performed on files up to 12 MB in size and can involve as many as 8 files at a time. Files larger than 12 MB are compacted only during the major compaction cycle, which occurs every 10 minutes. Each type of compaction uses a maximum of 3 threads:

ALTER HDFSSTORE hdfsstore1
  MinorCompact true
  MajorCompact true
  MaxInputFileSize 12
  MinInputFileCount 1
  MaxInputFileCount 8
  MinorCompactionThreads 3
  MajorCompactionInterval 10 minutes
  MajorCompactionThreads 3;