Description
As a user of HDFS, I want certain settings (see below for details) enabled to improve reliability in the face of crashes, even if this comes at the expense of some minor performance cost.
Value & Details
dfs.datanode.sync.behind.writes & dfs.datanode.synconclose
Our main use-case for having HDFS in SDP is HBase, and we want it to be as reliable as possible. Currently there is a small chance that data written to HDFS is not actually synced to disk, even after a block has been closed. HBase users can control this explicitly for the WAL, but for flushes and compactions I believe they can't as easily (if at all). In theory HBase should be able to recover from such failures, but that comes at a cost and there is always a risk. Enabling these two settings causes HDFS to sync data to disk as soon as possible.
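For illustration, the proposed hdfs-site.xml entries would look roughly like this (values as suggested above; the exact way SDP renders its config files may differ):

```xml
<!-- Ask the OS to flush written data to disk shortly after it is written -->
<property>
  <name>dfs.datanode.sync.behind.writes</name>
  <value>true</value>
</property>
<!-- Sync block file and metadata to disk when a block is closed -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```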
io.file.buffer.size
The default (4096) hasn't changed since 2009.
The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. Must be greater than zero.
I suggest changing this from the current default of 4k to 128k to allow for faster transfers.
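A sketch of the proposed core-site.xml entry (131072 bytes = 128k):

```xml
<property>
  <name>io.file.buffer.size</name>
  <!-- 128k read/write buffer, up from the 4096 byte default -->
  <value>131072</value>
</property>
```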
dfs.namenode.handler.count
Defaults to 10 since at least 2011.
This controls the number of concurrent client connections (including connections from DataNodes) the NameNode can handle. Ideally we'd scale this with the number of DataNodes, but that would lead to restarts of the NameNode, so I'm in favor of increasing the default from 10 to 50 and not making it dynamic. This should lead to better performance due to more concurrency.
There is also a setting dfs.namenode.service.handler.count, which is purely for connections from DataNodes, but it is only used if dfs.namenode.servicerpc-address is also set, which we don't do.
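A sketch of the corresponding hdfs-site.xml entry:

```xml
<property>
  <name>dfs.namenode.handler.count</name>
  <!-- Raised from the default of 10 -->
  <value>50</value>
</property>
```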
dfs.datanode.handler.count
Defaults to 10 since at least 2012.
This controls the number of concurrent client connections the DataNodes can handle. We have no idea how many clients there may be, so it's hard to automate this, but the default can definitely be raised, I'd say. I suggest increasing the default from 10 to 50 as well. This should lead to better performance due to more concurrency, especially with use-cases like HBase.
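Again as a sketch, in hdfs-site.xml:

```xml
<property>
  <name>dfs.datanode.handler.count</name>
  <!-- Raised from the default of 10 -->
  <value>50</value>
</property>
```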
dfs.namenode.replication.max-streams & dfs.namenode.replication.max-streams-hard-limit
Default to 2 and 4 respectively since around 2013. These control the maximum number of replication "jobs" the NameNode assigns to a DataNode in a single heartbeat. Increasing them will increase network usage during replication events but can lead to faster recovery.
We could make the defaults much higher (10/50 or so), but I'd prefer a more modest increase so as not to overload any networks.
I suggest doubling these numbers to 4 & 8.
There are others who set this much higher though: https://support.huawei.com/enterprise/en/knowledge/EKB1100015760
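The proposed hdfs-site.xml entries, as a sketch:

```xml
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <!-- Doubled from the default of 2 -->
  <value>4</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <!-- Doubled from the default of 4 -->
  <value>8</value>
</property>
```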
dfs.datanode.max.transfer.threads
Defaults to 4096 and hasn't changed since at least 2011.
This is the number of threads used for actual data transfer, so the work is not very CPU heavy but IO bound, which is why the number is relatively high. Today's Java and IO stack should be able to handle more, though, so I suggest bumping this to 8192 for more performance/concurrency.
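As a sketch, in hdfs-site.xml:

```xml
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <!-- Doubled from the default of 4096 -->
  <value>8192</value>
</property>
```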
Tasks
- Set io.file.buffer.size to 131072 by default in core-site.xml
- Set dfs.datanode.sync.behind.writes to true by default in hdfs-site.xml
- Set dfs.datanode.synconclose to true by default in hdfs-site.xml
- Set dfs.namenode.handler.count to 50 by default in hdfs-site.xml
- Set dfs.datanode.handler.count to 50 by default in hdfs-site.xml
- Set dfs.namenode.replication.max-streams to 4 by default in hdfs-site.xml
- Set dfs.namenode.replication.max-streams-hard-limit to 8 by default in hdfs-site.xml
- Set dfs.datanode.max.transfer.threads to 8192 by default in hdfs-site.xml
Release Notes
hdfs: Various setting defaults have been updated for better performance and reliability (#685)