Improve default configuration (reliability & performance) #685

Closed
@lfrancke

Description

As a user of HDFS I want the settings detailed below enabled to improve reliability in the face of crashes, even if this comes at the cost of a minor performance hit.

Value & Details

dfs.datanode.sync.behind.writes & dfs.datanode.synconclose

Our main use case for HDFS in SDP is HBase, and we want it to be as reliable as possible. Currently there is a small chance that data written to HDFS is not actually synced to disk even after a block has been closed. HBase users can control this explicitly for the WAL, but for flushes and compactions I believe they can't as easily (if at all). In theory HBase should be able to recover from these failures, but that comes at a cost and there is always a risk. Enabling this behaviour causes HDFS to sync to disk as soon as possible.
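For reference, this is how the two properties would look in hdfs-site.xml (values as proposed in the Tasks below):

```xml
<!-- hdfs-site.xml: sync data to disk eagerly while writing, and
     fsync when a block is closed, trading a little write latency
     for durability -->
<property>
  <name>dfs.datanode.sync.behind.writes</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```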

io.file.buffer.size

The default (4096) hasn't changed since 2009. The upstream property description reads:

The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. Must be greater than zero.

I suggest changing this from the current default of 4k to 128k to allow for faster transfers.
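Spelled out as a core-site.xml entry (131072 = 128 * 1024):

```xml
<!-- core-site.xml: raise the stream buffer from 4 KiB to 128 KiB -->
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
```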

dfs.namenode.handler.count

Defaults to 10 since at least 2011.

This controls the number of concurrent client connections (including DataNodes) to the NameNode. Ideally we'd scale this with the number of DataNodes, but that would lead to restarts of the NameNode, so I'm in favor of increasing the default from 10 to 50 and not making it dynamic. This should lead to better performance due to more concurrency.

There's also a setting dfs.namenode.service.handler.count which is purely for connections from DataNodes, but that is only used if dfs.namenode.servicerpc-address is also set, which we don't do.
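So in our setup a single handler pool serves clients and DataNodes alike, and the proposed hdfs-site.xml entry would be:

```xml
<!-- hdfs-site.xml: more RPC handler threads on the NameNode
     (this pool is shared by clients and DataNodes here, since
     no separate service RPC address is configured) -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>50</value>
</property>
```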

dfs.datanode.handler.count

Defaults to 10 since at least 2012.

This controls the number of concurrent client connections to the DataNodes. We have no idea how many clients there may be, so it's hard to automate this, but I'd say the default can definitely be raised. I suggest increasing it from 10 to 50 as well. This should lead to better performance due to more concurrency, especially with use cases like HBase.
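As an hdfs-site.xml entry:

```xml
<!-- hdfs-site.xml: more RPC handler threads on each DataNode -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>50</value>
</property>
```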

dfs.namenode.replication.max-streams & dfs.namenode.replication.max-streams-hard-limit

Default to 2 and 4 respectively, since around 2013. These control the maximum number of replication "jobs" the NameNode assigns to a DataNode in a single heartbeat. Increasing these numbers will increase network usage during replication events but can lead to faster recovery.

We could make the defaults much higher (10/50 or so) but I'd prefer a minor increase instead, so as not to overload any networks.
I suggest doubling these numbers: 4 & 8.

There are others who set this much higher though: https://support.huawei.com/enterprise/en/knowledge/EKB1100015760
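The proposed hdfs-site.xml entries:

```xml
<!-- hdfs-site.xml: allow up to 4 replication streams per DataNode
     per heartbeat; the hard limit of 8 also covers
     highest-priority replications -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>4</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>8</value>
</property>
```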

dfs.datanode.max.transfer.threads

Defaults to 4096 and hasn't changed since at least 2011.

These threads are used for actual data transfer, so they are not very CPU heavy but IO bound. This is why the number is relatively high. But today's Java and IO stacks should be able to handle more, so I suggest bumping this to 8192 for more performance/concurrency.
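As an hdfs-site.xml entry:

```xml
<!-- hdfs-site.xml: double the DataNode data-transfer thread pool -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
```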

Tasks

  • Set io.file.buffer.size to 131072 by default in core-site.xml
  • Set dfs.datanode.sync.behind.writes to true by default in hdfs-site.xml
  • Set dfs.datanode.synconclose to true by default in hdfs-site.xml
  • Set dfs.namenode.handler.count to 50 by default in hdfs-site.xml
  • Set dfs.datanode.handler.count to 50 by default in hdfs-site.xml
  • Set dfs.namenode.replication.max-streams to 4 by default in hdfs-site.xml
  • Set dfs.namenode.replication.max-streams-hard-limit to 8 by default in hdfs-site.xml
  • Set dfs.datanode.max.transfer.threads to 8192 by default in hdfs-site.xml

Release Notes

hdfs: Various setting defaults have been updated for better performance and reliability (#685)
