Closed
Description
Affected Stackable version
24.3
Affected Apache Spark-on-Kubernetes version
3.5.0
Current and expected behavior
With the correct Kerberos and HDFS configuration in place, Spark jobs can be started successfully with a main application resource loaded from a Kerberos-enabled HDFS, by setting `mainApplicationFile` to an HDFS URL, e.g. `mainApplicationFile: hdfs://poc-hdfs/user/stackable/pi.py`. The same Spark job fails if `spark.submit.pyFiles` points to a resource stored on the same HDFS cluster, e.g. `hdfs://poc-hdfs/user/stackable/mybanner.py`. The log then shows:
```
2024-06-25T10:21:14,754 WARN [main] org.apache.hadoop.fs.FileSystem - Failed to initialize fileystem hdfs://poc-hdfs/user/stackable/mybanner.py: java.lang.IllegalArgumentException: java.net.UnknownHostException: poc-hdfs
```
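The warning shows Hadoop treating `poc-hdfs` as a plain host name. For an HA HDFS cluster, a logical URI such as `hdfs://poc-hdfs/...` only resolves if the client's `hdfs-site.xml` lists `poc-hdfs` under `dfs.nameservices` (alongside the matching failover settings). One possible explanation, not confirmed in this issue, is that the process resolving `spark.submit.pyFiles` does not see the same HDFS client configuration as the one used for `mainApplicationFile`. The sketch below is a hypothetical diagnostic helper, assuming a standard `hdfs-site.xml` layout. The function names and the sample XML are illustrative only, and it checks `dfs.nameservices` alone, not the full failover configuration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def configured_nameservices(hdfs_site_xml: str) -> set[str]:
    """Hypothetical helper: return the logical nameservices listed under dfs.nameservices."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.nameservices":
            value = prop.findtext("value", "")
            # dfs.nameservices is a comma-separated list
            return {ns.strip() for ns in value.split(",") if ns.strip()}
    return set()


def is_logical_hdfs_url(url: str, hdfs_site_xml: str) -> bool:
    """Hypothetical helper: True if the URL's authority is a configured nameservice, not a host name."""
    parsed = urlparse(url)
    return parsed.scheme == "hdfs" and parsed.hostname in configured_nameservices(hdfs_site_xml)


# Illustrative client config; a real hdfs-site.xml also carries namenode and failover settings.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>poc-hdfs</value>
  </property>
</configuration>"""

url = "hdfs://poc-hdfs/user/stackable/mybanner.py"
print(is_logical_hdfs_url(url, SAMPLE_HDFS_SITE))  # True
print(is_logical_hdfs_url(url, "<configuration/>"))  # False
```

If the configuration visible to the failing process returns `False` here, Hadoop falls back to DNS resolution of `poc-hdfs`, which would be consistent with the `UnknownHostException` above.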
Possible solution
No response
Additional context
No response
Environment
No response
Would you like to work on fixing this bug?
None
### Tasks
- [ ] provide a workaround here (if one exists)
- [ ] optional: report the bug upstream to Apache Spark
Metadata
Status
Done