
I am trying to execute a Spark job on a Kerberos-enabled YARN cluster (Hortonworks). This job reads data from and writes data to HBase. Unfortunately, I have a problem with the authentication (especially when the Spark job tries to access the HBase data) and with understanding how the authentication works. Here is the error I am getting:

ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Login failure for username from keytab keytabFile
java.io.IOException: Login failure for username from keytab keytabFile
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1146)

I want the authentication to be based on the keytab of a (technical) user. I therefore currently provide the principal and keytab information in two places:

  1. In the spark-submit script with the --principal and --keytab options
  2. In the code with UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
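Both places take a principal string. Kerberos writes principals as primary[/instance]@REALM, and a name without a realm is typically completed with the default_realm from krb5.conf. To make the "username" vs. fully qualified form concrete, here is a minimal sketch. PrincipalName, parse and qualify are hypothetical helpers written for illustration (not part of Hadoop or Spark), EXAMPLE.COM is a placeholder realm, and backslash escaping inside names is deliberately ignored.

```java
import java.util.Objects;

// Hypothetical helper (not a Hadoop or Spark API): splits a Kerberos
// principal written as primary[/instance][@REALM] into its components.
// Simplification: backslash escapes inside names are not handled.
final class PrincipalName {
    final String primary;  // e.g. "username", or a service name such as "hbase"
    final String instance; // e.g. a host name; null when absent
    final String realm;    // e.g. "EXAMPLE.COM"; null when absent

    private PrincipalName(String primary, String instance, String realm) {
        this.primary = primary;
        this.instance = instance;
        this.realm = realm;
    }

    static PrincipalName parse(String principal) {
        Objects.requireNonNull(principal, "principal");
        // The realm follows the last '@'.
        int at = principal.lastIndexOf('@');
        String name = at >= 0 ? principal.substring(0, at) : principal;
        String realm = at >= 0 ? principal.substring(at + 1) : null;
        // An optional instance follows the first '/'.
        int slash = name.indexOf('/');
        String primary = slash >= 0 ? name.substring(0, slash) : name;
        String instance = slash >= 0 ? name.substring(slash + 1) : null;
        if (primary.isEmpty() || (realm != null && realm.isEmpty())) {
            throw new IllegalArgumentException("Malformed principal: " + principal);
        }
        return new PrincipalName(primary, instance, realm);
    }

    boolean hasRealm() {
        return realm != null;
    }

    // Returns the principal unchanged if it already names a realm; otherwise
    // appends the given realm, making the previously implicit realm explicit.
    static String qualify(String principal, String realm) {
        return parse(principal).hasRealm() ? principal : principal + "@" + realm;
    }
}
```

For example, PrincipalName.qualify("username", "EXAMPLE.COM") yields "username@EXAMPLE.COM", which removes any dependence on the default realm configured on the node that performs the login.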

My questions:

  • What is the purpose of each of these two places to provide the keytab? Is one of them only used to authenticate against the YARN cluster to obtain resources? Or do I really need to provide the principal/keytab information twice, for two different authentications (against YARN and against HBase)? How does Spark handle all of this internally?
  • Do I need to provide the principal as username or as username@REALM? Is it the same for both places?
  • I need to have the keytab file distributed to all worker nodes in the same location, right? Which user must be able to read the keytab files? Or is there also a way to pass the keytab around through the spark-submit script?
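The keytab path passed to UserGroupInformation.loginUserFromKeytabAndReturnUGI is opened as a local file by whichever JVM executes that call (in the trace above, the YARN ApplicationMaster), under that process's OS user. One way to narrow down the login failure is therefore to check the file from the same place. The sketch below is a hypothetical pre-flight helper (KeytabCheck.problemWith is not a Hadoop or Spark API); it assumes an MIT-format keytab, whose first byte is 0x05 followed by the format version 0x01 or 0x02.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check (not a Hadoop or Spark API). Call it from the
// same process that later calls loginUserFromKeytabAndReturnUGI, so it sees
// the same node, working directory and OS user.
final class KeytabCheck {
    // Returns null if the file looks usable, otherwise a reason string.
    static String problemWith(String keytabPath) {
        Path p = Paths.get(keytabPath).toAbsolutePath();
        String user = System.getProperty("user.name");
        if (!Files.exists(p)) {
            return "no such file on this node: " + p;
        }
        if (!Files.isRegularFile(p)) {
            return "not a regular file: " + p;
        }
        if (!Files.isReadable(p)) {
            return "not readable by user " + user + ": " + p;
        }
        // Assumption: MIT-format keytab, i.e. byte 0x05 then version 0x01/0x02.
        byte[] header = new byte[2];
        try (InputStream in = Files.newInputStream(p)) {
            int n = in.readNBytes(header, 0, 2);
            if (n < 2 || header[0] != 0x05 || (header[1] != 0x01 && header[1] != 0x02)) {
                return "does not look like a keytab (unexpected header): " + p;
            }
        } catch (IOException e) {
            return "could not read " + p + ": " + e.getMessage();
        }
        return null;
    }
}
```

A possible use is to log the result of KeytabCheck.problemWith(keytab) immediately before the UserGroupInformation call in the job, which shows whether the path exists and is readable for the user the ApplicationMaster runs as.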

I know, it's a lot of questions... I appreciate any help or hints!

Thanks and regards

Daniel