Here is the situation: we have a secured (Kerberos) HBase cluster. I have an object that creates an instance of HTable at startup and hangs on to it. It calls:

UserGroupInformation.setConfiguration(configuration);
UserGroupInformation.loginUserFromKeytab(user, keytab);

to log in to the Kerberized cluster. This object then sits unused for many hours. After more than 10 hours (the ticket lifetime on our Kerberos cluster), the next call to scan the table results in this:

16/12/01 18:16:24 WARN security.UserGroupInformation: PriviledgedActionException as:bigdata-app-analyticscore-msr@INTQA.THOMSONREUTERS.COM (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
16/12/01 18:16:24 WARN ipc.RpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
16/12/01 18:16:24 FATAL ipc.RpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
- javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
- at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
- etc.

How can I keep the Kerberos authentication alive?

Janik Zikovsky

1 Answer

I happened to be doing some research in this forum earlier. The problem statement here, where Kerberos authentication dies after 10 hours, is nearly identical to that of this thread:

Renewing a connection to Apache Phoenix (using Kerberos) fails after exactly 10 hours

I edited that thread earlier today to put "10 hours" in its title. It contains some great advice that applies here. To borrow from Samson Scharfrichter's answer there: "The standard solution is to spawn a background thread invoking checkTGTAndReloginFromKeytab() periodically -- see Should I call ugi.checkTGTAndReloginFromKeytab() before every action on hadoop? for a very elaborate explanation by a HortonWorks guru (a colleague of the guy who wrote that GitBook about Hadoop & Kerberos)"

I hope this points you in the right direction.
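For illustration, here is a minimal sketch of the background-thread approach quoted above. The class `KerberosReloginTask` and its `ReloginAction` interface are hypothetical names made up for this example, not part of Hadoop; the actual re-login call is passed in, so the sketch compiles with the JDK alone. In a real client the action would presumably wrap `UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Hadoop): runs a re-login action periodically
// on a daemon thread so that an idle client keeps a valid TGT.
final class KerberosReloginTask implements AutoCloseable {

    // In a Hadoop/HBase client the action would typically be:
    //   () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()
    @FunctionalInterface
    interface ReloginAction {
        void relogin() throws Exception;
    }

    private final ScheduledExecutorService scheduler;
    private final AtomicInteger failures = new AtomicInteger();

    KerberosReloginTask(ReloginAction action, long period, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true); // do not keep the JVM alive just for re-login
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                action.relogin();
            } catch (Exception e) {
                // An uncaught exception would cancel all future runs,
                // so count the failure and carry on.
                failures.incrementAndGet();
            }
        }, 0, period, unit);
    }

    // Number of re-login attempts that threw an exception.
    int failureCount() {
        return failures.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Something like `new KerberosReloginTask(action, 5, TimeUnit.MINUTES)` would be created once, right after the initial `loginUserFromKeytab(user, keytab)` call. The interval is an assumption; it only needs to be comfortably shorter than the ticket lifetime.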

T-Heron
  • I tried setting up a timer calling ugi.checkTGTAndReloginFromKeytab() every 5 minutes, but I got the same issue. Oddly enough, I have other processes that keep an HTable instance open long-term, but they access the table more frequently and did not need to re-login. Next thing I will try is to actually read from the table on a timer. – Janik Zikovsky Dec 11 '16 at 14:36
  • I think my issue was that the same process was trying to access an unsecured and a secured cluster in the same session. We added code to ensure the right login is done before any action. Without this situation, I believe checkTGTAndReloginFromKeytab() before every action is the solution. – Janik Zikovsky Mar 29 '17 at 21:02
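The "right login before any action" idea from the last comment could be sketched roughly as below. `ReloginGuard` and `LoginCheck` are hypothetical names for this example, not Hadoop APIs. The check is passed in, so for a single secured cluster it might simply call `checkTGTAndReloginFromKeytab()`, while a process that talks to both a secured and an unsecured cluster would put its own login-selection logic there.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper (not part of Hadoop): makes sure the login is valid,
// then runs the actual table operation.
final class ReloginGuard {

    // For a single Kerberized cluster this would typically be:
    //   () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()
    @FunctionalInterface
    interface LoginCheck {
        void ensureLoggedIn() throws Exception;
    }

    private final LoginCheck loginCheck;

    ReloginGuard(LoginCheck loginCheck) {
        this.loginCheck = loginCheck;
    }

    // Runs the login check first, then the action, and returns the action's result.
    <T> T call(Callable<T> action) throws Exception {
        loginCheck.ensureLoggedIn();
        return action.call();
    }
}
```

Every scan or get would then go through `guard.call(...)`, so a stale or wrong login is corrected just before the RPC rather than on a timer.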