
I'm trying to connect RStudio to a Hive instance that uses Kerberos authentication. If I run the code below in an R script called from the command line, it works.

library("DBI")
library("rJava")
library("RJDBC")

cp = c("/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar"
, "/u01/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpclient-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpcore-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

drv <- JDBC("org.apache.hive.jdbc.HiveDriver" , "hive-jdbc.jar" )

conn <- dbConnect(drv, "jdbc:hive2://XXXX:10000/default;principal=hive/XXXX@XXXXX;auth=kerberos")

If I run the exact same script in RStudio, I get an error:

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

If I run system('klist') in RStudio, it shows I have a valid ticket. It seems RStudio can't find the ticket, but command-line R can. Any ideas?

  • Which OS are you running on? What tool did you use to generate the ticket? Did you tinker with the env variable `KRB5CCNAME`? – Samson Scharfrichter May 04 '17 at 14:44
  • OS is Red Hat 6.5. Used kinit to generate the ticket. The variable KRB5CCNAME isn't set/exists when I run Sys.getenv() – Scott Bradshaw May 05 '17 at 14:16
  • Try to force some Java system properties, that `.jinit` cannot handle, with an env variable e.g. `export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true"` >> for the props that might make a difference, cf. my answer to https://stackoverflow.com/questions/42477466/error-when-connect-to-impala-with-jdbc-under-kerberos-authrication/42506620 – Samson Scharfrichter May 05 '17 at 22:27
  • Note that if you must get down to the JAAS config file, in your case, the subject name should be `com.sun.security.jgss.krb5.initiate` (cf. Hive driver) and it should contain `useTicketCache=true useKeyTab=false` and no "keyTab" entry – Samson Scharfrichter May 05 '17 at 22:34
  • Side note: you should use the placeholder `_HOST` in the URL, i.e. `...;principal=hive/_HOST@XXXXX` because it's easier, and also more generic *(i.e. if you ever migrate to a High Availability setup, you can't know in advance for which host you will be requesting a Kerberos service ticket)* – Samson Scharfrichter May 05 '17 at 22:53
  • Adding the environment variable worked! Can now access Hive through RStudio :) Thanks so much – Scott Bradshaw May 08 '17 at 08:15
  • OK, I just posted a formal answer, with more context. Could be helpful for other people. – Samson Scharfrichter May 08 '17 at 19:34

1 Answer


Some boring stuff first, to put things into context, then the solution.

  • Kerberos: it's complicated by nature (think cryptography plus networking), even without considering that Microsoft has its own implementation and extensions
  • Java and Kerberos: it's even more complicated (only partial support, subtle changes in Java versions, etc.)
  • Hadoop and Java and Kerberos: it's complicated and ugly (read the GitBook "Hadoop and Kerberos, the Madness beyond the Gate" if you really want to lose your sanity), and it's even worse on Windows, given the lack of an official build for the required Hadoop "native libs"
  • Hive and JDBC and Kerberos: the good news is that you don't need the Hadoop "ugly" part unless you are using the Apache JDBC driver on Windows (hint: ditch it and opt for the Cloudera JDBC driver!); the bad news is that you may need raw JAAS configuration and specific Java system properties
  • R and Java/JDBC: it works quite well, except that sometimes you want to pass specific Java system properties to the JVM (either at launch time or at run time); AFAIK .jinit does not support that, so you must resort to a workaround


There is one Java system property that must be set for Kerberos auth to work in JDBC, and it's not always set by default.
But you can't set that Java property from R directly; you have to set an environment variable (either before starting R, or from R code but before .jinit)

Option 1: from a Linux shell script, before starting R...

export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"

Option 2: from your R code...

Sys.setenv(JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false")
.jinit(...)
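
To check that the variable actually reaches a JVM, something like the following sketch may help. It assumes a HotSpot-style `java` binary on the PATH (the script skips the check otherwise); the R script name in the last comment is hypothetical.

```shell
#!/bin/sh
# Set the property from Option 1 so that any JVM started from this shell
# (including the one embedded by rJava) inherits it.
export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"

# Optional sanity check: a JVM that honours the variable reports
# "Picked up JAVA_TOOL_OPTIONS: ..." at startup.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | grep "Picked up JAVA_TOOL_OPTIONS" \
        || echo "The JVM did not report JAVA_TOOL_OPTIONS"
else
    echo "java not found on PATH, skipping the check"
fi

# Then start R from this same shell, e.g. (hypothetical script name):
# Rscript connect_hive.R
```

If the "Picked up" line does not show up, the variable is probably not exported in the environment that launches the JVM.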


Now, that may not be sufficient in all cases. Maybe you need to use a specific Kerberos config because your Hadoop cluster uses its own KDC. Maybe you don't want to use the default Kerberos ticket, but instead authenticate as a service account, using a password stored in a keytab file.
And maybe you need some debugging information because, well, shit happens (and security libraries are quite secretive by default, not to make things too easy for hackers, I suppose...)
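
As a rough sketch of those extra cases, the same environment variable can carry a custom Kerberos config and the debug flag mentioned in the comments. The `krb5.conf` path below is a placeholder I made up, not a value from the question; `java.security.krb5.conf` and `sun.security.krb5.debug` are standard JVM properties.

```shell
#!/bin/sh
# Placeholder path (assumption): point this at the krb5.conf for your cluster's KDC.
KRB5_CONF="/etc/krb5.conf"

# Build the value step by step; the result is ONE space-separated string.
JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"
JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Djava.security.krb5.conf=$KRB5_CONF"
JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dsun.security.krb5.debug=true"
export JAVA_TOOL_OPTIONS

# Show what the JVM will receive.
echo "$JAVA_TOOL_OPTIONS"
```

The debug flag is very verbose, so it is probably best removed once the connection works.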

Please refer to that post for more information about advanced Java configuration for Hive/Impala JDBC with Kerberos.
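
If you do have to get down to the JAAS config file, a minimal sketch based on the comments above (entry name `com.sun.security.jgss.krb5.initiate`, `useTicketCache=true useKeyTab=false`, no "keyTab" entry) could look like this. The file location is hypothetical, and `doNotPrompt=true` is my own addition to stop the login module from asking for a password; adapt both to your setup.

```shell
#!/bin/sh
# Hypothetical location for the JAAS file (assumption, not from the question).
JAAS_CONF="${TMPDIR:-/tmp}/hive_jaas.conf"

# Write a JAAS entry that reuses the ticket cache created by kinit.
cat > "$JAAS_CONF" <<'EOF'
com.sun.security.jgss.krb5.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  useKeyTab=false
  doNotPrompt=true;
};
EOF

# Tell the JVM where the JAAS file is, alongside the property from the answer.
export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=$JAAS_CONF"
echo "$JAVA_TOOL_OPTIONS"
```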

And be careful when setting the environment variable: its value must look like a Java command line, i.e. -Dsome.key=value -Dsome.other.key=blahblah. In a shell script, quote the value (because of the separating space); in R code, pass a single string, not a character vector.
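
To illustrate the quoting point, here is a small sketch using the dummy keys from the sentence above. The loop only shows how the single quoted string breaks into separate options on whitespace; it is an illustration, not something the JVM requires you to run.

```shell
#!/bin/sh
# Quotes keep both options in one variable value despite the space.
export JAVA_TOOL_OPTIONS="-Dsome.key=value -Dsome.other.key=blahblah"

# Deliberately unquoted expansion: split on whitespace to list each option.
count=0
for opt in $JAVA_TOOL_OPTIONS; do
    printf 'option: %s\n' "$opt"
    count=$((count + 1))
done
echo "$count options"
```

Without the quotes, the shell would treat `-Dsome.other.key=blahblah` as a separate argument to `export` instead of part of the value.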

Samson Scharfrichter