
I am testing a Spark Java application that uses the Impala JDBC driver to execute queries written in Impala syntax and write the results to an Excel template file. This application works perfectly in local mode against a kerberized Cloudera cluster.

The problem is that when this application is executed in yarn-cluster mode, it waits indefinitely for the Kerberos password, so I decided to include the necessary files in the spark-submit call:

spark-submit --verbose --master yarn-cluster \
 --files gss-jaas.conf,configuration.properties \
 --principal myPrincipal --keytab myKeytab \
 --name "app" --conf spark.executor.cores=2 \
 --conf spark.executor.memory=8G --conf spark.executor.instances=3 \
 --conf spark.driver.memory=256M \
 --class myClass output.jar queriesConfig.json configuration.properties

I also modified the Java code to make a system call to kinit before the Kerberos login, in order to obtain automatic access, as when the application was executed in local mode. However, I had no luck with the following code:

System.setProperty("java.security.auth.login.config", prop.getProperty("jdbc.kerberos.jaas"));
            System.setProperty("sun.security.jgss.debug", "true");
            System.setProperty("sun.security.krb5.debug", "true");
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "true");
            System.setProperty("java.security.debug", "gssloginconfig,configfile,configparser,logincontext");
            System.setProperty("java.security.krb5.conf", prop.getProperty("kerberos.conf"));

            if (prop.getProperty("ssl.enabled") != null && "true".equals(prop.getProperty("ssl.enabled"))) {
                System.setProperty("javax.net.ssl.trustStore", prop.getProperty("trustStore.path"));
                System.setProperty("javax.net.ssl.trustStorePassword", prop.getProperty("trustStore.password"));
            }

            StringBuffer output = new StringBuffer();
            Process p;

            try {
                final String command = "kinit -k -t myKeytab myUser";
                p = Runtime.getRuntime().exec(command);
                p.waitFor();
                BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));

                String line = "";           
                while ((line = reader.readLine())!= null) {
                    output.append(line + "\n");
                }

                lc = new LoginContext(JDBC_DRIVER_JAAS, new TextCallbackHandler());
                if (lc != null) {

                    lc.login();
                }

            } catch (LoginException le) {
                LOGGER.error("LoginException . " + le.getMessage(), le);

            } catch (SecurityException se) {
                LOGGER.error("SecurityException . " + se.getMessage(), se);                 
            } catch (Exception e) {
                LOGGER.error("EXCEPTION !!! " + e.getMessage(), e);
            }

This code fails with the following error:

ERROR hive.JDBCHandler: Cannot create LoginContext. Integrity check on decrypted field failed (31) - PREAUTH_FAILED
javax.security.auth.login.LoginException: Integrity check on decrypted field failed (31) - PREAUTH_FAILED
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)

What alternatives do I have, considering that the queries must be in Impala syntax? Is it possible to log in to Kerberos when the application is executed in yarn-cluster mode?

aloplop85
  • AFAIK `--principal --keytab` enable the Spark driver (running somewhere in a YARN container) to manage Kerberos credentials... but internally: your client session has no access to the Kerberos ticket, but only to Hadoop "auth tokens" (for HDFS, and optionally for Hive and/or HBase). Your Impala JDBC driver requires a raw ticket. You will have to push the keytab to the driver's container, and use a JAAS config file to create a private ticket on-the-fly (`kinit` would create a global ticket for _all your sessions on that server_ -- beware of side effects!!!). A JAAS sketch follows these comments. – Samson Scharfrichter Apr 26 '18 at 10:21
  • For a JAAS example, see my answer to https://stackoverflow.com/questions/42477466/error-when-connect-to-impala-with-jdbc-under-kerberos-authrication/42506620 > the tricky part would be to guess where exactly Spark would upload the JAAS conf file and the keytab (not necessarily in the container's CWD) -- you might have to generate the JAAS conf file dynamically at run-time, based on the actual location of the keytab – Samson Scharfrichter Apr 26 '18 at 10:30
  • PS: if you insist on `kinit`, then set the env variable `KRB5CCNAME` beforehand so that both `kinit` and the driver's JVM use a specific, local file to store the ticket -- hence it's not exposed to other jobs on the server (a ProcessBuilder sketch follows these comments). Cf. https://web.mit.edu/kerberos/krb5-1.15/doc/basic/ccache_def.html – Samson Scharfrichter Apr 26 '18 at 10:36
  • Thank you very much. We have tried your solutions, even the code in this GitHub repo: https://github.com/jcmturner/java-kerberos-utils, but it was not possible to log in to Kerberos :( Attempting kerberos authentication of user: user@domain using username and password mechanism javax.security.auth.login.LoginException: Integrity check on decrypted field failed (31) - PREAUTH_FAILED – aloplop85 Apr 27 '18 at 13:37
  • Debugging Kerberos is a nightmare - cf. https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html _(the whole GitBook is a must-read, written by a HortonWorks guru at the time he was maintaining the Hadoop auth lib)_ – Samson Scharfrichter Apr 27 '18 at 21:34
  • I have run into a problem with the "hive_metastore" dependency (https://github.com/onefoursix/Cloudera-Impala-JDBC-Example). Although I can log in to Kerberos using yarn-client or yarn-cluster modes, I get a problem when initializing Hive: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Iface.get_all_functions()Lorg/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse; at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2196) – aloplop85 Apr 30 '18 at 06:46
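
For reference, here is a minimal sketch of the JAAS-file approach suggested in the comments. It assumes the keytab is shipped to the container (e.g. via --files) and lands in the working directory; the entry name ("Client"), principal, and keytab path are placeholders -- the entry name must match the string passed to the LoginContext constructor (JDBC_DRIVER_JAAS in the question's code):

    Client {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="./myKeytab"
      principal="myUser@MY.REALM"
      storeKey=true
      doNotPrompt=true
      debug=true;
    };

With useKeyTab=true and doNotPrompt=true, LoginContext.login() reads the key directly from the keytab, so no external kinit call (and no shared ticket cache) is involved.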
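
And if the kinit route is kept, here is a sketch of the KRB5CCNAME variant from the comments, using ProcessBuilder rather than Runtime.exec so the environment variable can be set for the child process. The cache path, keytab, and principal are placeholders:

    import java.io.IOException;

    public class PrivateKinit {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Arbitrary job-local path for the credential cache, instead of the
            // default per-user cache shared by every session on the node.
            String ccache = "/tmp/krb5cc_myApp_" + System.currentTimeMillis();

            ProcessBuilder pb = new ProcessBuilder("kinit", "-k", "-t", "myKeytab", "myUser");
            pb.environment().put("KRB5CCNAME", "FILE:" + ccache);
            pb.inheritIO(); // surface kinit's own output in the container logs
            int rc = pb.start().waitFor();
            System.out.println("kinit exit code: " + rc);
        }
    }

Note that Krb5LoginModule picks up KRB5CCNAME (with useTicketCache=true) from the JVM's own environment, which is fixed at startup, so in yarn-cluster mode the variable would also have to be set on the container itself, presumably via spark.yarn.appMasterEnv.KRB5CCNAME for the driver.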

0 Answers