
I am trying to run a Spark Scala application from the head node of an Azure HDInsight cluster with the command

spark-submit --class com.test.spark.Wordcount SparkJob1.jar wasbs://containername@<storageaccountname>/sample.sas7bdat wasbs://containername@<storageaccountname>/sample.csv

I am getting the below exception:

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD

The same jar file works if I invoke it from Azure Data Factory. Am I missing some configuration with the spark-submit command?

vidyak

1 Answer


Normally this is caused by a type-conversion issue in your code, or by a mismatch between the Spark/Scala version your jar was built against and the version running on the cluster. There is a similar SO thread, How to fix java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List to field type scala.collection.Seq?, which has been answered; you can refer to it and check your code and build configuration to resolve the issue.
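One common trigger for this particular `ClassCastException` is packaging Spark itself into the application jar, so that the driver and executors deserialize RDD internals with conflicting class definitions. A sketch of how the sbt build might declare Spark as `provided` instead, so the cluster's own Spark libraries are used at runtime (the version numbers here are placeholders; match them to your HDInsight cluster's Spark and Scala versions):

```scala
// build.sbt -- hypothetical versions; align with the cluster's Spark/Scala runtime
name := "SparkJob1"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // "provided" keeps Spark out of the assembled jar, avoiding
  // classloader conflicts with the cluster's Spark installation
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided"
)
```

If the jar was assembled with Spark bundled in, rebuilding it with `provided` scope and resubmitting via spark-submit is worth trying, since Azure Data Factory may be launching the job against a different classpath than your manual spark-submit does.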

Peter Pan