
For Spark jobs running on YARN (yarn-client mode), is it possible to specify the classpath with JARs located in HDFS?

A bit like what was possible with MapReduce jobs via:

DistributedCache.addFileToClassPath(Path file, Configuration conf, FileSystem fs)

1 Answer


From the SparkContext documentation:

def addJar(path: String): Unit

Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.

So I think it is enough to just add this to your SparkContext initialization:

sc.addJar("hdfs://your/path/to/whatever.jar")

If you want to distribute just a plain file rather than a JAR, there is a corresponding addFile() method; files shipped this way can be resolved on the executors with SparkFiles.get(), as sketched below.
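A rough sketch, assuming an illustrative HDFS path:

import org.apache.spark.SparkFiles

// Ship a plain file (not a classpath JAR) to every executor.
sc.addFile("hdfs://namenode:8020/path/to/lookup.txt")

// Inside a task on an executor, resolve the local copy by its file name.
val localPath = SparkFiles.get("lookup.txt")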

See docs for more.
