37

I'm trying to run the spark examples from Eclipse and getting this generic error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.

The version I have is spark-1.6.2-bin-hadoop2.6. I started Spark using the ./sbin/start-master.sh command from a shell, and set my SparkConf like this:

SparkConf conf = new SparkConf().setAppName("Simple Application");
conf.setMaster("spark://My-Mac-mini.local:7077");

I'm not including any other code here because this error pops up with any of the examples I'm running. The machine is a Mac running OS X, and I'm pretty sure it has enough resources to run the simplest examples.

What am I missing?

Eddy
  • Are you able to run the examples outside of Eclipse, using spark-submit? – Knight71 Jun 30 '16 at 10:50
  • I'm able to do ./bin/run-example SparkPi 10 successfully. – Eddy Jun 30 '16 at 11:35
  • run-example will use local[*] instead of the Spark master that you started. Are you able to see the Spark master UI and all the worker nodes in it? – Knight71 Jun 30 '16 at 11:37
  • At http://localhost:8080/ I can see the running and completed applications. The workers line is empty. The spark master at the top of the page is spark://My-Mac-mini.local:7077 – Eddy Jun 30 '16 at 11:46
  • You should also start your worker, by running start-slave.sh – Knight71 Jun 30 '16 at 12:28
  • OK, yes, I started the slave and now the Initial Job error is gone. But I am getting a java.lang.ClassNotFoundException: org.apache.spark.examples.JavaSparkPi$1. Any idea why? (I know this is another error, so if you want to write the slave comment as an answer I'll gladly accept it.) – Eddy Jun 30 '16 at 12:37
  • I am not sure about the Eclipse environment. If you could do it via spark-submit, I could help. Either --jars or --driver-class-path is not being passed to spark-submit. You could do it via code as well, but I prefer spark-submit. – Knight71 Jun 30 '16 at 12:43
  • So even if I'm running it through the code (Eclipse) it still goes through spark-submit? – Eddy Jun 30 '16 at 12:45
  • The way to solve the ClassNotFoundException is to have the application jar built and placed in a known location, and then add it to the JavaSparkContext like this: jsc.addJar("path to jar in filesystem"); – Eddy Jun 30 '16 at 13:20
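
A minimal sketch of the approach from the comment above, assuming the application jar has already been built (the jar path below is hypothetical):

SparkConf conf = new SparkConf().setAppName("Simple Application");
conf.setMaster("spark://My-Mac-mini.local:7077");
JavaSparkContext jsc = new JavaSparkContext(conf);
// Ship the application jar to the workers so they can load classes such as JavaSparkPi$1
jsc.addJar("/path/to/my-spark-app.jar"); // hypothetical path to the built jar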

7 Answers

18

I had the same problem, and it was because the workers could not communicate with the driver.

You need to set spark.driver.port (and open said port on your driver), spark.driver.host and spark.driver.bindAddress in your spark-submit from the driver.
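
For example, when launching from an IDE instead of spark-submit (as in the question), the same properties can be set on the SparkConf; the host and port values below are hypothetical, and spark.driver.bindAddress is only available in newer Spark versions:

SparkConf conf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("spark://My-Mac-mini.local:7077")
        .set("spark.driver.host", "192.168.1.10")    // address of the driver machine as seen by the workers (hypothetical)
        .set("spark.driver.port", "51000")           // fixed port so it can be opened on the driver (hypothetical)
        .set("spark.driver.bindAddress", "0.0.0.0"); // local interface the driver binds to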

Maxime Maillot
  • Hi - Any idea what the values might be? I have tried multiple values, but it doesn't accept any of them. – nEO Jan 02 '20 at 02:20
  • for the key-value parameters and the docs associated with them, see here: https://spark.apache.org/docs/2.4.2/configuration.html#configuring-logging – nate Dec 12 '22 at 20:31
16

The error indicates that your cluster has insufficient resources for the current job. Since you have not started the slaves (i.e., the workers), the cluster won't have any resources to allocate to your job. Starting the slaves will fix it.

`start-slave.sh <spark://master-ip:7077>`
Knight71
  • What is meant by starting the slaves? I have 1 master and 1 slave running on EC2; when I try to submit a Python application via the API (http://stackoverflow.com/questions/38359801/spark-job-submitted-waiting-taskschedulerimpl-initial-job-not-accepted) I get the above error. – Chaitanya Bapat Jul 15 '16 at 09:59
  • The error is due to not having enough resources for what your spark-submit has asked for. Check your master UI for the workers and their resources, and compare them with the spark-submit params. Starting the slaves means starting the worker process. – Knight71 Jul 15 '16 at 10:04
  • POST - http://ec2-w-x-y-z-compute-1.amazonaws.com:6066/v1/submissions/create body-{ "action" : "CreateSubmissionRequest", "appArgs" : [ "/root/wordcount.py" ], "appResource" : "file:/root/wordcount.py", "clientSparkVersion" : "1.6.1", "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },"mainClass" : "org.apache.spark.deploy.SparkSubmit", "sparkProperties" : { "spark.driver.supervise" : "false", "spark.app.name" : "MyApp", "spark.eventLog.enabled": "true", "spark.submit.deployMode" : "cluster", "spark.master" : "spark://ec2-w-x-y-z.compute-1.amazonaws.com:6066" }} – Chaitanya Bapat Jul 15 '16 at 10:35
  • Above is my API call to submit the job to the master, which delegates to 1 worker. How do I check if the worker process has started? – Chaitanya Bapat Jul 15 '16 at 10:35
  • Hi @Knight71 - any idea why this error might persist even after starting the slave, with enough resources still available? – nEO Jan 02 '20 at 02:21
  • @nEO what is your spark app requirement for resources ? Does it fit within the spark worker resources ? – Knight71 Jan 02 '20 at 10:15
  • Hi @Knight71 -here is the output of ```print (sc._conf.getAll())``` ```[('spark.master', 'spark://remote-master:7077'), ('spark.app.name', 'Pi'), ('spark.rdd.compress', 'True'), ('spark.app.id', 'app-20200102084602-0001'), ('spark.serializer.objectStreamReset', '100'), ('spark.driver.port', '63774'), ('spark.executor.id', 'driver'), ('spark.submit.deployMode', 'client'), ('spark.driver.host', '192.168.1.11'), ('spark.ui.showConsoleProgress', 'true')]```. – nEO Jan 02 '20 at 17:47
  • Please note that spark.driver.host is causing the issue IMO, since that is the IP of my local machine from which I am running the IDE, as opposed to the driver node on AWS EC2. I tried changing the config, but whatever value I put, it doesn't work: ```conf = pyspark.SparkConf().setAppName('Pi').setMaster('spark://master-url:7077') conf.set('spark.driver.host', '172.31.33.63') sc = pyspark.SparkContext(conf=conf)``` – nEO Jan 02 '20 at 17:51
  • After running the master and slave, I see there are 2 unused cores and 6.8GB memory available for processing, but it looks like the executor can't connect with the driver, and I see this in one of the error logs: ```Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.1.11:62755 in 120 seconds``` – nEO Jan 02 '20 at 17:52
6

Solution to your Problem

Reason

  1. The Spark master doesn't have any resources allocated to execute the job, i.e., no worker or slave node.

Fix

  1. You have to start the slave node by connecting it to the master node, like this: /SPARK_HOME/sbin> ./start-slave.sh spark://localhost:7077 (if your master is on your local node)

Conclusion

  1. Start your master node and also your slave node before spark-submit, so that you get enough resources allocated to execute the job.

Alternate-way

  1. You can make the necessary changes in the spark-env.sh file, which is not recommended.
Praveen Kumar K S
5

If you try to run your application from an IDE, and you have free resources on your workers, you need to do this:

1) Before anything else, configure the master and worker Spark nodes.

2) Specify the driver (PC) configuration so that calculation results can be returned from the workers.

SparkConf conf = new SparkConf()
            .setAppName("Test spark")
            .setMaster("spark://ip of your master node:port of your master node")
            .set("spark.blockManager.port", "10025")
            .set("spark.driver.blockManager.port", "10026")
            .set("spark.driver.port", "10027") //make all communication ports static (not necessary if you disabled firewalls, or if your nodes located in local network, otherwise you must open this ports in firewall settings)
            .set("spark.cores.max", "12") 
            .set("spark.executor.memory", "2g")
            .set("spark.driver.host", "ip of your driver (PC)"); //(necessary)
dancelikefish
3

I had a stand-alone cluster setup on my local Mac machine with 1 master and 1 worker. The worker was connected to the master and everything seemed to be OK. However, to save memory I thought I would start the worker with only 500M of memory, and I ran into this problem. I restarted the worker with 1G of memory and it worked.

./start-slave.sh spark://{master_url}:{master_port} -c 2 -m 1G
Adelin
1

I encountered the same issue while setting up a Spark cluster on EC2. The issue was that the worker instance was unable to reach the driver, because the security group rules didn't open up the port. I confirmed this to be the issue by temporarily opening up all inbound ports for the driver - and then it worked.

The Spark documentation says that spark.driver.port is randomly assigned. So, to make this a fixed port <driverPort>, I specified this port configuration when I created the SparkSession (I'm using PySpark).

That way, I need to open up only <driverPort> in the driver instance's security group - which is a better practice.
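
A rough sketch of that configuration, written in Java to match the question's code (this answer used PySpark; the port number is hypothetical):

SparkSession spark = SparkSession.builder()
        .appName("Simple Application")
        .master("spark://<master-host>:7077")
        .config("spark.driver.port", "50100") // hypothetical fixed driver port; open only this port in the security group
        .getOrCreate();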

Sidharth Ramesh
-1

Try using "spark://127.0.0.1:7077" as the master address instead of the *.local name. Sometimes Java is not able to resolve .local addresses - for reasons I don't understand.

river
  • This produces a "Failed to connect" exception. – Eddy Jun 30 '16 at 09:20
  • Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content, and don't explain why someone should "try this". We make an effort here to be a resource for knowledge. – abarisone Jun 30 '16 at 09:55