37

I'm trying to run the spark examples from Eclipse and getting this generic error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.

The version I have is spark-1.6.2-bin-hadoop2.6. I started Spark using the ./sbin/start-master.sh command from a shell, and set my SparkConf like this:

SparkConf conf = new SparkConf().setAppName("Simple Application");
conf.setMaster("spark://My-Mac-mini.local:7077");

I'm not including any other code here because this error pops up with any of the examples I'm running. The machine is a Mac running OS X, and I'm pretty sure it has enough resources to run the simplest examples.

What am I missing?

Eddy
  • Are you able to run the examples outside of Eclipse, using spark-submit? – Knight71 Jun 30 '16 at 10:50
  • I'm able to do ./bin/run-example SparkPi 10 successfully. – Eddy Jun 30 '16 at 11:35
  • run-example will use local[*] instead of the Spark master that you started. Are you able to see the Spark master UI and all the worker nodes in it? – Knight71 Jun 30 '16 at 11:37
  • At http://localhost:8080/ I can see the running and completed applications. The workers line is empty. The spark master at the top of the page is spark://My-Mac-mini.local:7077 – Eddy Jun 30 '16 at 11:46
  • You should also start your worker, by running start-slave.sh – Knight71 Jun 30 '16 at 12:28
  • OK, yes, I started the slave and now the Initial Job error is gone. But I am getting a java.lang.ClassNotFoundException: org.apache.spark.examples.JavaSparkPi$1. Any idea why? (I know this is another error, so if you want to write the slave comment as an answer I'll gladly accept it.) – Eddy Jun 30 '16 at 12:37
  • I am not sure about the Eclipse environment. If you could do it via spark-submit, I could help. Either --jars or --driver-class-path is not being passed to spark-submit. You could do it via code as well, but I prefer spark-submit. – Knight71 Jun 30 '16 at 12:43
  • So even if I'm running it through the code (Eclipse) it still goes through spark-submit? – Eddy Jun 30 '16 at 12:45
  • The way to solve the ClassNotFoundException is to have the application jar built and placed in a known location, and then add it to the JavaSparkContext like this: jsc.addJar("path to jar in filesystem"); – Eddy Jun 30 '16 at 13:20
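
A minimal sketch of the approach from the comment above, assuming the application jar has already been built (the jar path below is hypothetical):

SparkConf conf = new SparkConf().setAppName("Simple Application");
conf.setMaster("spark://My-Mac-mini.local:7077");
JavaSparkContext jsc = new JavaSparkContext(conf);
// Ship the application jar to the workers so they can load classes such as JavaSparkPi$1
jsc.addJar("/path/to/my-spark-app.jar"); // hypothetical path to the built jar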

7 Answers

18

I had the same problem, and it was because the workers could not communicate with the driver.

You need to set spark.driver.port (and open said port on your driver), spark.driver.host and spark.driver.bindAddress in your spark-submit from the driver.
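
For example, when launching from an IDE instead of spark-submit (as in the question), the same properties can be set on the SparkConf; the host and port values below are hypothetical, and spark.driver.bindAddress is only available in newer Spark versions:

SparkConf conf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("spark://My-Mac-mini.local:7077")
        .set("spark.driver.host", "192.168.1.10")    // address of the driver machine as seen by the workers (hypothetical)
        .set("spark.driver.port", "51000")           // fixed port so it can be opened on the driver (hypothetical)
        .set("spark.driver.bindAddress", "0.0.0.0"); // local interface the driver binds to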

Maxime Maillot
  • Hi - Any idea what the values might be? I have tried multiple values, but it doesn't accept any of them. – nEO Jan 02 '20 at 02:20
  • for the key-value parameters and the docs associated with them, see here: https://spark.apache.org/docs/2.4.2/configuration.html#configuring-logging – nate Dec 12 '22 at 20:31
16

The error indicates that your cluster has insufficient resources for the current job. Since you have not started the slaves (i.e., the workers), the cluster won't have any resources to allocate to your job. Starting the slaves will fix it.

`start-slave.sh <spark://master-ip:7077>`
Knight71
  • What is meant by starting the slaves? I have 1 master and 1 slave running on EC2; when I try to submit a Python application via the API (http://stackoverflow.com/questions/38359801/spark-job-submitted-waiting-taskschedulerimpl-initial-job-not-accepted) I get the above error. – Chaitanya Bapat Jul 15 '16 at 09:59
  • The error is due to not having enough resources for what your spark-submit has asked for. Check your master UI for the workers and their resources, and compare them with the spark-submit params. Starting the slaves means starting the worker process. – Knight71 Jul 15 '16 at 10:04
  • POST - http://ec2-w-x-y-z-compute-1.amazonaws.com:6066/v1/submissions/create body-{ "action" : "CreateSubmissionRequest", "appArgs" : [ "/root/wordcount.py" ], "appResource" : "file:/root/wordcount.py", "clientSparkVersion" : "1.6.1", "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },"mainClass" : "org.apache.spark.deploy.SparkSubmit", "sparkProperties" : { "spark.driver.supervise" : "false", "spark.app.name" : "MyApp", "spark.eventLog.enabled": "true", "spark.submit.deployMode" : "cluster", "spark.master" : "spark://ec2-w-x-y-z.compute-1.amazonaws.com:6066" }} – Chaitanya Bapat Jul 15 '16 at 10:35
  • Above is my API call to submit the job to the master, which delegates to 1 worker. How do I check if the worker process has started? – Chaitanya Bapat Jul 15 '16 at 10:35
  • Hi @Knight71 - any idea why this error might persist even after starting the slave, with enough resources still available? – nEO Jan 02 '20 at 02:21
  • @nEO what is your spark app requirement for resources ? Does it fit within the spark worker resources ? – Knight71 Jan 02 '20 at 10:15
  • Hi @Knight71 -here is the output of ```print (sc._conf.getAll())``` ```[('spark.master', 'spark://remote-master:7077'), ('spark.app.name', 'Pi'), ('spark.rdd.compress', 'True'), ('spark.app.id', 'app-20200102084602-0001'), ('spark.serializer.objectStreamReset', '100'), ('spark.driver.port', '63774'), ('spark.executor.id', 'driver'), ('spark.submit.deployMode', 'client'), ('spark.driver.host', '192.168.1.11'), ('spark.ui.showConsoleProgress', 'true')]```. – nEO Jan 02 '20 at 17:47
  • Please note that spark.driver.host is causing the issue IMO, since that is the IP of my local machine from which I am running the IDE, as opposed to the driver node on AWS EC2. I tried changing the config, but whatever value I put, it doesn't work: ```conf = pyspark.SparkConf().setAppName('Pi').setMaster('spark://master-url:7077') conf.set('spark.driver.host', '172.31.33.63') sc = pyspark.SparkContext(conf=conf)``` – nEO Jan 02 '20 at 17:51
  • After running the master and slave, I see there are 2 unused cores and 6.8GB memory available for processing, but it looks like the executor can't connect with the driver, and I see this in one of the error logs: ```Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.1.11:62755 in 120 seconds``` – nEO Jan 02 '20 at 17:52
6

Solution to your Problem

Reason

  1. The Spark master doesn't have any resources allocated to execute the job, i.e., no worker or slave node.

Fix

  1. You have to start the slave node by connecting it to the master node, like this: /SPARK_HOME/sbin> ./start-slave.sh spark://localhost:7077 (if your master is on your local node)

Conclusion

  1. Start your master node and also your slave node before spark-submit, so that you get enough resources allocated to execute the job.

Alternate-way

  1. You can make the necessary changes in the spark-env.sh file, which is not recommended.
Praveen Kumar K S
5

If you try to run your application from an IDE, and you have free resources on your workers, you need to do this:

1) Before anything else, configure the master and worker Spark nodes.

2) Specify the driver (PC) configuration so that calculation results can be returned from the workers.

SparkConf conf = new SparkConf()
            .setAppName("Test spark")
            .setMaster("spark://ip of your master node:port of your master node")
            .set("spark.blockManager.port", "10025")
            .set("spark.driver.blockManager.port", "10026")
            .set("spark.driver.port", "10027") //make all communication ports static (not necessary if you disabled firewalls, or if your nodes located in local network, otherwise you must open this ports in firewall settings)
            .set("spark.cores.max", "12") 
            .set("spark.executor.memory", "2g")
            .set("spark.driver.host", "ip of your driver (PC)"); //(necessary)
dancelikefish
3

I had a stand-alone cluster setup on my local Mac machine with 1 master and 1 worker. The worker was connected to the master and everything seemed to be OK. However, to save memory I thought I would start the worker with only 500M of memory, and I ran into this problem. I restarted the worker with 1G of memory and it worked.

./start-slave.sh spark://{master_url}:{master_port} -c 2 -m 1G
Adelin
1

I encountered the same issue while setting up a Spark cluster on EC2. The issue was that the worker instance was unable to reach the driver, because the security group rules didn't open up the port. I confirmed this to be the issue by temporarily opening up all inbound ports for the driver - and then it worked.

The Spark documentation says that spark.driver.port is randomly assigned. So, to make this a fixed port <driverPort>, I specified this port configuration when I created the SparkSession (I'm using PySpark).

That way, I need to open up only <driverPort> in the driver instance's security group - which is a better practice.
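
A rough sketch of that configuration, written in Java to match the question's code (this answer used PySpark; the port number is hypothetical):

SparkSession spark = SparkSession.builder()
        .appName("Simple Application")
        .master("spark://<master-host>:7077")
        .config("spark.driver.port", "50100") // hypothetical fixed driver port; open only this port in the security group
        .getOrCreate();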

Sidharth Ramesh
-1

Try using "spark://127.0.0.1:7077" as the master address instead of the *.local name. Sometimes Java is not able to resolve .local addresses - for reasons I don't understand.

river
  • This produces a "Failed to connect" exception. – Eddy Jun 30 '16 at 09:20
  • Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content, and don't explain why someone should "try this". We make an effort here to be a resource for knowledge. – abarisone Jun 30 '16 at 09:55