
I have the Spark code below running on the master:

import pyspark
from pyspark import SparkContext

sc = SparkContext()
nums = sc.parallelize([1, 2, 3, 4])
nums.collect()

My cluster config: 3 nodes (1 master + 2 slaves) in Standalone/client mode

Master config: 600 MB RAM, 1 CPU
Slave1 config: 600 MB RAM, 1 CPU
Slave2 config: 16 GB RAM, 4 CPUs

When I submit my job using the command below, it hangs as a long-running job:

spark-submit --master spark://<MASTER_IP>:7077 --num-executors=6 --conf spark.driver.memory=500M --conf spark.executor.memory=6G --deploy-mode client test.py

Logs on screen:

20/05/11 19:43:09 INFO BlockManagerMaster: Removal of executor 105 requested
20/05/11 19:43:09 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20200511193954-0001/106 on worker-20200511192038--MASTER_IP:44249 (MASTER_IP:44249) with 4 core(s)
20/05/11 19:43:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 105
20/05/11 19:43:09 INFO BlockManagerMasterEndpoint: Trying to remove executor 105 from BlockManagerMaster.
20/05/11 19:43:10 INFO StandaloneSchedulerBackend: Granted executor ID app-20200511193954-0001/106 on hostPort MASTER_IP:44249 with 4 core(s), 6.0 GB RAM
^C20/05/11 19:43:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Solutions tried:

I tried adding a new slave node, Slave3, after searching the "insufficient resources" error shown above, but the error still persists even after scaling out.

Is it because of low memory on the master node? Any suggestions?

user7422128

1 Answer


Try running with minimal resource requirements first. Also change the deploy mode to cluster so that the driver runs on a worker node instead of the machine you submit from. Read more at https://spark.apache.org/docs/latest/submitting-applications.html

spark-submit --master spark://<MASTER_IP>:7077 --num-executors=2 --conf spark.driver.memory=100M  --conf spark.executor.memory=200M --deploy-mode cluster test.py
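
For reference, a minimal sketch of what test.py could contain to confirm that executors actually register and run a task; the defaultParallelism print and the explicit sc.stop() are illustrative additions, not part of the original script:

# test.py - minimal sketch; the extra checks are illustrative additions
from pyspark import SparkContext

sc = SparkContext()

# A tiny action: if this returns, at least one executor registered and ran a task.
nums = sc.parallelize([1, 2, 3, 4])
print(nums.collect())                      # expected: [1, 2, 3, 4]

# Rough indication of how many cores the scheduler was actually granted.
print("defaultParallelism:", sc.defaultParallelism)

sc.stop()

If this completes with the lower memory settings, the original hang was most likely a resource mismatch: the submit command kept asking for 6 GB executors, which the 600 MB workers can never provide.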