1

I manage around 150 email accounts for my company. I wrote a Python script for Selenium WebDriver that automates actions (deleting spam, emptying the trash, ...) one account after the other, several times a day, but it is far too slow and crashes all the time. I read that Selenium Grid with Docker on Amazon AWS could do the job, and it seems that the "parallel" option for Selenium WebDriver could too.

What I need to do simultaneously:

1) Login (all the accounts)

2) Perform actions (delete spam, empty the trash, ...)

3) Close the Chrome instances

I currently use a for loop to build the same instructions 150 times as strings, store them in lists, and then execute them. This is not optimized at all, and it makes my computer crash. In a nutshell, I know it's not the way to go, and I would like to have this running in parallel.

Here is a shortened version of the code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


## GET THE USERS' EMAILS
emails = []
pwds = []
with open("users.txt", "r") as users_file: # Open the text file containing emails and pwds
  for line in users_file:
      email_and_pwd = line.split() # Extract the line of the .txt file as a list
      emails.append(email_and_pwd[0]) # Add the email in the emails list
      pwds.append(email_and_pwd[1]) # Add the pwd in the pwds list
nbr_users = len(emails) # number of users


## CREATE LISTS OF INSTRUCTIONS THAT WILL BE EXECUTED LATER AS CODE
create_new_driver = []
go_to_email_box = []
fill_username_box = []
fill_password_box = []
# Here I have the same type of lines to create lists that will contain the instructions to click on the Login button, then delete spams, then empty the trash, etc...


## FILL THE LISTS WITH INSTRUCTIONS AS STRINGS
for i in range(nbr_users):
  create_new_driver_element = 'driver' + str(i) + ' = webdriver.Chrome(options = chrome_options)'
  create_new_driver.append(create_new_driver_element)

  # Here I have the same type of lines to create the rest of the instructions to fill the lists


## EXECUTE THE INSTRUCTIONS SAVED IN THE LISTS
for user in range(nbr_users):
  exec(create_new_driver[user])
  # Here I have the same type of lines to execute the code contained in the previously created lists

After browsing the internet for days without results, any kind of help is appreciated. Thank you very much!

John
  • Are you managing bot accounts? Or actual employee accounts? It's probably not a good idea to store the credentials in a text file if these are important accounts with sensitive emails. – Joe Oct 15 '19 at 03:50
  • Welcome to SO. Please take the time to read [ask]. It will help you craft solid questions that will hopefully get useful answers. IMO: selenium is the wrong tool for email administration... – orde Oct 15 '19 at 04:16
  • Hi Joe, I am managing bot accounts, so in terms of security it is not a problem to store the credentials in a text file: the purpose of these accounts is to receive promotional offers and newsletters and to separate the useful content from the spam. – John Oct 16 '19 at 23:29

2 Answers

1

I tried to make a comment on @Jetro Cao's answer but it was too much to fit in a comment. I'll base my response off his work.

Why is your program slow?

As @Jetro Cao mentioned, you are running everything sequentially rather than in parallel.

Will another tool make this easier?

Yes. If possible, you should use an email administration tool (e.g., G Suite for Gmail) rather than logging into these accounts individually through Python scripts.

Is it safe to store my 150 account credentials in a text file?

No, unless these are meaningless bot accounts without any sensitive emails.

Code Advice:

Overall your code is off to a great start. A few specific comments:

  1. Avoid using exec
  2. Use threading as recommended by @Jetro Cao
  3. You haven't provided us with a large portion of your code. When using Selenium, some tricks can help speed things up. One example is loading the login page directly rather than loading a home page and then "clicking" login.
import threading

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


## GET THE USERS' EMAILS
emails = []
pwds = []
with open("users.txt", "r") as users_file: # Open the text file containing emails and pwds
    for line in users_file:
        email, pwd = line.split() # Split the line into the email and the pwd
        emails.append(email) # Add the email in the emails list
        pwds.append(pwd) # Add the pwd in the pwds list
nbr_users = len(emails) # number of users


## CREATE LISTS OF INSTRUCTIONS THAT WILL BE EXECUTED LATER AS CODE
drivers = []
go_to_email_box = []
fill_username_box = []
fill_password_box = []
# Here I have the same type of lines to create lists that will contain the instructions to click on the Login button, then delete spams, then empty the trash, etc...


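## START ONE THREAD PER USER

chrome_options = webdriver.ChromeOptions() # Assumed here; configure it as in your full script

def manage_single_user(email, pwd):
    # Create the driver inside the thread so the following steps can use it
    driver = webdriver.Chrome(options=chrome_options)
    try:
        pass # Log in, delete spam, empty the trash, ...
    finally:
        driver.quit() # Close this Chrome instance

threads = []
for email, pwd in zip(emails, pwds):
    t = threading.Thread(target=manage_single_user, args=(email, pwd))
    t.start()
    threads.append(t)

for t in threads:
    t.join()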
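
On point 1, one possible way to drop exec is to replace the lists of instruction strings with a list of ordinary functions that all take the same arguments. The sketch below is only an illustration under assumptions: `read_credentials`, `manage_account`, `make_driver` and `steps` are hypothetical names that do not appear in the original script, and the driver is passed in as a factory so the structure can be tried without opening a browser.

```python
from typing import Any, Callable, List, Tuple

# A step is any function taking (driver, email, pwd); hypothetical convention.
Step = Callable[[Any, str, str], None]


def read_credentials(path: str) -> List[Tuple[str, str]]:
    """Return (email, password) pairs, skipping blank or incomplete lines."""
    credentials = []
    with open(path, "r") as users_file:
        for line in users_file:
            fields = line.split()
            if len(fields) >= 2:
                credentials.append((fields[0], fields[1]))
    return credentials


def manage_account(email: str, pwd: str,
                   make_driver: Callable[[], Any],
                   steps: List[Step]) -> None:
    """Open one browser, run every step for one account, then close it."""
    driver = make_driver()
    try:
        for step in steps:
            step(driver, email, pwd)
    finally:
        # Runs even if a step raises, so no browser instance is left open.
        driver.quit()
```

With Selenium, `make_driver` could be something like `lambda: webdriver.Chrome(options=chrome_options)`, and each step a plain function such as `def login(driver, email, pwd): ...`. `manage_account` can then be used directly as the `target` of a thread, with `args=(email, pwd, make_driver, steps)`.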
Joe
  • Appreciate the expansion and extra tips. Just one nitpick: you typo'd my name lol – Jethro Cao Oct 15 '19 at 04:35
  • @JethroCao Good catch! Sorry about that! – Joe Oct 15 '19 at 04:38
  • Thank you very much for your answers! I will try this as soon as possible! As a side note, the reason I'm not using G Suite or a similar tool is the lack of flexibility and customization. I am trying to find an alternative to exec(), as most of my code relies on this function (I read the article that you added to your reply and yes, it makes sense). – John Oct 17 '19 at 00:06
0

My hunch is that your program is slow because you execute the set of instructions for one user after another. Although I don't know the specifics of how Selenium works, I'd imagine there's quite a bit of network I/O going on, and you wait for it to complete for each user before executing the same set of instructions for the next user.

As you correctly surmised, what you need is to run them in parallel. Python's threading module can help you do that; in particular, what you need is the Thread class.

Using just your create_new_driver as an example, you can try something like the following:

# ...
# your other imports
# ...
import threading

# ...
# setting up your user info and instructions lists
# ...

# create function that will execute the instructions for a single user,
# which will be used as the argument for target param of the Thread constructor 
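def manage_single_user(i):
    # i is the index of the user in the instruction lists
    namespace = dict(globals())  # private copy, so each thread keeps its own driver variable
    exec(create_new_driver[i], namespace)
    # execute rest of the instructions for same user, passing the same namespace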

threads = []
for i in range(nbr_users):
    t = threading.Thread(target=manage_single_user, args=(i,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

The above will start running the set of instructions for each user right away, so you'll be waiting on the network I/O of all users simultaneously. Thus, as long as executing the set of instructions for a single user is sufficiently fast, you should see a significant speedup.
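
Starting one thread per user means up to 150 Chrome instances at once, which may be more than one machine can handle. A possible variation, which is not what the code above does, is to cap the number of simultaneous workers with `concurrent.futures.ThreadPoolExecutor` from the standard library. This is a minimal sketch under assumptions: `run_all`, `task` and the default of 8 workers are hypothetical choices, and `task(email, pwd)` stands for whatever function handles a single account.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional, Tuple


def run_all(accounts: Iterable[Tuple[str, str]],
            task: Callable[[str, str], None],
            max_workers: int = 8) -> Dict[str, Optional[BaseException]]:
    """Run task(email, pwd) for every account, at most max_workers at a time.

    Returns a dict mapping each email to None on success, or to the
    exception that its task raised.
    """
    results: Dict[str, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which email each future belongs to.
        futures = {pool.submit(task, email, pwd): email
                   for email, pwd in accounts}
        for future in as_completed(futures):
            # The future is already finished here, so exception() does not block.
            results[futures[future]] = future.exception()
    return results
```

A failure in one account is recorded in the returned dict instead of stopping the others, so the failed accounts can be listed or retried afterwards. The right value for `max_workers` depends on the machine's memory and CPU and would need to be found by experiment.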

Jethro Cao