I am trying to scrape a website with some unusual behavior. I open the URL of the page I want to retrieve and, as usual, the site presents a login page. I fill in and submit the form, but instead of the target page the site then shows a page with two links asking me to choose a profile; only after clicking one of the profiles can I reach the page I want. In mechanize I can't manage to click such a link to retrieve the page I want to read. This is my code:
from bs4 import BeautifulSoup as bs
import urllib3
import mechanize
import cookielib  # Python 2; on Python 3 this module is http.cookiejar

cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_handle_robots(False)  # ignore robots.txt
br.set_cookiejar(cj)         # keep the session cookies
br.open("the_url_I_want_scrape")
br.select_form(nr=2)         # the login form is the third form on the page
br.form.set_all_readonly(False)
br.form['username'] = "my_user"
br.form["password"] = "my_pass"
br.form["button.submit"] = "entra"
br.submit()
html = br.response().read()
Now if I iterate over the links I get two objects:
for link in br.links():
    print link
They look like the following:
Link(base_url='https://www.sito.com/internal/login', url='/internal/sessionProperty?sessid=1111', text='Profile1', tag='a', attrs=[('href', '/internal/sessionProperty?sessid=1111')])
Link(base_url='https://www.sito.com/internal/login', url='/shres/internal/sessionProperty?sessid=3333', text='Profile2', tag='a', attrs=[('href', '/internal/sessionProperty?sessid=3333')])
How can I simulate a click on one of them and then parse the resulting page? I've tried adding absolute_url to the link and then using follow_link, but it hangs and never responds. The code I use is:
for link in br.links():
    link.absolute_url = mechanize.urljoin(link.base_url, link.url)
    br.follow_link(link)
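For reference, here is a minimal standard-library sketch of the absolute URL that urljoin should produce for the first link above (Python 3 spelling; on Python 2 the same function lives in the urlparse module):

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

base_url = 'https://www.sito.com/internal/login'
href = '/internal/sessionProperty?sessid=1111'

# A root-relative href replaces the whole path, keeping scheme and host
print(urljoin(base_url, href))
# https://www.sito.com/internal/sessionProperty?sessid=1111
```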
Can someone help me? Thank you, Alex