
I'm trying to use Python to scrape a page that sits behind a login. I've tried a few Stack Overflow examples that were accepted as working, but none of them work for me.

Attempt 1:

import requests
from lxml import html

USERNAME = "my username"
PASSWORD = "my password"
TOKEN = "my token"

LOGIN_URL = "https://example.com/admin/login"
URL = "https://example.com/admin/tickets"

session_requests = requests.session()

# Get login csrf token
result = session_requests.get(LOGIN_URL)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='_token']/@value")))[0]

# Create payload
payload = {
    "name": USERNAME,
    "password": PASSWORD,
    "_token": TOKEN
}

# Perform login
result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))

# Scrape url
result = session_requests.get(URL, headers=dict(referer=URL))
tree = html.fromstring(result.content)
bucket_names = tree.xpath("//div[@class='a']/a/text()")

print(bucket_names)
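
One thing I'm unsure about in this attempt: the authenticity_token scraped from the login page is never sent; the payload posts the hard-coded TOKEN instead. A variant that sends the scraped value (assuming the form really expects a _token field) would be:

payload = {
    "name": USERNAME,
    "password": PASSWORD,
    "_token": authenticity_token  # the value scraped above, not the hard-coded TOKEN
}
result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))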

Attempt 2:

import requests
from bs4 import BeautifulSoup

username = 'my username'
password = 'my password'
scrape_url = 'https://example.com/admin/tickets'

login_url = 'https://example.com/admin/login'
login_info = {'name': username,'password': password}

#Start session.
session = requests.session()

#Login using your authentication information.
session.post(url=login_url, data=login_info)

#Request the page you want to scrape.
response = session.get(url=scrape_url)

soup = BeautifulSoup(response.content, 'html.parser')

for link in soup.find_all('a'):
    print('\nLink href: ' + link['href'])
    print('Link text: ' + link.text)

The first example prints the result:

[]

The second gives me links from the login page, not from the main scrape URL.
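
To check whether the login POST actually succeeds before scraping, a minimal check (assuming a successful login redirects away from the login URL) would be:

result = session.post(url=login_url, data=login_info)
print(result.status_code)               # often 200 even when the login fails
print(result.url)                       # still the login URL usually means the login failed
print('logout' in result.text.lower())  # crude heuristic: logged-in pages often show a logout link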

I'm really not sure what the problem is; any pointers would be greatly appreciated.

Thanks

Ryan

  • Try using Beautiful Soup to submit the login information as well: https://stackoverflow.com/questions/23102833/how-to-scrape-a-website-which-requires-login-using-python-and-beautifulsoup. You could also try Selenium; with Selenium you can see the automated browser and what is actually going on, and thus debug much more efficiently (see the sketch after these comments). – Ouss Jul 10 '19 at 22:37
  • It's hard to make progress without the actual website. If you could provide the actual link, tackling the problem would be a bit easier. – akin_ai Aug 11 '19 at 15:28
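
For reference, a minimal sketch of the Selenium approach suggested in the first comment. The form field names ("name", "password") and the submit-button selector are my assumptions about the page, not confirmed:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # visible browser, so you can watch the login happen
driver.get("https://example.com/admin/login")

driver.find_element(By.NAME, "name").send_keys("my username")
driver.find_element(By.NAME, "password").send_keys("my password")
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()

# The same browser session now carries the login cookies.
driver.get("https://example.com/admin/tickets")
print(driver.page_source)
driver.quit()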
