I'm trying to use Python to scrape a webpage that sits behind a login page. I've tried a few examples from Stack Overflow that were accepted as working answers, but none of them work for me.
Attempt 1:
import requests
from lxml import html
USERNAME = "my username"
PASSWORD = "my password"
TOKEN = "my token"
LOGIN_URL = "https://example.com/admin/login"
URL = "https://example.com/admin/tickets"
session_requests = requests.session()
# Get login csrf token
result = session_requests.get(LOGIN_URL)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='_token']/@value")))[0]
# Create payload
payload = {
    "name": USERNAME,
    "password": PASSWORD,
    "_token": TOKEN
}
# Perform login
result = session_requests.post(LOGIN_URL, data = payload, headers = dict(referer = LOGIN_URL))
# Scrape url
result = session_requests.get(URL, headers = dict(referer = URL))
tree = html.fromstring(result.content)
bucket_names = tree.xpath("//div[@class='a']/a/text()")
print(bucket_names)
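For what it's worth, the token extraction from attempt 1 does seem to work when I test it on its own. Here's a sketch I used to check it, with made-up stand-in markup (the real login page is obviously larger; `extract_token` is just a helper I wrote for testing):

```python
from lxml import html

def extract_token(page_text):
    """Pull the hidden CSRF field's value out of the login form markup."""
    tree = html.fromstring(page_text)
    values = tree.xpath("//input[@name='_token']/@value")
    return values[0] if values else None

# Minimal stand-in for the real login page.
sample = '<form><input type="hidden" name="_token" value="abc123"></form>'
print(extract_token(sample))  # abc123
```

So I don't think the XPath itself is the problem.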
Attempt 2:
import requests
from bs4 import BeautifulSoup
username = 'my username'
password = 'my password'
scrape_url = 'https://example.com/admin/tickets'
login_url = 'https://example.com/admin/login'
login_info = {'name': username,'password': password}
#Start session.
session = requests.session()
#Login using your authentication information.
session.post(url=login_url, data=login_info)
#Request page you want to scrape.
url = session.get(url=scrape_url)
soup = BeautifulSoup(url.content, 'html.parser')
for link in soup.findAll('a'):
    print('\nLink href: ' + link['href'])
    print('Link text: ' + link.text)
The first example prints the result:
[]
The second gives me links from the login page, not links from the main scrape URL.
I'm really not sure what the problem is; any pointers would be greatly appreciated.
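One way I've been sanity-checking this (a sketch against made-up markup, not the real site; the `_token` field name is the one from attempt 1) is to test whether a response still contains the login form, which would mean the POST bounced me back to the login page:

```python
from lxml import html

def still_on_login_page(page_text):
    """True if the page still contains the login form's hidden _token
    input, i.e. the login POST probably bounced back to the login page."""
    tree = html.fromstring(page_text)
    return bool(tree.xpath("//form//input[@name='_token']"))

# Stand-in markup for the two cases:
print(still_on_login_page('<form><input name="_token" value="x"></form>'))      # True
print(still_on_login_page('<div class="a"><a href="/t/1">Ticket 1</a></div>'))  # False
```

Running this kind of check on `url.content` after the `session.post(...)` in attempt 2 suggests the login itself is failing, not the scrape.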
Thanks
Ryan