I am trying to scrape my data from a website that requires a login but I keep getting the following error:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MethodNotAllowed</Code><Message>The specified method is not allowed against this resource.</Message><Method>POST</Method><ResourceType>OBJECT</ResourceType><RequestId>DCVJZ8D4R3PK45M1</RequestId><HostId>PIra5vNbfC5d1TfFZ3hABXk9eIsKwtJm5bYH4Bozu4nS4InkGEILNflPPzdvT9hUpQOPaW0AZBA=</HostId></Error>

Python Script

import requests


loginurl = ("https://cbscarrickonsuir.app.vsware.ie/")
secure_url = ("https://cbscarrickonsuir.app.vsware.ie/11571471/behaviour")
payload = {"username":"REMOVED","password":"REMOVED","source":"web"}
r = requests.post(loginurl, data=payload)
print(r.text)

I had to remove the username and password because this is a live website. I don't know how to fix this. I followed a YouTube tutorial, but it used a much simpler website to scrape. I hope you can help me.

GCIreland
  • Sometimes it is better to use `Session()` to work with `cookies`: first use `GET` to collect all cookies from the server (especially the cookie for the `Session ID`), then run `POST`. Some pages also need an extra value copied from the HTML returned by `GET` (e.g. a unique session ID). – furas Nov 28 '21 at 03:57
  • The main problem may be that this page uses `JavaScript` to add elements to the HTML, so it may also use JavaScript to detect scripts/bots, and `requests` can't run `JavaScript`. You may need [Selenium](https://selenium-python.readthedocs.io/) to control a real web browser, which can run `JavaScript`. – furas Nov 28 '21 at 04:01
  • Some servers check headers, especially `User-Agent`. – furas Nov 28 '21 at 04:04
  • A login form doesn't have to send its data to the same URL as the page. This page sends it to `https://cbscarrickonsuir.vsware.ie/tokenapiV2/login`, as shown in the answer. – furas Nov 28 '21 at 04:05
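
The point about headers can be checked without contacting any server. Here is a minimal sketch, assuming a made-up browser-like `User-Agent` value (it is illustrative only, not a value this site is known to require). It prepares a request without sending it and prints the header that would go out:

```python
import requests

# By default, requests identifies itself as "python-requests/<version>".
default_agent = requests.utils.default_user_agent()
print(default_agent)

# A Session applies its headers to every request sent through it.
session = requests.Session()
# Hypothetical browser-like value, used here only for illustration.
session.headers["User-Agent"] = "Mozilla/5.0 (X11; Linux x86_64)"

# Prepare (but do not send) a request to inspect the outgoing headers.
request = requests.Request("GET", "https://cbscarrickonsuir.app.vsware.ie/")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
```

Whether this particular server inspects `User-Agent` at all is not established here; the sketch only shows how to change what `requests` sends.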

2 Answers

Open the Network tab of your browser's developer tools, submit the login form with some username and password, and you can see which endpoint is used for login. In your case it is https://cbscarrickonsuir.vsware.ie/tokenapiV2/login

[Screenshot: Request Headers]

[Screenshot: Example request body]

It would be a good idea to click through the requests in the XHR section of the Network tab and look at the headers, request and response of each one. That tells you exactly which API endpoint to use, which method it expects, which request body format it expects and what kind of response you will receive.
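
To compare what your script sends with what the Network tab shows, you can prepare a request without sending it and print the same fields. This is a rough sketch; `describe` is a hypothetical helper written for this illustration, and the request is the one from the question:

```python
import requests


def describe(prepared):
    """Return the parts of a prepared request worth comparing with the browser."""
    return {
        "method": prepared.method,
        "url": prepared.url,
        "content_type": prepared.headers.get("Content-Type"),
        "body": prepared.body,
    }


# The request from the question: form data posted to the front page.
question_request = requests.Request(
    "POST",
    "https://cbscarrickonsuir.app.vsware.ie/",
    data={"username": "REMOVED", "password": "REMOVED", "source": "web"},
).prepare()

summary = describe(question_request)
for field, value in summary.items():
    print(f"{field}: {value}")
```

If the URL, content type or body differ from what the browser shows for the login request, that difference is the first thing to fix.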

Edit: You will also probably need a persistent session to scrape any data that requires logging in first. Go through these:

  1. Python Requests and persistent sessions
  2. https://requests.kennethreitz.org/en/master/user/advanced/#session-objects
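
As a small illustration of what a persistent session does, the sketch below puts a cookie into a `Session` by hand and then prepares a later request without sending it. The cookie name, value and `example.com` domain are made up; on a real site the server would set the cookie in its login response.

```python
import requests

session = requests.Session()

# Stand-in for a cookie a server would set after a successful login.
# The name, value, and domain are made up for this illustration.
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# Prepare (without sending) a later request made through the same session.
later = session.prepare_request(
    requests.Request("GET", "https://example.com/private")
)
print(later.headers.get("Cookie"))

# A request prepared outside the session carries no cookie.
standalone = requests.Request("GET", "https://example.com/private").prepare()
print(standalone.headers.get("Cookie"))
```

The first request carries the stored cookie and the second does not, which is why separate `requests.get()`/`requests.post()` calls can look logged out to a server.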
Shubham Dhingra

There are two mistakes in your code.

  1. You send the data to the main page, but the browser sends it to https://cbscarrickonsuir.vsware.ie/tokenapiV2/login

  2. You send the data as form data, but the browser sends it as JSON, so you need json=payload instead of data=payload
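
The difference between the two keyword arguments is easy to see by preparing both requests without sending them. A minimal sketch with placeholder credentials:

```python
import json

import requests

login_url = "https://cbscarrickonsuir.vsware.ie/tokenapiV2/login"
# Placeholder credentials, for illustration only.
payload = {"username": "user", "password": "secret", "source": "web"}

# data= encodes the dict as an HTML form body.
form_request = requests.Request("POST", login_url, data=payload).prepare()
# json= serializes the dict as a JSON document.
json_request = requests.Request("POST", login_url, json=payload).prepare()

print(form_request.headers["Content-Type"])
print(form_request.body)

print(json_request.headers["Content-Type"])
print(json.loads(json_request.body))
```

The form version has content type `application/x-www-form-urlencoded` and a `key=value&...` body, while the JSON version has content type `application/json` and a JSON body.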

Another possible problem is that you don't use Session() to send cookies automatically. Many servers use cookies to remember that you are already logged in; if you don't send the cookies back, the server doesn't know that you are logged in.

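import requests

# Page from the question (not used for the login request itself)
url = "https://cbscarrickonsuir.app.vsware.ie/"

# Endpoint the browser actually sends the login data to
login_url = 'https://cbscarrickonsuir.vsware.ie/tokenapiV2/login'

# Dummy credentials, since I don't have an account
payload = {
    "username": "none",
    "password": "none@none.com",
    "source": "web"
}

# Session keeps cookies between requests
s = requests.Session()

# json= sends the payload as a JSON body instead of form data
r = s.post(login_url, json=payload)

print('status:', r.status_code)
print('--- text ---')
print(r.text)
print('----------------')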

I don't have an account to log in with, but the request now gets status 401 with the message invalid_username_password:

status: 401
--- text ---
{"fieldErrors":[],"genericErrors":[{"messageKey":"invalid_username_password","metadata":null}]}
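
With valid credentials, the next step would be to request the protected page through the same session. A rough sketch follows; `login_and_fetch` is a hypothetical helper, and it assumes the cookies kept by the session are enough for the second request. Whether this site instead expects a token from the login response is not something the output above shows.

```python
import requests


def login_and_fetch(session, login_url, target_url, payload):
    """Log in with a JSON body, then request target_url on the same session."""
    login_response = session.post(login_url, json=payload, timeout=10)
    # Raises requests.HTTPError on 4xx/5xx, e.g. the 401 shown above.
    login_response.raise_for_status()
    # Assumption: cookies stored in the session are sufficient here.
    return session.get(target_url, timeout=10)


# Usage (needs a real account; the URLs are the ones from this thread):
# session = requests.Session()
# page = login_and_fetch(
#     session,
#     "https://cbscarrickonsuir.vsware.ie/tokenapiV2/login",
#     "https://cbscarrickonsuir.app.vsware.ie/11571471/behaviour",
#     {"username": "REMOVED", "password": "REMOVED", "source": "web"},
# )
# print(page.status_code)
```

Failing early with `raise_for_status()` avoids scraping a page that only contains an error message.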
furas