First of all, I'm not a Python expert. I'm learning Python to scrape data from a specific game website that requires a login; you won't see the data unless you are logged in. (I have attached a screenshot of the page you see once you log in.) I tried to run the following code:

import requests
from bs4 import BeautifulSoup

page = requests.get('<website url>')
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)

Here, I get the same result as if I were not logged in to the website. Can someone tell me what I need to do?

Jnana

1 Answer

You can use requests.session() to log in and then make further requests with the same session, which keeps the cookies set at login.

For example:

import requests
from bs4 import BeautifulSoup

data = {'lEmail': '<YOUR EMAIL HERE>',
        'lPass': '<YOUR PASSWORD HERE>',
        'fbSig': 'web'}

url = 'https://www.airline4.net/research_main.php?mode=search&rwy=1000&dist=25000&depId=3982&arr=0&arrId=0&fbSig=false'
login_url = 'https://www.airline4.net/weblogin/login.php'

with requests.session() as s:
    # log in first; the session object stores the cookies the server sets
    s.post(login_url, data=data)

    # now you are logged in, so requests made through `s` are authenticated:
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    print(soup.get_text(strip=True, separator='\n'))

Prints:

Distance
Y class
J class
F class
Rwy
OPIS
-
SCIP
Pakistan, Islamabad
-
Chile, Isla De Pascua
19,273 km
10,827ft rwy
Market:
55%
Y class
473
J class
221
F class
129
OPIS
-
NTGJ
Pakistan, Islamabad
-
French Polynesia, Totegegie
17,075 km
6,562ft rwy
Market:
67%
Y class
286
J class
161
F class
21
OPIS
-

... and so on.
Andrej Kesely
  • Thanks. An additional question: is there any way to use the printed output as input? In `url='https://www.airline4.net/research_main.php?mode=search&rwy=1000&dist=25000&depId=3982&arr=0&arrId=0&fbSig=false'`, dist=25000 refers to a distance of 25000km. The print gave the first 50 routes in descending order from 25000km, and the last (50th) result was for 16251km. So now I want to create a loop that uses this 16251km in **url**, and so on. I can do this with `distance = input()` and changing `url`, but is there a way to pick up that 16251km automatically instead of using `input()`? – Jnana May 30 '20 at 15:31
  • @Jnana Yes, you can use BeautifulSoup to parse the last result and then construct the url with this value. That's what BeautifulSoup is for. Here in my example I used it only to parse text. – Andrej Kesely May 30 '20 at 16:03
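
To make the idea in the comments concrete, here is a rough sketch of how the last distance could be pulled out of the text and fed back into the url. It is not tested against the site: the helper names (`last_distance`, `with_distance`, `crawl`) and the `fetch_text` callable are made up for illustration, and it assumes every route shows its distance as text like "19,273 km", as in the output above. It works on the text returned by `soup.get_text(...)` rather than on specific HTML tags, because the page markup is not shown here.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: distances appear in the page text as e.g. "19,273 km".
DISTANCE_RE = re.compile(r'(\d[\d,]*)\s*km')


def last_distance(page_text):
    """Return the last "N km" value in the text as an int, or None if absent."""
    matches = DISTANCE_RE.findall(page_text)
    if not matches:
        return None
    return int(matches[-1].replace(',', ''))


def with_distance(url, dist):
    """Return `url` with its `dist` query parameter set to `dist`."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query['dist'] = str(dist)
    return urlunsplit(parts._replace(query=urlencode(query)))


def crawl(fetch_text, start_url, max_pages=20):
    """Collect page texts, restarting each search from the last distance seen.

    `fetch_text` is a hypothetical callable that takes a url and returns the
    page text. The loop stops when a page has no distances, when the last
    distance stops changing, or after `max_pages` pages.
    """
    pages = []
    url, previous = start_url, None
    for _ in range(max_pages):
        text = fetch_text(url)
        pages.append(text)
        dist = last_distance(text)
        if dist is None or dist == previous:
            break
        url, previous = with_distance(url, dist), dist
    return pages
```

Inside the `with requests.session() as s:` block, `fetch_text` could be something like `lambda u: BeautifulSoup(s.get(u).content, 'html.parser').get_text(strip=True, separator='\n')`. Whether the site treats `dist` as inclusive is unknown, so the last route of one page may show up again as the first route of the next; you may need to drop duplicates.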