
I'm a beginner in Python. I'm trying to get the first search result link from Google, which is stored inside a div with class='yuRUbf', using BeautifulSoup. When I run the script, the output is 'None'. What is the error here?

import requests
import bs4

url = 'https://www.google.com/search?q=site%3Astackoverflow.com+how+to+use+bs4+in+python&sxsrf=AOaemvKrCLt-Ji_EiPLjcEso3DVfBUmRbg%3A1630215433722&ei=CR0rYby7K7ue4-EP7pqIkAw&oq=site%3Astackoverflow.com+how+to+use+bs4+in+python&gs_lcp=Cgdnd3Mtd2l6EAM6BwgAEEcQsAM6BwgjELACECc6BQgAEM0CSgQIQRgAUMw2WPh_YLiFAWgBcAJ4AIABkAKIAd8lkgEHMC4xMC4xM5gBAKABAcgBCMABAQ&sclient=gws-wiz&ved=0ahUKEwj849XewdXyAhU7zzgGHW4NAsIQ4dUDCA8&uact=5'

request_result=requests.get( url )
soup = bs4.BeautifulSoup(request_result.text,"html.parser")
productDivs = soup.find("div", {"class": "yuRUbf"})
print(productDivs)

2 Answers


You want the first Google search result, but the class name you are looking for may be different in the response that requests receives. First locate that link manually in the returned HTML; it is then easy to identify.

import requests
import bs4

url = 'https://www.google.com/search?q=site%3Astackoverflow.com+how+to+use+bs4+in+python&sxsrf=AOaemvKrCLt-Ji_EiPLjcEso3DVfBUmRbg%3A1630215433722&ei=CR0rYby7K7ue4-EP7pqIkAw&oq=site%3Astackoverflow.com+how+to+use+bs4+in+python&gs_lcp=Cgdnd3Mtd2l6EAM6BwgAEEcQsAM6BwgjELACECc6BQgAEM0CSgQIQRgAUMw2WPh_YLiFAWgBcAJ4AIABkAKIAd8lkgEHMC4xMC4xM5gBAKABAcgBCMABAQ&sclient=gws-wiz&ved=0ahUKEwj849XewdXyAhU7zzgGHW4NAsIQ4dUDCA8&uact=5'

request_result=requests.get( url )

soup = bs4.BeautifulSoup(request_result.text,"html.parser")

Using the select method:

  1. I used a CSS selector, which returns a list of all matching divs, and sliced that list from index position 1.

  2. Then I used select_one to get the a tag and read its href.

main_data=soup.select("div.ZINbbc.xpd.O9g5cc.uUPGi")[1:]
main_data[0].select_one("a")['href'].replace("/url?q=","")

Using find method:

main_data=soup.find_all("div",class_="ZINbbc xpd O9g5cc uUPGi")[1:]
main_data[0].find("a")['href'].replace("/url?q=","")

Output (the same in both cases):

'https://stackoverflow.com/questions/23102833/how-to-scrape-a-website-which-requires-login-using-python-and-beautifulsoup&sa=U&ved=2ahUKEwjGxv2wytXyAhUprZUCHR8mBNsQFnoECAkQAQ&usg=AOvVaw280R9Wlz2mUKHFYQUOFVv8'
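Note that the output above still carries Google's tracking parameters (&sa=..., &ved=..., &usg=...), because replace only strips the "/url?q=" prefix. A minimal sketch of one way to recover just the destination, assuming the href has the "/url?q=..." redirect form shown above; the helper name extract_target and the shortened sample href are hypothetical:

```python
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the destination URL from a Google '/url?q=...' redirect href."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs splits the query on '&', so 'q' holds only the destination.
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    # Not a redirect link: return it unchanged.
    return href


# Illustrative href, shortened from the output above.
href = "/url?q=https://stackoverflow.com/questions/23102833/how-to-scrape&sa=U&ved=abc"
print(extract_target(href))
```

In the answer's code this could be applied to main_data[0].find("a")['href'] in place of the replace call.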
Bhavya Parikh
  • The find method works well, but when I change the URL the output is invalid –  Aug 29 '21 at 06:58
  • Because the class name may differ between pages; it will not always be the same class name – Bhavya Parikh Aug 29 '21 at 06:59
  • Then what solution works with every URL? When I inspect the page, I see the same class on every page: class="yuRUbf" –  Aug 29 '21 at 07:00
  • Can you share the URL again for reference, please? – Bhavya Parikh Aug 29 '21 at 07:02
  • here it is: https://www.google.com/search?q=site%3Astackoverflow.com+how+to+use+bs4+in+python&oq=&aqs=chrome.0.69i59i450l8.92637j0j1&sourceid=chrome&ie=UTF-8 –  Aug 29 '21 at 07:04
  • It is working; I have checked with the above URL – Bhavya Parikh Aug 29 '21 at 07:06
  • can you please check with this one: https://www.google.com/search?q=cars&oq=cars&aqs=chrome.0.69i59j69i61j69i60l2.3087j0j1&sourceid=chrome&ie=UTF-8 –  Aug 29 '21 at 07:08
  • Yes, correct. For the cars URL above it does identify the divs, but the URL at the first index is not the same as the one shown on the web page. In that case you can use a try/except block: request the URL first, and if the status_code is not 200, move on to the next index – Bhavya Parikh Aug 29 '21 at 07:17
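The fallback described in the last comment could be sketched roughly as follows. This is only an illustration: the helper name first_working and the is_ok callback are hypothetical, and the check is passed in as a function so the sketch runs without network access.

```python
def first_working(candidates, is_ok):
    """Return the first candidate for which is_ok(candidate) is true, else None."""
    for candidate in candidates:
        try:
            if is_ok(candidate):
                return candidate
        except Exception:
            # Treat a failed check (e.g. a network error) as "not OK" and move on.
            continue
    return None


# In a real script, is_ok might be something like:
#   lambda url: requests.get(url, timeout=10).status_code == 200
# Here a stand-in check is used so the example is self-contained.
links = ["https://example.com/broken", "https://example.com/good"]
print(first_working(links, lambda url: url.endswith("/good")))
```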

Let's see:

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent':
    "useragent"
}


html = requests.get('https://www.google.com/search?q=hello', headers=headers).text
soup = BeautifulSoup(html, 'lxml')
# locating div element with a tF2Cxc class
# calling for <a> tag and then calling for 'href' attribute
link = soup.find('div', class_='tF2Cxc').a['href']
print(link)

Output:

https://www.youtube.com/watch?v=YQHsXMglC9A
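If the markup Google returns does not contain the expected class (which can happen, for example, when the request is sent without a browser-like User-agent header), soup.find returns None and accessing .a on it raises AttributeError. A minimal sketch of a guarded lookup, assuming the same tF2Cxc class as above; the function name first_result_link and the sample markup are hypothetical, and html.parser is used so the sketch does not depend on lxml:

```python
from bs4 import BeautifulSoup


def first_result_link(html, css_class="tF2Cxc"):
    """Return the href of the first <a> inside div.<css_class>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=css_class)
    # Guard against a missing div or a div without a link.
    if container is None or container.a is None:
        return None
    return container.a.get("href")


# Hypothetical markup standing in for a results page.
sample = '<div class="tF2Cxc"><a href="https://example.com/">Example</a></div>'
print(first_result_link(sample))
print(first_result_link("<div class='other'></div>"))
```

Returning None instead of raising lets the caller decide whether to retry with different headers or a different class name.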

Esmaeli