I'm trying to scrape the web page 'http://www.trulia.com/property/1080560259-2-Penelope-Ln-Middletown-NJ-07748'. On this page, when the Estimates tab (below the Comparable and Estimates section) is selected, the data below the Google map is loaded dynamically. This data does not appear in the page source, but it is visible in the Developer Tools window (context menu, Inspect Element).

I'm using Selenium and Python 2.7. Is there a way to access this data, or to access all the elements?

Thanks in advance.

a coder
Mohan Raj
  • See my answer to a larger-scope question; start from the latest code listing and look at `browser.page_source`. The answer is at http://stackoverflow.com/questions/23386855/login-navigate-and-retrieve-data-behind-a-proxy-with-python/23547507#23547507 – Jan Vlcinsky May 08 '14 at 22:11
  • Thanks, but this doesn't resolve my issue. Is there a way to access the elements listed in the Dev Tools window? The dynamically generated data is not visible in the page source. I couldn't use the response package since I don't have a new URL. By default, only the Tab 1 (Comparable) data is in the source. I need the Tab 2 (Estimates) table data. – Mohan Raj May 08 '14 at 22:55
  • The data I need is visible in the Elements section of the Dev Tools window but not in the source. – Mohan Raj May 08 '14 at 22:56

1 Answer

Since that tab's content is loaded via AJAX, the page source will not contain it until the request finishes, so you need to wait for it yourself.

I'd do something like this (pseudo-code):

find_element_by_css_selector('a#dataset_nearby').click()
waitForElement('ul#places_map_module li.active table.table tr')

You'll probably need to adjust the selectors. In waitForElement, poll for the element repeatedly and wait until it is available BEFORE you perform a command on it.
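The polling idea behind waitForElement can be sketched in plain Python. This is only a minimal sketch, not Selenium's own API: `wait_for_elements` and `WaitTimeout` are hypothetical names introduced here, and the helper assumes you hand it a zero-argument callable that returns a list (empty while the AJAX content is still loading). Selenium's `WebDriverWait`, mentioned in the comments below, offers the same kind of polling out of the box.

```python
import time


class WaitTimeout(Exception):
    """Raised when the polled lookup never returns a non-empty result."""


def wait_for_elements(find, timeout=30.0, poll=0.5):
    """Call find() repeatedly until it returns a non-empty list.

    `find` is any zero-argument callable (hypothetical helper, not part
    of Selenium), e.g. a lambda wrapping a find_elements_* call.
    """
    deadline = time.time() + timeout
    while True:
        found = find()
        if found:
            # The content has arrived; hand it back to the caller.
            return found
        if time.time() >= deadline:
            raise WaitTimeout("nothing found within %.1f s" % timeout)
        # Not there yet: sleep briefly, then check again.
        time.sleep(poll)


# Possible usage with a Selenium driver (selectors taken from the
# pseudo-code above and likely to need adjusting):
#
#   driver.find_element_by_css_selector('a#dataset_nearby').click()
#   rows = wait_for_elements(
#       lambda: driver.find_elements_by_css_selector(
#           'ul#places_map_module li.active table.table tr'))
```

Passing a `find_elements_*` call rather than `find_element_*` keeps the loop simple, because an empty list signals "not loaded yet" without raising an exception.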

ddavison
  • Hi, thanks for the response. But even when I wait, the element is not visible. Here is the code I tried: `import selenium.webdriver.support.ui as ui` `wait = ui.WebDriverWait(driver,30)` `wait.until(lambda driver:driver.find_element_by_css_selector('a#dataset_nearby'))` `driver.find_element_by_css_selector('a#dataset_nearby').click()` An ElementNotVisibleException is thrown. – Mohan Raj May 08 '14 at 22:47
  • Use `find_elements` instead, and check the `length`; that might help too. – ddavison May 08 '14 at 22:48
  • I am using `find_element_by_id`, and the `click()` method is called to select the Estimates tab in the web page. Even with the wait time, the new data is not available to the browser handler. It throws the same exception. The code I tried is: `pick_id = driver.find_element_by_id("dataset_nearby")` `pick_id.click()` `wait.until(lambda driver: driver.find_elements_by_css_selector('Home Estimates'))` `print driver.find_elements_by_css_selector('Home Estimates')` – Mohan Raj May 08 '14 at 23:20
  • Whenever the tab is clicked, the following request is sent: `GET /_ajax/PDP/NearbyProperties/json/?tplname=small&bo...4&lon=-74.10724&block_pid=1080560259&fips_id=34025`. How can I make an equivalent request in Python? – Mohan Raj May 09 '14 at 00:22