
I'm trying to export a CSV from this page via a Python script. The complicated part is that clicking the export button opens a second page, which starts the download and then closes again, rather than serving the file from a static URL. I've tried the Requests library, among other things, but the file it returns is empty.

Here's what I've done:

from requests import get

url = 'http://aws.state.ak.us/ApocReports/CampaignDisclosure/CDExpenditures.aspx?exportAll=True&amp%3bexportFormat=CSV&amp%3bisExport=True%22+id%3d%22M_C_sCDTransactions_csfFilter_ExportDialog_hlAllCSV?exportAll=True&exportFormat=CSV&isExport=True'

with open('CD_Transactions_02-27-2017.CSV', "wb") as file:
    # send the GET request
    response = get(url)
    # write the response body to the file
    file.write(response.content)

I'm sure I'm missing something obvious, but I'm pulling my hair out.

Ben Resnik

2 Answers


It looks like the file is generated on demand, and the URL stays valid only as long as the session lasts.

The browser makes multiple requests to the web server, including POST requests. So to get those files from code, you would have to simulate the browser, possibly including session state etc. (and in this case also __VIEWSTATE).
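As a rough illustration of what "simulating the browser" involves for the form state, here is a minimal sketch that pulls the hidden fields out of a page so they can be sent back in a later POST. The sample HTML is invented for the example, and `__EVENTVALIDATION` is included only because ASP.NET Web Forms pages often carry it alongside `__VIEWSTATE`; I have not checked which fields this particular page uses.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up page fragment, only to show the shape of the data
sample = '''
<form method="post" action="./CDExpenditures.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="search" value="" />
</form>
'''
print(hidden_fields(sample))
```

The resulting dict is what you would merge into the form data of the follow-up POST, so the server sees the same state a real browser would send.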

To see the whole exchange, you can use the developer tools in the browser (usually F12, then select the Network tab to see the traffic), or something like Wireshark.

In other words, this won't be an easy task.

If this is open government data, it might be better to just ask the agency for the data, or for direct links to the (unfiltered) files. Sometimes there is a public FTP server, for example, or an API.

Danny_ds
  • Thanks, Danny. That's the answer I was afraid of. I'm gonna reach out to them and see what happens. Unfortunately this looks like a really common way states have their campaign finance sites set up. – Ben Resnik Feb 28 '17 at 17:01

The file is created on demand but you can download it anyway. Essentially you have to:

  1. Establish a session to save cookies and viewstate
  2. Submit a form in order to click the export button
  3. Grab the link behind the CSV button that pops up
  4. Follow that link and download the file
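In Python, the four steps above might be sketched roughly as follows. This is untested against the real site and rests on several assumptions: `session` is expected to be a `requests.Session()` (anything with compatible `get`/`post` methods works), `export_button` is a hypothetical form field name for the export button, and `csv_link_id` is the `id` of the CSV link (the URL in the question suggests something like `M_C_sCDTransactions_csfFilter_ExportDialog_hlAllCSV`, but check that in the browser's developer tools).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExportPageParser(HTMLParser):
    """Collect hidden form fields and the href of every <a> that has an id."""

    def __init__(self):
        super().__init__()
        self.hidden = {}
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input":
            if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
                self.hidden[attr["name"]] = attr.get("value") or ""
        elif tag == "a" and attr.get("id") and attr.get("href"):
            self.links[attr["id"]] = attr["href"]


def parse_page(html):
    parser = ExportPageParser()
    parser.feed(html)
    return parser


def download_export(session, page_url, export_button, csv_link_id, out_path):
    # 1. Load the page. The session stores the cookies; the hidden
    #    fields (e.g. __VIEWSTATE) carry the form state.
    first = session.get(page_url)
    first.raise_for_status()
    form = parse_page(first.text).hidden

    # 2. Post the form back as if the export button had been clicked.
    #    Field name and value are assumptions; copy the real ones from
    #    the POST request shown in the browser's developer tools.
    form[export_button] = "Export"
    second = session.post(page_url, data=form)
    second.raise_for_status()

    # 3. Find the link behind the CSV button in the returned page.
    links = parse_page(second.text).links
    if csv_link_id not in links:
        raise LookupError("no link with id %r in the response" % csv_link_id)
    csv_url = urljoin(page_url, links[csv_link_id])

    # 4. Follow that link with the same session and save the file.
    third = session.get(csv_url)
    third.raise_for_status()
    with open(out_path, "wb") as fh:
        fh.write(third.content)
    return csv_url
```

You would call it along the lines of `download_export(requests.Session(), page_url, button_name, link_id, 'export.csv')`. If the export is triggered by a link button rather than a submit button, the POST will probably need `__EVENTTARGET` set instead of a button field, so compare against the request the browser actually sends.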

You can find working code here (if you don't mind that it's written in R): Save response from web-scraping as csv file

Birger