I am trying to write code that will let me download a .xls file from a secured https website that requires a login. This is very difficult for me, as I have no experience with web coding; all my R experience comes from econometric work with readily available datasets.

I followed this thread to help write some code, but I think I'm running into trouble because the example uses http and I need https.

This is my code:

install.packages("RCurl")
library(RCurl)

curl = getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', followlocation = TRUE, autoreferer =  TRUE, curl = curl)

html <- getURL('https://jump.valueline.com/login.aspx', curl = curl)

viewstate <- as.character(sub('.*id="_VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html))

params <- list(
    'ct100$ContentPlaceHolder$LoginControl$txtUserID' = 'MY USERNAME',
    'ct100$ContentPlaceHolder$LoginControl$txtUserPw' = 'MY PASSWORD',
    'ct100$ContentPlaceHolder$LoginControl$btnLogin' = 'Sign In',
    '_VIEWSTATE' = viewstate)

html <- postForm('https://jump.valueline.com/login.aspx', .params = params, curl = curl)

When I run the line that starts "html <- getURL(...", I get:

> html <- getURL('https://jump.valueline.com/login.aspx', curl = curl)
Error in function (type, msg, asError = TRUE)  : 
SSL certificate problem: unable to get local issuer certificate

Is there a workaround for this? How can I access the local issuer certificate?

I read that adding '.opts = list(ssl.verifypeer = FALSE)' to the curlSetOpt call would remedy this, but when I add that, getURL runs and then the postForm line gives me:

> html <- postForm('https://jump.valueline.com/login.aspx', .params = params, curl = curl)
Error: Internal Server Error

Besides that, does this code look like it will work for the website I am trying to access? I went into the inspector and changed all the params to match my webpage, but since I'm not well versed in web coding I'm not 100% sure I caught the right things (particularly the VIEWSTATE). Also, is there a better, more efficient way I could approach this?

Automating this process would be huge for me, so your help is greatly appreciated.

Jack Rob

2 Answers

Try httr:

library(httr)
html <- content(GET('https://jump.valueline.com/login.aspx'), "text")

viewstate <- as.character(sub('.*id="_VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html))

params <- list(
  'ct100$ContentPlaceHolder$LoginControl$txtUserID' = 'MY USERNAME',
  'ct100$ContentPlaceHolder$LoginControl$txtUserPw' = 'MY PASSWORD',
  'ct100$ContentPlaceHolder$LoginControl$btnLogin' = 'Sign In',
  '_VIEWSTATE' = viewstate
)
POST('https://jump.valueline.com/login.aspx', body = params)

That still gives me a server error, but that's probably because you're not sending the right fields in the body.
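One way to see which fields the login form expects is to list every hidden input on the page rather than pulling out a single one by hand. The sketch below uses only base R and a made-up HTML snippet; `extract_hidden_fields` and `sample_html` are hypothetical names, and the field values are invented. ASP.NET WebForms pages usually name these fields with two leading underscores (`__VIEWSTATE`, often alongside `__EVENTVALIDATION`), and control prefixes are usually `ctl00` (letter l, zeros) rather than `ct100`, so it is worth comparing the names in `params` against the page source.

```r
# Collect name/value pairs of all <input type="hidden"> tags in an HTML string.
# Regex-based for brevity; an HTML parser would be more robust on real pages.
extract_hidden_fields <- function(html) {
  # Find every <input ...> tag
  tags <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  # Keep only the hidden ones
  tags <- tags[grepl('type="hidden"', tags, ignore.case = TRUE)]

  # Pull one attribute's value out of a tag; return "" if the attribute is absent
  get_attr <- function(tag, attr) {
    m <- regmatches(tag, regexec(paste0('[[:space:]]', attr, '="([^"]*)"'), tag))[[1]]
    if (length(m) == 2) m[2] else ""
  }

  values <- vapply(tags, get_attr, character(1), attr = "value", USE.NAMES = FALSE)
  names(values) <- vapply(tags, get_attr, character(1), attr = "name", USE.NAMES = FALSE)
  as.list(values)
}

# Invented sample page, standing in for the text returned by the GET request
sample_html <- paste0(
  '<form><input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4=" />',
  '<input type="hidden" name="__EVENTVALIDATION" value="/wEWAgK+" />',
  '<input type="text" name="user" value="" /></form>'
)

hidden <- extract_hidden_fields(sample_html)
print(names(hidden))
```

The resulting list can be combined with the username, password, and button fields (for example with `c(hidden, list(...))`) before posting, so that every hidden field the server sent is echoed back. Whether that is enough for this particular site is not something the thread confirms.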

hadley
  • Sending the right fields in the body of what? Do you mean in the params list? How would I find out the correct fields? I used the inspector in Chrome to see what they'd be called, but I don't know if they're used completely correctly. – Jack Rob Jun 23 '15 at 22:40
html <- getURL('https://jump.valueline.com/login.aspx', curl = curl, ssl.verifypeer = FALSE)

This should work for you. The error you're getting is probably because libcurl doesn't know where to find the CA certificates it needs to verify the server's SSL certificate.

Ken Yeoh
  • Thank you for the response, but I'm still getting the 'Error: Internal Server Error' message at the last line, with postForm. Running html <- getURL('https://jump.valueline.com/login.aspx', curl = curl, ssl.verifypeer = FALSE) with everything else the same still ends in the 'Internal Server Error' message. – Jack Rob Jun 23 '15 at 19:21
  • _Never_ set `ssl.verifypeer = FALSE` - it basically undoes all the good of using SSL in the first place. – hadley Jun 23 '15 at 22:00
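An alternative to switching verification off is to tell libcurl which CA certificates to trust. The sketch below assumes RCurl is installed and that its bundled CA file at `CurlSSL/cacert.pem` is present; `fetch_with_ca_bundle` is a hypothetical helper name. The bundled file may be out of date, in which case a path to a current CA bundle would be needed instead.

```r
# Sketch: keep certificate verification on and point libcurl at a CA bundle.
# Assumption: RCurl is installed and ships a bundle at CurlSSL/cacert.pem.
fetch_with_ca_bundle <- function(url, curl = RCurl::getCurlHandle()) {
  ca_file <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
  if (!nzchar(ca_file)) {
    stop("No bundled CA file found; supply a path to a current cacert.pem")
  }
  # cainfo names the file of trusted CA certificates; ssl.verifypeer stays TRUE
  RCurl::getURL(url, curl = curl, cainfo = ca_file, ssl.verifypeer = TRUE)
}

# Example call (needs network access, so it is left commented out):
# html <- fetch_with_ca_bundle('https://jump.valueline.com/login.aspx')
```

This only addresses the "unable to get local issuer certificate" error; the 'Internal Server Error' on the POST is a separate problem with what is being sent to the server.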