I need to write code that downloads data files from a website that requires a login.
I expected this to be easy, but I'm having difficulty logging in programmatically.

I tried using the steps outlined in this post:
How to login and then download a file from aspx web pages with R

But when I get to the second-to-last step in the top answer, I get an error message:
Error: Internal Server Error

So I am trying to write RCurl code to log in to the site and then download the files. Here is what I have tried:

install.packages("RCurl")
library(RCurl)

# One handle for every request, so cookies persist between the GET and the POST
curl <- getCurlHandle()
curlSetOpt(cookiejar = 'cookies.txt', ssl.verifypeer = FALSE,
           followlocation = TRUE, autoreferer = TRUE, curl = curl)

# Fetch the page (expected to redirect to the login form) and pull out __VIEWSTATE
html <- getURL('https://research.valueline.com/secure/f2/export?params=[{appId:%27com_2_4%27,%20context:{%22Symbol%22:%22GT%22,%22ListId%22:%22recent%22}}]', curl = curl)
# Note: if the pattern does not match, sub() returns the whole page unchanged
viewstate <- sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html)

# Login form fields
params <- list(
    'ctl00$ContentPlaceHolder$LoginControl$txtUserID' = '<myusername>',
    'ctl00$ContentPlaceHolder$LoginControl$txtUserPw' = '<mypassword>',
    'ctl00$ContentPlaceHolder$LoginControl$btnLogin'  = 'Sign In',
    '__VIEWSTATE' = viewstate
    )

# Submit the login form; this is the call that returns "Internal Server Error"
html <- postForm('https://research.valueline.com/secure/f2/export?params=[{appId:%27com_2_4%27,%20context:{%22Symbol%22:%22GT%22,%22ListId%22:%22recent%22}}]', .params = params, curl = curl)

# TRUE if the returned page contains a logout link, i.e. the login succeeded
grepl('Logout', html)
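
The `sub()` call above returns the entire page when `__VIEWSTATE` is not found, which would then be posted back as the view state. A minimal sketch of a more defensive extraction follows, assuming the login page is a standard ASP.NET WebForms page. The helper name `get_hidden_field` and the sample HTML are hypothetical, and whether this particular site also expects `__EVENTVALIDATION` is an assumption that would need to be checked against the actual page source.

```r
# Sketch: extract a hidden form field by id from raw HTML.
# Returns NA_character_ when the field is absent, instead of the whole page.
get_hidden_field <- function(html, field) {
  # Match the <input> tag with the requested id and capture its value attribute.
  pattern <- sprintf('id="%s"[^>]*value="([^"]*)"', field)
  m <- regmatches(html, regexec(pattern, html))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Hypothetical page fragment, for illustration only.
sample_html <- paste0(
  '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4/+=" />',
  '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="AbC123==" />'
)

viewstate       <- get_hidden_field(sample_html, "__VIEWSTATE")
eventvalidation <- get_hidden_field(sample_html, "__EVENTVALIDATION")
missing_field   <- get_hidden_field(sample_html, "__NOT_THERE")
```

Checking the results with `is.na()` before building `params` makes it obvious when the fetched page was not the login form at all.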
Jack Rob
  • Try `html = postForm('https://research.valueline.com/secure/f2/export?params=[{appId:%27com_2_4%27,%20context:{%22Symbol%22:%22GT%22,%22ListId%22:%22recent%22}}]', .params = params, curl = curl,style="POST")` – user227710 Jun 09 '15 at 19:40
  • Does the website require a `User-Agent` field? Perhaps the website doesn't like "anonymous browsers". – r2evans Jun 09 '15 at 19:41
  • Have you verified this works on the (non-*R*) command-line? – r2evans Jun 09 '15 at 19:43
  • @r2evans can you explain what you mean about the (non-R) command line? I'm extremely new to using R for web related tasks, mostly just econometrics. – Jack Rob Jun 09 '15 at 19:45
  • @user227710 I'm still getting the internal server error. – Jack Rob Jun 09 '15 at 19:46
  • @JackRob: I mean the black-screen-of-death, i.e., `cmd.exe` or `bash`, depending on your OS and such. If you can do it on the command-line, then it is indeed an *R* problem. If the command-line also has problems, then you will not be able to fix it with `RCurl` alone. – r2evans Jun 09 '15 at 19:49
  • @r2evans How can I check this? I'm using a Windows computer. I can't harm this computer, though, so maybe it's not a good idea? – Jack Rob Jun 09 '15 at 19:51
  • Not sure whether this has to do with `https`. If that is the case, you may want to try the [curl](http://www.r-bloggers.com/the-curl-package-a-modern-r-interface-to-libcurl/) package. – user227710 Jun 09 '15 at 20:00
  • If you do not have curl installed locally, you'll need to [download it](http://curl.haxx.se/dlwiz/?type=bin). Once installed, run `cmd.exe` and try `curl.exe`. If it works, you may find it easier to copy the string from *R* and right-click-paste into the terminal. You will likely need to enclose the URL-and-arguments in quotes. No computers were harmed in the making of this comment. ;-) – r2evans Jun 09 '15 at 20:04
  • @user227710, do you have any experience with `httr` versus `curl` versus `RCurl`? Perhaps that's one way to go – r2evans Jun 09 '15 at 20:04
  • The link describes the difference. I haven't used `curl` yet. – user227710 Jun 09 '15 at 20:07
  • Still have no answer for this; I could really use some help with the code. – Jack Rob Jun 09 '15 at 22:24
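
The `User-Agent` suggestion from the comments can be tried without leaving `RCurl`. The following is a rough sketch, not a confirmed fix: the helper name `make_handle` is hypothetical, the User-Agent string is an arbitrary browser-like example, and nothing in the thread confirms that the site actually checks this header. Turning on `verbose` prints the request and response headers, which may help show what the server objects to.

```r
library(RCurl)

# Build a handle that sends a browser-like User-Agent and keeps cookies in memory.
# The default agent string is an arbitrary example, not a value the site is known to require.
make_handle <- function(agent = "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0") {
  getCurlHandle(
    useragent      = agent,
    cookiefile     = "",    # empty string turns on libcurl's cookie engine
    followlocation = TRUE,
    autoreferer    = TRUE,
    verbose        = TRUE   # print request/response headers for debugging
  )
}

curl <- make_handle()
# Then reuse `curl` for both the initial getURL() and the login postForm() calls.
```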

0 Answers