
I'm trying to scrape information from my company's intranet so that I can display it on our office wall board via a Dashing dashboard. I'm working from the information provided on this site. The problem, other than being a noob, is that to reach the information I want to scrape I first have to log in to our intranet: I enter my username on one page, then submit to a second page where I enter my password. Once I'm logged in, I can follow links and scrape my data.

Here is some source code from my login username page:

<form action='loginauthpwd.asp?PassedURL=' method='post' style='margin: 0px;'><table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'>&nbsp;</td><td valign='center' width='100'><table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'><span style='font-weight: bold;'>Username:</span><br><input type='text' class='normal' autocomplete='off' id='LoginUser' name='LoginUser' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'><input class='normal_button' type='button' value='Go' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var username=document.getElementById('LoginUser').value; if (username.length > 2) { submit(); } else { alert('Enter your Username.'); }"></form>

Here is some source from my login password page:

<form action='loginauthprocess.asp?UserName=******&Page=&PassedURL=' target='_top' method='post' onsubmit='checkMyBrowser();' style='margin: 0px;'><table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'>&nbsp;</td><td valign='center' width='100'><table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'><span style='font-weight: bold;'>Password:</span><br><input class='normal' type='password' autocomplete='off' id='LoginPassword' name='LoginPassword' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'><input class='normal_button' type='submit' value='Log In' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var password=document.getElementById('LoginPassword').value; if (password.length > 2) { submit(); } else { alert('Enter your Password.'); }"></form>

Based on that resource's example, this is what I think should work, but it doesn't:

require 'mechanize'
@agent = Mechanize.new
@agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

##Login Page:
page = @agent.get 'http://www.website_here.com/intranet/login.asp'

##Username Page:
form = page.forms[0]
form['USER NAME HERE'] = LoginUser
##Submit User:
page = form.submit

##Password Page:
form = page.forms[0]
form['USER PASSWORD HERE'] = LoginPassword
##Submit Password:
page = form.submit

When I test my code I get the following output:

test.rb:10:in `<main>': uninitialized constant LoginUser (NameError)

Can anyone point out what I'm doing wrong?

Thanks

EDIT 3/27/15:

Using @seoyoochan's resource, I tried to write my code like this:

require 'rubygems'
require 'mechanize'
login_page  = agent.get "http://www.website_here.com/intranet/loginauthusr.asp?Page="
login_form = login_page.form_with(action: '/sessions') 
user_field = login_form.field_with(name: "session[user]") 
user.value = 'My User Name'

login_form.submit

When I try to run my code I'm now getting this output:

test.rb:4:in `<main>': undefined local variable or method `agent' for main:Object (NameError)

I need an example of how to reference the right names/classes so that the code works with the form I posted above.

EDIT 4/4/15:

Okay, now using @tylermauthe's example, I'm trying to test the following code:

require 'mechanize'
require 'io/console'

agent = Mechanize.new
page = agent.get('http://www.website_here.com/intranet/loginauthusr.asp?Page=')

form = page.forms.find{|form| form.action.include?("loginauthpwd.asp?PassedURL=")}

puts "Login:"
form.login = gets.chomp
page = agent.submit(form)
pp page

My understanding is that this code should let me enter and submit my username, taking me to the next page, which asks for my password. But when I run it and enter my username, I get the following output:

/var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:217:in `method_missing': undefined method `loginUser=' for # (NoMethodError) from scraper.rb:10:in `<main>'

What am I missing or have entered wrong? Please refer to my first edit to see how my form is coded. Also to be clear I did not code the forms this way. I'm only trying to learn how to code and scrape data needed to display on my Dashing Dashboard project.

rleigh
  • You've posted 2 implementations now, both with different problems. Which implementation do you prefer to go with? My suggestion would be the solution suggested by @seoyoochan. – tylermauthe Apr 03 '15 at 17:15
  • @tylermauthe To be honest, I don't care as long as it will log in, scrape the data, and let me display the output on my widget in Dashing Dashboard. – rleigh Apr 03 '15 at 17:21
  • After reading my comment above, I can see it being taken out of context. To be clear: I'm new to coding and would just like a good example that logs me in automatically by script/code so that I can try to get the data displayed on my widget. Thanks for any help! – rleigh Apr 03 '15 at 17:41

3 Answers


I was able to get logged in with the following example. Thanks to everyone that helped me with all the resources and examples to learn from!

require 'nokogiri'
require 'mechanize'
require 'pp' # needed for pp on older Rubies such as 1.9

agent = Mechanize.new

# Open the URL that asks for the username, find the first form,
# fill in its username field, and submit it.
# The field name "LoginUser" comes from the form's HTML in the question.

login = agent.get('http://www.website_here.com')
login_form = login.forms.first
username_field = login_form.field_with(:name => "LoginUser")
username_field.value = "YOUR USERNAME HERE"
page = agent.submit login_form

# The response to that submit is the page that asks for the password.
# Find its first form, fill in the password field, and submit it.

login_form = page.forms.first
password_field = login_form.field_with(:name => "LoginPassword")
password_field.value = "YOUR PASSWORD HERE"
page = agent.submit login_form

# Print the resulting page to confirm that you have logged in.

pp page

I based the example above on one by user Senthess. I'm still not 100% sure what each line is doing, so if anyone would like to take the time to break it down, please do. It would help me and others understand it better.
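One detail worth spelling out in any breakdown: in Ruby, assigning a string to a variable that currently holds a field object only rebinds the variable. The field itself changes only when its `value` is set. The sketch below shows the difference without touching the network. `Field` is a hypothetical stand-in written for this illustration, not Mechanize's own class, and the placeholder string is the one used above.

```ruby
# Field is a hypothetical stand-in for a form field object.
Field = Struct.new(:name, :value)

field = Field.new('LoginUser', '')

# Rebinding: handle first points at the field, then at a String.
# The field object is never modified by this assignment.
handle = field
handle = 'YOUR USERNAME HERE'
puts "handle is now a #{handle.class}"

value_after_rebind = field.value.dup

# Setting the attribute changes the field that a form would submit.
field.value = 'YOUR USERNAME HERE'

puts "after rebinding: #{value_after_rebind.inspect}"
puts "after value=:    #{field.value.inspect}"
```

With a real Mechanize field the same rule applies: the object returned by `field_with` has to receive `value =` for the submitted form to carry the text.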

Thanks!

rleigh

I just looked into the Mechanize gem and found a relevant solution. Each input field needs a proper 'name', and your code has to use that name; otherwise you can't set a value on the field. Follow this article:

http://crabonature.pl/posts/23-automation-with-mechanize-and-ruby
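To illustrate why the `name` attribute matters: a form post sends each field as a `name=value` pair, so a field can only be filled in if it is addressed by that name. This is a minimal sketch using Ruby's standard `uri` library rather than Mechanize. The field names are taken from the two forms in the question, and the values `jdoe` and `secret` are placeholders.

```ruby
require 'uri'

# Field names from the question's HTML; the values are placeholders.
username_step = { 'LoginUser' => 'jdoe' }
password_step = { 'LoginPassword' => 'secret' }

# Build the application/x-www-form-urlencoded body for each step.
username_body = URI.encode_www_form(username_step)
password_body = URI.encode_www_form(password_step)

puts username_body
puts password_body
```

If the code uses a name that the form does not contain, such as `session[user]`, there is no matching pair to send, which is why the names have to be copied from the page's HTML.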

seoyoochan
  • Thanks, I'll take a look at it when I get to the office today. – rleigh Mar 27 '15 at 11:59
  • Yep, let me know how it goes. I also see some potential security weaknesses in the form: don't put JavaScript code directly on the form, and use a properly implemented token authentication system. – seoyoochan Mar 27 '15 at 12:01
  • Okay, I'm just too new at this to understand what is needing to match up within the given examples to get something to work. I have been playing around a bit and can't seem to get my first username page to provide my name in the form and submit to the next page that would ask for my password. – rleigh Mar 27 '15 at 15:45
  • If it helps some to provide an example I was able to learn that I can do a "pp page" in Mechanize that gives the following output: {forms #} This tells me that my form I'm trying to access has no name right? – rleigh Mar 27 '15 at 15:50
  • Did it cause the same error? What is the rest of logs? – seoyoochan Mar 27 '15 at 16:06
  • @rleigh You didn't initialize 'agent' object. try 'agent = Mechanize.new' – seoyoochan Mar 27 '15 at 18:46
  • It's funny you pointed that out. I was just fixing and testing it again. This time I got the following output: "test.rb:7:in `<main>': undefined method `field_with' for nil:NilClass (NoMethodError)" – rleigh Mar 27 '15 at 18:54
  • That means your login_form is nil. Try to work out how to find the form. I can't help you further since I don't know the URL you are accessing. – seoyoochan Mar 27 '15 at 18:59
  • I understand but can't provide the url for security reasons. I will keep looking into it and see if I can get any guidance from our IT department. If I get anywhere, I will update my post. – rleigh Mar 27 '15 at 19:12
  • that'd be great. please update your post if you find any solution in the future for other people's references. :) – seoyoochan Mar 27 '15 at 20:08
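The nil discussed in these comments is what a form lookup returns when nothing matches: the code searched for the action `/sessions`, while the form in the question has the action `loginauthpwd.asp?PassedURL=`. Here is a rough sketch of failing loudly instead of passing nil along. `Form` and `find_form!` are hypothetical names made up for this illustration, not part of Mechanize.

```ruby
# Form is a hypothetical stand-in carrying only the action attribute.
Form = Struct.new(:action)

# The single form on the username page, per the question's HTML.
forms = [Form.new('loginauthpwd.asp?PassedURL=')]

# Return the first form whose action contains the fragment, or raise
# a descriptive error instead of handing back nil.
def find_form!(forms, fragment)
  forms.find { |f| f.action.include?(fragment) } ||
    raise(ArgumentError, "no form with action containing #{fragment.inspect}")
end

found = find_form!(forms, 'loginauthpwd.asp')
puts found.action

begin
  find_form!(forms, '/sessions')
rescue ArgumentError => e
  puts "lookup failed: #{e.message}"
end
```

Checking the lookup result right away points at the real problem (the wrong action string) rather than at a later `field_with` call on nil.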

Not sure if you found these, but Mechanize has fairly excellent docs: http://docs.seattlerb.org/mechanize/GUIDE_rdoc.html

From these, I played around in the irb REPL to create this simple scraper that logs into GitHub: https://gist.github.com/tylermauthe/781f68add24819e207c4

tylermauthe
  • Thanks and I'll look it over and let you know how it turns out. – rleigh Apr 03 '15 at 20:50
  • OK, I used your example in my environment and was able to log in to GitHub. That said, GitHub has the username and password fields on the same page. In my case, I first need to provide a username and submit the page, then provide a password and submit again. At that point I will be logged in to my intranet so that I can scrape my data. I do believe you got me one step closer with your "form.action" example. My form is nil as well. So now the trick is being able to navigate both the username and password pages. – rleigh Apr 03 '15 at 21:48
  • I was wondering why the password field wasn't in the hash when you did pp page... – tylermauthe Apr 03 '15 at 23:49
  • You should be able to figure this out based on the Gist I provided. You'll need to re-arrange code, so that you don't try to fill the password field initially and so that you do fill the password on the form in the second page and then submit that second form... – tylermauthe Apr 03 '15 at 23:54
  • Please see latest edit above. Thanks for all the help so far and pointers that might show what I'm still doing wrong. – rleigh Apr 05 '15 at 04:12
  • Try using IRB. It is an interactive REPL that allows you to build code. The error you are getting is saying that the form object you've selected with find doesn't have a loginUser property. So either the field is not named loginUser or you haven't found the correct form. Using IRB, you can play around with the code until it works the way you are hoping. – tylermauthe Apr 05 '15 at 08:13
  • I provided a link above to instructions on using IRB, but here's another: https://www.ruby-lang.org/en/documentation/quickstart/ – tylermauthe Apr 05 '15 at 08:14
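The NoMethodError described in these comments can be reproduced offline. The sketch below assumes a form object that turns `form.some_name = value` into a write to the field called `some_name` and raises when no such field exists. `TinyForm` is a hypothetical stand-in written for this illustration; it is not Mechanize's implementation. The field name `LoginUser` comes from the question's HTML, and `jdoe` is a placeholder.

```ruby
# TinyForm is a hypothetical stand-in, not Mechanize's implementation.
class TinyForm
  def initialize(field_names)
    @values = Hash[field_names.map { |name| [name, ''] }]
  end

  # Read a field's current value by its exact name.
  def [](name)
    @values.fetch(name)
  end

  # Treat `form.SomeField = value` as a write to the field named
  # SomeField; anything else falls through to NoMethodError.
  def method_missing(method_name, *args)
    name = method_name.to_s
    field = name.chomp('=')
    if name.end_with?('=') && @values.key?(field)
      @values[field] = args.first
    else
      super
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    name = method_name.to_s
    (name.end_with?('=') && @values.key?(name.chomp('='))) || super
  end
end

form = TinyForm.new(['LoginUser'])

# The exact field name works.
form.LoginUser = 'jdoe'
puts form['LoginUser']

# A different capitalisation is a different name and raises.
begin
  form.loginUser = 'jdoe'
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"
end
```

Field names are case-sensitive, so `loginUser` and `LoginUser` are different names. With Mechanize itself, the bracket form `form['LoginUser'] = value` sidesteps the question of method naming altogether.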