
I'm using the Anemone gem in the following way (a sketch of this loop follows the list):

  • Visit the first URL (seed), save the page content to the database, and save all links from this page to the database as well (all links that are not in the database yet)
  • Load the next link from the database, save its content and any new links again
  • If there are no other links, crawl all links again (after some time period) to overwrite the old content with the new
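
Roughly, the loop looks like this (just a sketch; the in-memory DB hash and the crawl_one method are placeholders for my actual database layer):

require 'anemone'

# In-memory stand-in for the database: url => content (nil until crawled).
DB = {}

# Crawl a single URL: save its content and record any links not stored yet.
def crawl_one(url)
  Anemone.crawl(url, depth_limit: 0) do |anemone|
    anemone.on_every_page do |page|
      DB[page.url.to_s] = page.body
      page.links.each do |link|
        DB[link.to_s] = nil unless DB.key?(link.to_s)
      end
    end
  end
end

crawl_one('http://example.com/')   # visit the seed first
# Load the next unvisited link until none are left.
while (pending = DB.find { |_, content| content.nil? })
  crawl_one(pending.first)
end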

This works pretty well, but is there any way to crawl pages that require a login (if I know the username and password)? I know the Mechanize gem, which provides functionality for filling in forms, but I don't know how to integrate it into my process (if that is even possible). Or is there any other way to crawl pages "behind" a login form?


2 Answers


In your case I suggest using one of these solutions:

because these two solutions let you fill in forms, click on web elements, and do anything a normal browser user can do. This is not possible with the Mechanize gem, which does not drive a real browser or execute JavaScript.


You can use Mechanize to automate the login process and then keep its session to do whatever you want next.

Here is my sample code:

require 'mechanize'

module YourModuleName
  class YourClassName
    attr_reader :agent

    def initialize(login_page)
      @login_page = login_page
    end

    def call
      @agent = Mechanize.new
      # Open the login page and fill the first form with your credentials
      page = @agent.get(@login_page)
      form = page.forms.first
      form.field_with(id: LoginConstant::CSS[:user_email]).value = LoginConstant::USER_NAME
      form.field_with(id: LoginConstant::CSS[:user_password]).value = LoginConstant::PASSWORD
      form.submit
      self
    end
  end
end

Then, in your code, to crawl a page that requires login, do something like the following:

response = YourModuleName::YourClassName.new('<your_login_page>').call
response.agent.get('<your_page_you_want_to_crawl>')
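
If you want the crawl itself to stay in Anemone, one possible bridge (just a sketch, assuming Anemone's :cookies option and the hypothetical class and placeholder URLs above) is to copy the logged-in Mechanize session's cookies into Anemone:

require 'anemone'

# Log in with Mechanize, then hand the session cookies over to Anemone.
response = YourModuleName::YourClassName.new('<your_login_page>').call
cookies = response.agent.cookies.each_with_object({}) { |c, h| h[c.name] = c.value }

Anemone.crawl('<your_page_you_want_to_crawl>', cookies: cookies) do |anemone|
  anemone.on_every_page do |page|
    # page.body is available here just like for public pages
  end
end

Anemone should then send those cookies with every request, so pages behind the login can be fetched like any other.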