
I need to log in to the malwr site through a Python script. I have tried various modules, such as mechanize and requests, but had no success logging in to the site via a script.

I want to create an automation script that downloads files from the malware analysis site by parsing the HTML page, but because of the login issue I am not able to parse the href attributes of the page to get the download links.

Below is my code:

import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://malwr.com/account/login/', login_data)
resp = opener.open('https://malwr.com/analysis/MDMxMmY0NjMzNjYyNDIyNDkzZTllOGVkOTc5ZTQ5NWU/')
print resp.read()

Am I doing something wrong?

Rajendra

1 Answer


The key thing to do is to parse the CSRF token from the login form and pass it, along with the username and password, in the POST parameters to the https://malwr.com/account/login/ endpoint.
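To illustrate just the token-parsing step, here is a minimal, self-contained sketch using only the standard library. The HTML snippet is a hypothetical stand-in for the real login page; the site is a Django app, which names its hidden token field csrfmiddlewaretoken:

```python
import re

# Hypothetical sample of the login page markup; the real page would be
# fetched first, e.g. with urllib2 or requests.
login_page = '''
<form action="/account/login/" method="post">
    <input type="hidden" name="csrfmiddlewaretoken" value="abc123TOKEN" />
    <input type="text" name="username" />
    <input type="password" name="password" />
</form>
'''

# Pull the token value out of the hidden input field.
match = re.search(
    r'name=["\']csrfmiddlewaretoken["\']\s+value=["\']([^"\']+)["\']',
    login_page)
csrf = match.group(1) if match else None
print(csrf)  # abc123TOKEN
```

A regex is fine for a quick sketch like this, but an HTML parser (as in the answer's code) is more robust against attribute reordering and formatting changes.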

Here is a solution using the requests and BeautifulSoup libraries.

First, it opens a session to maintain cookies (i.e. to "stay logged in") across the web-scraping run, then it gets the CSRF token from the login page. The next step is sending a POST request to log in. After that, you can open the "analysis" pages and retrieve the links:

from urlparse import urljoin
from bs4 import BeautifulSoup
import requests

base_url = 'https://malwr.com/'
url = 'https://malwr.com/account/login/'
username = 'username'
password = 'password'

session = requests.Session()

# getting csrf value
response = session.get(url)
soup = BeautifulSoup(response.content)

form = soup.form
csrf = form.find('input', attrs={'name': 'csrfmiddlewaretoken'}).get('value')

# logging in
data = {
    'username': username,
    'password': password,
    'csrfmiddlewaretoken': csrf
}
session.post(url, data=data)

# getting analysis data
response = session.get('https://malwr.com/analysis/MDMxMmY0NjMzNjYyNDIyNDkzZTllOGVkOTc5ZTQ5NWU/')
soup = BeautifulSoup(response.content)

link = soup.find('section', id='file').find('table')('tr')[-1].a.get('href')
link = urljoin(base_url, link)
print link

Prints:

https://malwr.com/analysis/file/MDMxMmY0NjMzNjYyNDIyNDkzZTllOGVkOTc5ZTQ5NWU/sample/7fe8157c0aa251b37713cf2dc0213a3ca99551e41fb9741598eb75c294d1537c/
alecxe
  • Thanks @alecxe, I was missing the csrf token – Rajendra Dec 30 '14 at 06:44
  • I am trying to download the output link through urlopen(), but it shows an "access forbidden" error. It looks like the session expires before the file downloads. Any suggestion on how to download the file from the above link using the same session? – Rajendra Dec 30 '14 at 09:59
  • @Rajendra you need to use the same session to download a file, use the answer(s) provided here http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py. Let me know if you have difficulties. Thanks. – alecxe Dec 30 '14 at 17:15
  • @alecxe could you please look into this question: http://stackoverflow.com/questions/29074052/how-to-pass-search-key-and-get-result-through-bs4 – Rajendra Mar 16 '15 at 14:51
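Following up on the download issue raised in the comments above: the fix is to reuse the same logged-in requests.Session and stream the response body to disk in chunks. A minimal sketch of the chunk-writing part, with the network call shown only in a comment since it needs a live login (the chunk values below are made-up stand-ins for a streamed body):

```python
import os
import tempfile

def save_stream(chunks, path):
    """Write an iterable of byte chunks to the file at `path`.

    With requests this would be fed by something like:
        resp = session.get(file_link, stream=True)  # same logged-in session
        save_stream(resp.iter_content(8192), 'sample.bin')
    """
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)

# Demo with fake chunks standing in for a streamed response body.
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
save_stream([b'MZ\x90\x00', b'', b'payload'], path)
```

Streaming with iter_content avoids loading a large malware sample entirely into memory, and using the existing session object carries the login cookies along with the download request.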