I am trying to get the contents of a webpage through cURL.

However, unless logged into the game Bin Weevils, the contents of the page will simply be failed=99.

I used the free Chrome extension EditThisCookie to work out which of the cookies set at login is required to view the page, and found that it is PHPSESSID. I have tried to set this cookie in the cURL header, but to no avail: failed=99 is still displayed in the output.

This is the cURL code which I am currently using:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'PHPSESSID: removed for privacy'
));
$output = curl_exec($ch);
curl_close($ch);
echo $output;
?>

If I visit the page while not signed in to Bin Weevils and simply use EditThisCookie to set the PHPSESSID cookie, the page content is shown.

2 Answers

To do this, you first need to authenticate with your username and password via cURL.

Step 1: Log in using cURL, retrieve the cookies, and store them in a text file.

Step 2: Using those cookies, you can scrape the HTML page.

For a detailed example, please follow this link: Scrape Page using CURL after login.
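As a rough sketch of how a cookie is sent in step 2: cURL passes cookies in a single Cookie request header made of name=value pairs (which is what CURLOPT_COOKIE produces), so a header named PHPSESSID is not read as a cookie by the server. The helper names buildCookieString and fetchWithCookies and the placeholder session ID below are made up for illustration, and the server may still reject a session ID copied from a browser if it ties the session to something else.

```php
<?php
// Hypothetical helper: turn ['PHPSESSID' => 'abc'] into "PHPSESSID=abc",
// the name=value format that CURLOPT_COOKIE expects. Values are sent as-is.
function buildCookieString(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    // Multiple cookies are separated by a semicolon and a space.
    return implode('; ', $pairs);
}

// Hypothetical helper: fetch $url while sending the given cookies.
// Returns the response body as a string, or false on failure.
function fetchWithCookies(string $url, array $cookies)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Results in "Cookie: PHPSESSID=..." instead of a "PHPSESSID: ..." header.
    curl_setopt($ch, CURLOPT_COOKIE, buildCookieString($cookies));
    return curl_exec($ch);
}

// Usage (placeholder session ID; substitute the value copied from the browser):
// echo fetchWithCookies('http://example.com', ['PHPSESSID' => 'your-session-id']);
```

Logging in through cURL, as in the steps above, is usually the more durable option because a copied session ID stops working once the session expires.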

Dhaval Bhavsar

Log in to the website with cURL and get the secure page content:

    //The username  of the account.
    define('USERNAME', 'jellylion350');

    //The password of the account.
    define('PASSWORD', '123456');

    //Set a user agent. This basically tells the server that we are using Chrome ;)
    define('USER_AGENT', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36');

    //Where our cookie information will be stored (needed for authentication).
    define('COOKIE_FILE', 'cookie.txt');

    //URL of the login form.
    define('LOGIN_FORM_URL', 'https://play.binweevils.com/game.php');

    //Login action URL. Sometimes, this is the same URL as the login form.
    define('LOGIN_ACTION_URL', 'https://www.binweevils.com/membership/payment/login');


    //An associative array that represents the required form fields.
    //You will need to change the keys / index names to match the name of the form
    //fields.
    $postValues = array(
        'userID' => USERNAME,
        'password' => PASSWORD
    );

    //Initiate cURL.
    $curl = curl_init();

    //Set the URL that we want to send our POST request to. In this
    //case, it's the action URL of the login form.
    curl_setopt($curl, CURLOPT_URL, LOGIN_ACTION_URL);

    //Tell cURL that we want to carry out a POST request.
    curl_setopt($curl, CURLOPT_POST, true);

    //Set our post fields / data (from the array above).
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postValues));

    //Skip SSL certificate checks so that HTTPS errors do not abort the request (insecure; only for testing).
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

    //Where our cookie details are saved. This is typically required
    //for authentication, as the session ID is usually saved in the cookie file.
    curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);

    //Sets the user agent. Some websites will attempt to block bot user agents.
    //Hence the reason I gave it a Chrome user agent.
    curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);

    //Tells cURL to return the output once the request has been executed.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    //Allows us to set the referer header. In this particular case, we are 
    //fooling the server into thinking that we were referred by the login form.
    curl_setopt($curl, CURLOPT_REFERER, LOGIN_FORM_URL);

    //Do we want to follow any redirects?
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);

    //Execute the login request.
    curl_exec($curl);

    //Check for errors!
    if(curl_errno($curl)){
        throw new Exception(curl_error($curl));
    }

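    //We should be logged in by now. Let's attempt to access a password protected page.
    curl_setopt($curl, CURLOPT_URL, 'http://example.com');

    //Switch the handle back to GET; it is still set to POST from the login request.
    curl_setopt($curl, CURLOPT_HTTPGET, true);

    //Keep the same cookie jar. The handle still holds the session cookie from the
    //login response and sends it with this request.
    curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);

    //Use the same user agent, just in case it is used by the server for session validation.
    curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);

    //Skip SSL certificate checks again (insecure; only for testing).
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);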

    //Execute the GET request.
    $result = curl_exec($curl);

    //The response is a query string (key=value&key=value). Split it first and
    //decode each part afterwards, so an encoded "&" or "=" inside a value
    //cannot break the split.
    $final_data = array();
    foreach (explode('&', $result) as $row) {
        $temp = explode('=', $row, 2);
        $final_data[urldecode($temp[0])] = isset($temp[1]) ? urldecode($temp[1]) : '';
    }

    //Print out the result.
    echo "<pre>";
    print_r($final_data);
    echo "</pre>";

The output of the page is as follows:

[res] => 1
[idx] => ########              //hidden for security purposes
[weevilDef] => ############### //hidden for security purposes
[level] => 70
[tycoon] => 0
[lastLog] => 1970-01-01+00%3A00%3A00
[dateJoined] => 2015-02-09+11%3A07%3A00
[x] => y
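Because the response body is a URL-encoded query string, PHP's built-in parse_str can do the splitting and decoding in one call. The following is only a minimal sketch: the $sample string is invented for illustration and modelled on the output above, not taken from a real response.

```php
<?php
// Illustrative response body, modelled on the output shown above.
$sample = 'res=1&level=70&tycoon=0&dateJoined=2015-02-09+11%3A07%3A00&x=y';

// parse_str() splits on "&" and "=" and URL-decodes every key and value,
// storing the pairs in the array passed as the second argument.
parse_str($sample, $fields);

echo $fields['level'], "\n";       // 70
echo $fields['dateJoined'], "\n";  // 2015-02-09 11:07:00
```

If a value still contains sequences such as %3A after decoding, as the dates in the output above do, that suggests the server encoded it twice, and a further urldecode on that value would be needed.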

For more detail, read this.

Rafiqul Islam