
So I'm using PHP's loadHTMLFile() to find links on my webpage, but it always collects the links as if I am logged out, which misses a lot of login-only links I need to find. The pages can already read the session info from the database. Is there a way to send the needed session info to the pages being crawled so the dynamic links can be found? I can fake login data if needed; I just need to be able to find the logged-in-only links.

Below is the code I use to locate the links:

set_error_handler(function($errno, $errstr, $errfile, $errline) {}); // Swallow unavoidable warnings raised by loadHTMLFile on malformed HTML
$valid = $this->doc->loadHTMLFile($path); // Returns false if the URL is not a valid, loadable link
restore_error_handler(); // Restore normal error handling

if ($valid !== false)
{
    $xpath = new DOMXpath($this->doc); // Create an instance of the DOMXpath class. --php core--
    $elements = $xpath->query("//a[not(@rel='nofollow')]/@href"); // Use $xpath to pull links from the page; nofollow links are ignored.
    $this_page = array(); // Ensure $this_page is set even if 0 links are found, avoiding a false null on is_null()
    if (!is_null($elements)) foreach ($elements as $element) $this_page[] = $element->nodeValue; // Build an array of the located links from the DOMXpath result.
}

Anyone have any ideas? I have complete control of the site and database, and this is an admin-only script, so user restrictions can be minimal. Thank you!
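(For context: one way to keep using loadHTMLFile() while passing session info is to attach a Cookie header via a libxml stream context, which loadHTMLFile() honors for http(s) URLs. A minimal sketch, assuming the session cookie is named PHPSESSID and $sessionId holds a valid logged-in session ID — adjust both to match the actual site:)

```php
// Assumption: $sessionId holds the session ID of a logged-in (e.g. fake admin)
// user, and the site's session cookie is named PHPSESSID.
$context = stream_context_create([
    'http' => [
        'header' => "Cookie: PHPSESSID=$sessionId\r\n",
    ],
]);
libxml_set_streams_context($context); // libxml (and thus loadHTMLFile) now sends this cookie

$doc = new DOMDocument();
$valid = @$doc->loadHTMLFile($path); // The crawled page now sees a "logged in" session
```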

Zaper127
    Try using cURL to send specific headers and get in return a "logged in" view of the site. – k3llydev Mar 15 '19 at 18:27
  • Could you give me an example or a link to a tutorial for this? Thank you @k3lly.dev – Zaper127 Mar 15 '19 at 19:02
  • Sure, it has been answered already by someone here: [Login to remote site with PHP cURL](https://stackoverflow.com/a/21170579/5568741) – k3llydev Mar 15 '19 at 19:04
  • @k3lly.dev Ok! That is going to work, probably better than the loadHTMLFile() ever did. Thank you! – Zaper127 Mar 15 '19 at 20:26
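
The cURL approach suggested in the comments can be sketched roughly as follows. This is a sketch under assumptions, not the linked answer verbatim: the login URL, form field names, and credentials are all placeholders to adapt to the real login form.

```php
// Assumption: the site has a login form at /login.php with fields
// "username" and "password"; adjust URL and fields to the real form.
$cookieJar = tempnam(sys_get_temp_dir(), 'cookies');

// Step 1: log in once and store the session cookie in the jar.
$ch = curl_init('https://example.com/login.php');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query(['username' => 'admin', 'password' => 'secret']),
    CURLOPT_COOKIEJAR      => $cookieJar, // Write cookies received from the login response here
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
]);
curl_exec($ch);
curl_close($ch);

// Step 2: fetch each page to crawl, replaying the stored session cookie.
$ch = curl_init($path);
curl_setopt_array($ch, [
    CURLOPT_COOKIEFILE     => $cookieJar, // Read cookies from the jar on this request
    CURLOPT_RETURNTRANSFER => true,
]);
$html = curl_exec($ch);
curl_close($ch);

// Step 3: parse the "logged in" HTML with the existing DOMXpath code;
// loadHTML() on the fetched string replaces loadHTMLFile() on the URL.
$doc = new DOMDocument();
@$doc->loadHTML($html);
```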

0 Answers