
I need to display two different links: one for crawlers and one for guests. Example:
A crawler sees the normal link - <a href="example.com">example.com</a>
A guest sees - <a href="other.com/register">Register to see link</a>

// Check the user agent of the current 'visitor'
if ($Detect->isCrawler()) {
    // true if a crawler user agent was detected
    echo '<a href="example.com">example.com</a>';
} else {
    echo '<a href="other.com/register">Register to see link</a>';
}
MESSIAH
  • Giving a search engine some juicy content when you'll give an ordinary user a registration page sounds like a good way to get penalised by search engines. – Quentin Jul 05 '15 at 14:09
  • Can anyone confirm this advice? Are you sure? Facebook uses link hiding. – MESSIAH Jul 05 '15 at 16:03

2 Answers


For the most part, you can determine if your site is being crawled by looking at the 'User-Agent' header:

function isWebCrawler() {
  $isCrawler = false;

  $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
  // strstr() returns false when "Google" is absent,
  // so test for a match rather than its length
  if (strstr($userAgent, "Google") !== false) {
    $isCrawler = true;
  }

  return $isCrawler;
}

Here is a listing of the User Agent strings for most of the major bots: List of Web Crawler User Agents
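As a quick sketch, the function above can drive the link switch from the question (the URLs are the question's own placeholders):

```php
<?php
// Assumes isWebCrawler() from the answer above is in scope.
// Crawlers get the real link; everyone else gets the registration prompt.
$link = isWebCrawler()
    ? '<a href="example.com">example.com</a>'
    : '<a href="other.com/register">Register to see link</a>';
echo $link;
```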

Robert Durgin

Blocking with robots.txt
Unfortunately, there isn't really a foolproof system for detecting bots. A robots.txt file will keep out most legitimate bots, but plenty of more "aggressive" bots simply ignore it. Also, your question is about detecting them, not blocking them.
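For completeness, a minimal robots.txt that asks all well-behaved crawlers to stay out of a members-only area (the /register/ path is just an illustrative placeholder):

```
# Applies to every crawler that honours robots.txt
User-agent: *
Disallow: /register/
```

Note that this is purely advisory: compliant bots like Googlebot will respect it, but nothing enforces it server-side.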

Detect with User Agent
So the second-best way to detect bots is to check the user agent via $_SERVER['HTTP_USER_AGENT']. For example:

function isCrawler($agent)
{
    $crawlers = array(
        array('Google', 'Google'),
        array('msnbot', 'MSN'),
        array('Rambler', 'Rambler'),
        array('Yahoo', 'Yahoo'),
        array('AbachoBOT', 'AbachoBOT'),
        array('accoona', 'Accoona'),
        array('AcoiRobot', 'AcoiRobot'),
        array('ASPSeek', 'ASPSeek'),
        array('CrocCrawler', 'CrocCrawler'),
        array('Dumbot', 'Dumbot'),
        array('FAST-WebCrawler', 'FAST-WebCrawler'),
        array('GeonaBot', 'GeonaBot'),
        array('Gigabot', 'Gigabot'),
        array('Lycos', 'Lycos spider'),
        array('MSRBOT', 'MSRBOT'),
        array('Scooter', 'Altavista robot'),
        array('AltaVista', 'Altavista robot'),
        array('IDBot', 'ID-Search Bot'),
        array('eStyle', 'eStyle Bot'),
        array('Scrubby', 'Scrubby robot')
    );

    foreach ($crawlers as $c)
    {
        if (stristr($agent, $c[0]))
        {
            return($c[1]);
        }
    }

    return false;
}

$crawler = isCrawler($_SERVER['HTTP_USER_AGENT']);
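Since the function returns the crawler's display name on a match (or false otherwise), a sketch of the question's link switch on top of it could look like this (the link targets are the question's placeholders):

```php
<?php
// Assumes isCrawler() from above is in scope; it returns the bot's
// display name (e.g. "Google") on a match, or false for a regular visitor.
$crawler = isCrawler($_SERVER['HTTP_USER_AGENT'] ?? '');

if ($crawler !== false) {
    // Crawler detected: show the real link
    echo '<a href="example.com">example.com</a>';
} else {
    // Ordinary guest: show the registration prompt
    echo '<a href="other.com/register">Register to see link</a>';
}
```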

Using an external resource for more reliability
However, there are so many bots out there that your array would become extremely large to catch them all, and it would be outdated very quickly. Therefore, an external resource that is updated frequently is more reliable. The website UserAgentString.com provides such a service. Code like this would do the trick for you:

$api_request = "http://www.useragentstring.com/?uas=" . urlencode($_SERVER['HTTP_USER_AGENT']) . "&getJSON=all";
// Pass true so json_decode() returns an associative array
// (by default it returns a stdClass object, and $ua["agent_type"] would fail)
$ua = json_decode(file_get_contents($api_request), true);
if ($ua["agent_type"] == "Crawler") {
    echo '<a href="example.com">example.com</a>';
} else {
    echo '<a href="other.com/register">Register to see link</a>';
}

Credits to this question: How to identify web-crawler?

icecub