Our blog was recently affected by a rather clever little hack, and when I went searching for ways to remove it, I couldn’t find much. Here’s a brief writeup of what happened and how I fixed it.
Our Director of Internet Marketing Strategy, Sonny Cohen, spends some of his time searching Google and other search engines for keywords relevant to our business. He noticed that some of the results pointing to our blog were laced with keywords and links for various male enhancement drugs. When I searched our blog for those terms, I couldn’t find anything.
Here’s what I saw when I searched our blog for the phrase “test”:
But here’s what Google saw when it ran the same search:
You may notice that the URL in that screenshot points to a local file. There are two ways to see what your site looks like to Google. One is to change your browser’s User Agent to match Googlebot’s. The other is to use the “Fetch As Googlebot” Labs utility in Google Webmaster Tools. I used the latter, saved the resulting report as an HTML file, and opened that file in Chrome.
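If you would rather try the User Agent route from a script, a minimal PHP sketch might look like the following. The helper names fetch_with_ua() and links_only_for_bot() are hypothetical, BROWSER_UA is a generic placeholder, and the Googlebot string is the one Google publishes for its crawler. The link comparison is a crude regex, not a real HTML parser. Because only the User-Agent header changes, cloaking code that also checks the visitor’s IP address may still serve you the normal page, which is one reason the Webmaster Tools fetch is the more reliable test.

```php
<?php
// Hypothetical helpers: fetch the same URL as a "browser" and as "Googlebot"
// and report links that only the Googlebot copy contains.
// Only the User-Agent header is spoofed; IP-based cloaking is not fooled.

const BROWSER_UA   = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'; // placeholder
const GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Fetch $url with the given User-Agent; returns the body, or '' on failure.
// Requires allow_url_fopen to be enabled.
function fetch_with_ua(string $url, string $userAgent): string
{
    $context = stream_context_create(['http' => [
        'user_agent' => $userAgent,
        'timeout'    => 10,
    ]]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? '' : $body;
}

// Return href values present in $botHtml but absent from $humanHtml,
// i.e. links that are only being shown to the crawler.
function links_only_for_bot(string $botHtml, string $humanHtml): array
{
    $pattern = '/href\s*=\s*["\']([^"\']+)["\']/i';
    preg_match_all($pattern, $botHtml, $bot);
    preg_match_all($pattern, $humanHtml, $human);
    return array_values(array_unique(array_diff($bot[1], $human[1])));
}
```

For example, with $url set to your blog’s search URL, print_r(links_only_for_bot(fetch_with_ua($url, GOOGLEBOT_UA), fetch_with_ua($url, BROWSER_UA))) would list any links served only to the spoofed crawler.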
So why does Google see different results than anyone else who visits my site and runs that query? Something different must happen when Google visits. I started tracing WordPress’s execution path. The first file that is loaded is index.php. All it does is turn on a theming flag (WP_USE_THEMES) and load wp-blog-header.php. So I moved on to that file. It looked like this:
if ( !isset($wp_did_header) ) {
    $wp_did_header = true;
    require_once( dirname(__FILE__) . '/temp.php' ); // not part of the stock file
    require_once( dirname(__FILE__) . '/wp-load.php' );
    wp();
    require_once( ABSPATH . WPINC . '/template-loader.php' );
}
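An injected require like this is easy to miss by eye. As a hedged sketch, a small PHP check could list every file that wp-blog-header.php pulls in and flag any that the stock file does not load. The function names are hypothetical, and the allow-list assumes the stock file loads only wp-load.php and template-loader.php; adjust it if your WordPress version differs. The regex only catches require/include statements whose file name appears as a quoted literal.

```php
<?php
// Hypothetical check for unexpected require/include lines in wp-blog-header.php.

// Return the base names of quoted .php files loaded by require/include
// statements, e.g. require_once( dirname(__FILE__) . '/temp.php' ); -> temp.php
function included_files(string $source): array
{
    $pattern = '/\b(?:require|include)(?:_once)?\b[^;]*?[\'"]([^\'"]+\.php)[\'"][^;]*;/i';
    preg_match_all($pattern, $source, $m);
    return array_map('basename', $m[1]);
}

// Return included files that are not on the allow-list (assumed stock list).
function unexpected_includes(
    string $source,
    array $allowed = ['wp-load.php', 'template-loader.php']
): array {
    return array_values(array_diff(included_files($source), $allowed));
}
```

Running print_r(unexpected_includes(file_get_contents('wp-blog-header.php'))) from the WordPress root would, on the hacked file, report temp.php.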
temp.php? I’d never heard of it, so let’s see what’s inside:
'vVhtc9pGEP6emfwHRfUUmGLg9IbkhNrUJrZnEsfFOGmKXc1ZOoMmQqInYYea/Pfu'
.'nnjRG6aZzNRj0Em7++yzu3erOw5/fXM4HU9fvnj5Ym8cRnFnz77q9T/2+sPK2WBw'
...snip for length...
.'6reTZEAXdDrl4QNzE/3F3Wy+iKjPxFe0gH7G+ML1IiecBfHiY+LyWLhsVmDlrQ7g'
.'cvonDPkW65UOKh6zCWuM44kvFr6Ialmvw1/fHP4L'
)));
Now that looks evil; obfuscated code can’t be good. To see what it does, I replaced the “eval” with “print” and ran “php test.php” from that directory. The output is very long, but you can see it here.
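The “swap eval for print” trick can be scripted so you never hand-edit the live file. The sketch below uses a hypothetical neutralize_eval() function that replaces only the first eval( with print(, so running the modified copy prints the decoded source instead of executing it. Whatever decoding functions the eval wraps are left untouched. Only do this on a copy, ideally away from the web server, and check that the eval really is gone before running it; if the decoded output contains another eval layer, repeat the process on that output.

```php
<?php
// Hypothetical helper: turn the outermost eval( into print( so that running
// the file prints the decoded payload rather than executing it.
// \beval\s*\( matches "eval(" or "eval (" but not e.g. "medieval(".
// The limit of 1 means only the first occurrence is replaced.
function neutralize_eval(string $source): string
{
    return preg_replace('/\beval\s*\(/', 'print(', $source, 1);
}
```

For example, file_put_contents('test.php', neutralize_eval(file_get_contents('temp.php'))) writes the inspectable copy; running php test.php then prints the hidden code.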
Basically, the code tries to determine whether the visitor is a real person or a search engine bot by looking at things like the IP address and user agent. If it decides the visitor is human, it does nothing and wp-blog-header.php carries on loading the normal page. If the visitor is a bot, it serves the content in “theme.html”, which is identical to the second screenshot above.
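To make that decision concrete, here is an illustrative PHP reconstruction of the kind of check such cloaking code performs. It is not the decoded malware: the function names, the user agent tokens, and the IP prefix (192.0.2., a documentation-only range) are hypothetical placeholders, and real cloaking kits ship far longer lists. Note that in this sketch either signal alone marks the visitor as a bot; code that insists on a matching crawler IP would not be fooled by a spoofed User Agent.

```php
<?php
// Illustrative sketch (not the actual payload) of a bot-vs-human check.

const BOT_UA_TOKENS   = ['googlebot', 'bingbot', 'slurp'];
const BOT_IP_PREFIXES = ['192.0.2.']; // placeholder, not a real crawler range

function looks_like_search_bot(string $userAgent, string $ip): bool
{
    $ua = strtolower($userAgent);
    foreach (BOT_UA_TOKENS as $token) {
        if (strpos($ua, $token) !== false) {
            return true; // known crawler name in the User-Agent header
        }
    }
    foreach (BOT_IP_PREFIXES as $prefix) {
        if (strncmp($ip, $prefix, strlen($prefix)) === 0) {
            return true; // request comes from a listed address range
        }
    }
    return false;
}

// Bots get the spam page; humans fall through to normal WordPress.
function choose_response(string $userAgent, string $ip): string
{
    return looks_like_search_bot($userAgent, $ip) ? 'theme.html' : 'wordpress';
}
```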
So, to clean things up, I removed the reference to temp.php from wp-blog-header.php and deleted both temp.php and theme.html.
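After a cleanup like this it is worth checking whether similar files were planted elsewhere. The PHP sketch below, with hypothetical function names, walks a directory tree and lists .php files in which eval( directly wraps a common decoding function such as base64_decode or gzinflate, the pattern used to hide temp.php’s contents. It is a heuristic: a hit needs manual inspection, and a clean result does not prove the site is clean (it would not flag theme.html, for instance).

```php
<?php
// Hypothetical scanner for eval(<decoder>( patterns in PHP files.

const SUSPICIOUS_PATTERN =
    '/\beval\s*\(\s*(?:gzinflate|gzuncompress|gzdecode|base64_decode|str_rot13)\s*\(/i';

function is_suspicious(string $source): bool
{
    return preg_match(SUSPICIOUS_PATTERN, $source) === 1;
}

// Recursively scan $root and return sorted paths of suspicious .php files.
function find_suspicious_files(string $root): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile()
            && strtolower($file->getExtension()) === 'php'
            && is_suspicious((string) file_get_contents($file->getPathname()))) {
            $hits[] = $file->getPathname();
        }
    }
    sort($hits);
    return $hits;
}
```

For example, print_r(find_suspicious_files('/path/to/wordpress')) would list candidate files under your WordPress root.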