Sunday, 31 August 2008

Googlebot, fdfdkll.html and Configuring 404 Page Not Found with PHP

After checking the logs for a website, I noticed Googlebot trying to access fdfdkll.html, which definitely does not exist.

I assumed Googlebot hadn't gone mad, so I did some investigating and found that this is Googlebot trying to establish how the site handles invalid URLs.

From the search engine's perspective this matters: if a default page is served with a status of 200 (OK) for an invalid URL, that URL looks like a real page and could end up being listed.
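The same kind of probe is easy to try by hand. What follows is only a rough sketch of the idea, not how Googlebot itself works: it requests a made-up page name and reports the status code that comes back. The function names, the random file name format and the example.com address are all my own placeholders.

```php
<?php
// Extract the numeric status from a status line such as "HTTP/1.1 404 Not Found".
// Returns null if the line does not look like an HTTP status line.
function status_from_line(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Request a page that almost certainly does not exist and return the status
// code of the first response. $baseUrl is a placeholder for the site to test.
function probe_invalid_url(string $baseUrl): ?int
{
    // Hypothetical random file name, in the spirit of fdfdkll.html.
    $path = '/' . bin2hex(random_bytes(6)) . '.html';
    $headers = @get_headers(rtrim($baseUrl, '/') . $path);
    // get_headers() returns false on failure; element 0 is the status line.
    return $headers === false ? null : status_from_line($headers[0]);
}

// Example usage (needs network access, so it is left commented out):
// var_dump(probe_invalid_url('http://www.example.com'));
```

If this reports 200 for a page that cannot exist, the site is answering invalid URLs as if they were real pages.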

This is exactly what I had done. Oops. I had set .htaccess to display the homepage if the requested page didn't exist. The site is small, so a specific error page is not really needed.

This would most likely affect the page rank, so I really needed to change the return status to 404 whenever the homepage is reached through an invalid URL.

The homepage is PHP, so I made the following changes:

Initial .htaccess extract

ErrorDocument 404 /home.php

I changed this to

ErrorDocument 404 /home.php?error=true

And added the following line to the start of the PHP homepage, inside a check that the error parameter is set:

header("HTTP/1.0 404 Not Found");

The header() call needs to come before any output is written to the page, otherwise the status line has already been sent and cannot be changed.

So now, if the homepage is accessed because the URL is invalid, the error parameter is set and the status returned is 404; otherwise it is 200 as normal. From the user's perspective there is no difference: the homepage is displayed as normal.
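Put together, the top of home.php might look something like the sketch below. Only the header() line and the error=true parameter come from my actual changes; the function name and the exact test on the parameter are illustrative assumptions.

```php
<?php
// Decide which status the page should report. $query is normally $_GET.
// The "error" parameter is expected when Apache serves this page through
// ErrorDocument 404 /home.php?error=true.
function status_for_request(array $query): int
{
    return (isset($query['error']) && $query['error'] === 'true') ? 404 : 200;
}

// This must run before any output, including whitespace before "<?php".
if (status_for_request($_GET) === 404) {
    header('HTTP/1.0 404 Not Found');
}

// ... the normal homepage markup follows here ...
```

Keeping the decision in a small function makes it easy to check without a web server, and the page body underneath stays exactly the same for both cases.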

fdfdkll.html ...Done.
