Fewer 404 errors by adding robots.txt to your site
Even if you don’t think you need it, it’s still good practice to provide a “robots.txt” file in the root directory of your site for search engine spiders to find. Not only does it remove the 404s from your error_log (one is logged every time a spider or bot requests the file and it doesn’t exist), it also provides a quick and efficient way to block certain sections of your site from being crawled. This is by far a better method than adding rel="nofollow" to your links or adding the following meta tag to the head of each page in question.
<meta name="robots" content="noindex, nofollow" />
The most basic robots.txt file would include the following. This tells the search engines to index everything they can find.
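A minimal sketch of such a file, assuming the standard permissive form: the “User-agent: *” line applies the rules to every spider, and an empty “Disallow:” line blocks nothing.

```
User-agent: *
Disallow:
```

To block a section instead, put its path on the Disallow line. For example, “Disallow: /private/” would keep compliant spiders out of a hypothetical /private/ directory.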
Force “www” subdomain using .htaccess
Short and sweet.
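<IfModule mod_rewrite.c>
# Turn on the rewrite engine (only runs if mod_rewrite is loaded)
RewriteEngine On
# Match only when the requested host does not already start with "www."
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Permanently (301) redirect to the same path on the www host
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
</IfModule>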
And if you want to remove the “www” portion, just change the condition (RewriteCond) and rule (RewriteRule)…
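One common way to write the reverse redirect is sketched here, assuming plain http as in the snippet above. The condition captures everything after the “www.” prefix, and the rule redirects to that bare host name.

```
<IfModule mod_rewrite.c>
RewriteEngine On
# Match hosts that start with "www." and capture the rest of the host name
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# %1 is the host captured by the RewriteCond, $1 is the requested path
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
</IfModule>
```

Use one direction or the other, not both, otherwise the two redirects will bounce requests back and forth.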
You may ask “Why should I do this?” and the answer is simple. Doing so promotes linking uniformity and ensures you aren’t negatively affected by having “duplicate content” across (sub)domains in search engine indexes.
