Preventing Search Engines From Crawling Your Web Pages

Matt Cutts has a good video today on Google Webmaster Central explaining how to prevent certain pages on your website from being crawled by the search engines.

You really need to be familiar with five methods of preventing the spiders from crawling your pages:

  • htaccess
  • noindex
  • nofollow
  • robots.txt
  • password protect

Your htaccess file is a ticket to solving a lot of your search engine problems. Not all of them, but some of them. It’s a configuration file on your server (Apache reads it as .htaccess) that tells the web server how to respond to requests for your pages, whether those requests come from browsers or from search engine spiders. One common use of this file is to redirect old web pages to new ones. Webmasters frequently update their information and, when doing so, change the URL of a web page. If you do that, the old URL is still indexed, and people who try to visit it will get a 404 error page. To prevent that from happening, you can add a 301 (permanent) redirect to your htaccess file to send that traffic to your new page.
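In Apache, the htaccess version of this is typically a single mod_alias line such as "Redirect 301 /old-page.html https://www.example.com/new-page.html". To show what that rule does at the HTTP level, here is a minimal Python sketch (a small WSGI app, not Apache itself) that answers requests for moved pages with a 301 status and a Location header. The page paths and the REDIRECTS table are hypothetical examples.

```python
# Hypothetical table of moved pages: old path -> new location.
REDIRECTS = {
    "/old-page.html": "/new-page.html",
    "/2007/seo-tips.html": "/seo-tips/",
}

def app(environ, start_response):
    """Minimal WSGI app: moved pages get a 301, everything else a 200."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # 301 = moved permanently; browsers and spiders follow the Location header.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"You requested {path}".encode("utf-8")]

def request(path):
    """Call the app directly (no server needed) and return (status, Location)."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)
    app({"PATH_INFO": path}, start_response)
    return captured["status"], captured["headers"].get("Location")

print(request("/old-page.html"))  # ('301 Moved Permanently', '/new-page.html')
print(request("/about.html"))     # ('200 OK', None)
```

Because the redirect is permanent, search engines are expected to transfer the old URL's listing to the new one over time, which is why a 301 is preferred over a temporary (302) redirect for moved pages.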

But the htaccess file has other uses as well, and you can use it to keep the search engines from crawling certain pages. More on this later.

Perhaps the most common way to instruct search engines not to crawl certain pages of your website is the robots.txt file, which sits in the root directory of your site. You can use this file to tell all the search engines, or just some of them, not to crawl specific pages. You list the paths you don’t want crawled on Disallow lines and name the crawlers those rules apply to on User-agent lines.
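Python’s standard library ships a robots.txt parser (urllib.robotparser), which makes it easy to check how a set of rules will be read. This sketch parses a hypothetical robots.txt and asks whether two crawlers may fetch three example URLs. Slurp (Yahoo!’s crawler) is used here only as an example of a crawler that falls through to the catch-all "User-agent: *" group; the paths and domain are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. A crawler obeys only the group that matches
# its name, so the Googlebot group repeats the /private/ rule.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/
Disallow: /private/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URLS = [
    "https://www.example.com/drafts/post.html",
    "https://www.example.com/private/notes.html",
    "https://www.example.com/index.html",
]

for agent in ("Googlebot", "Slurp"):
    for url in URLS:
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:<10} {verdict:<8} {url}")
```

With these rules Googlebot is blocked from /drafts/ and /private/, while Slurp, falling back to the "*" group, is blocked only from /private/.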

The noindex meta tag is a bit different from the robots.txt file. It tells the search engines not to show a page in their index. They’ll still crawl it, but they won’t list it, so anyone searching for a key term will not see that page on that search engine. Again, you can address specific search engines or make it general for all search engines.
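A crawler has to download a page before it can see the meta tag, which is why a noindex page is still crawled. The sketch below, using Python’s built-in html.parser, shows roughly how a crawler could read the directives: it collects values from a general robots meta tag and from a tag addressed to one crawler by name. The crawler names and sample pages are illustrative assumptions, and real search engines’ parsing is certainly more involved.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and from a crawler-specific
    meta tag such as <meta name="googlebot">."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("robots", self.crawler):
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def robots_directives(html, crawler="googlebot"):
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    parser.close()
    return parser.directives

# The page is fetched (crawled) so the tag can be read; "noindex" then
# asks the search engine to leave it out of the index.
all_engines = '<head><meta name="robots" content="noindex"></head>'
google_only = '<head><meta name="googlebot" content="noindex, follow"></head>'

print(sorted(robots_directives(all_engines)))           # ['noindex']
print(sorted(robots_directives(google_only)))           # ['follow', 'noindex']
print(sorted(robots_directives(google_only, "slurp")))  # []
```

One consequence of this design: if robots.txt blocks a page, the crawler never downloads it and so never sees its noindex tag.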

The nofollow directive tells the search engines not to follow certain links. Say a page on your website is linked from only one other page: make that single link a nofollow link and the search engines won’t find the page through it. You can nofollow all the links on a page with a nofollow meta tag, or just some of them by adding rel="nofollow" to individual links.
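The two forms of nofollow described above, page-wide through the robots meta tag and per link through rel="nofollow", can be sketched with Python’s built-in html.parser. This sketch treats a link as nofollow if either applies; the sample pages are hypothetical, and it illustrates the rules rather than any search engine’s actual code.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Sort a page's links into followed and nofollow links."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="...nofollow...">
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            parts = [p.strip() for p in (attrs.get("content") or "").lower().split(",")]
            if "nofollow" in parts:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            (self.nofollowed if "nofollow" in rel else self.followed).append(attrs["href"])

def split_links(html):
    """Return (followed, nofollowed) lists of hrefs."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    if parser.page_nofollow:
        # A page-level nofollow covers every link on the page.
        return [], parser.followed + parser.nofollowed
    return parser.followed, parser.nofollowed

per_link = ('<a href="/about.html">About</a> '
            '<a href="/hidden.html" rel="nofollow">Hidden</a>')
whole_page = ('<meta name="robots" content="nofollow">'
              '<a href="/about.html">About</a>')

print(split_links(per_link))    # (['/about.html'], ['/hidden.html'])
print(split_links(whole_page))  # ([], ['/about.html'])
```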

Finally, if you password protect certain pages, the search engines will not crawl them. They cannot guess your password, so those pages are safe. Users who have the password can get to them, but the search engines cannot. You can password protect your pages using the htaccess file I discussed earlier.
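In Apache, htaccess password protection is normally HTTP Basic authentication (the AuthType Basic, AuthUserFile and Require valid-user directives). This Python sketch, written as a small WSGI app rather than Apache configuration, shows why a spider can’t get in: a request without the right Authorization header gets a 401 Unauthorized response and no page content. The username, password and realm are hypothetical, and the plain-text password is only for illustration; Apache stores hashed passwords in a file made with the htpasswd tool.

```python
import base64
import hmac

# Hypothetical credentials, kept in plain text only for this illustration.
USERNAME, PASSWORD = "member", "s3cret"

def is_authorized(auth_header):
    """Check an HTTP Basic 'Authorization' header value."""
    prefix = "Basic "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    try:
        supplied = base64.b64decode(auth_header[len(prefix):], validate=True)
    except ValueError:  # binascii.Error (bad base64) is a ValueError subclass
        return False
    expected = f"{USERNAME}:{PASSWORD}".encode("utf-8")
    return hmac.compare_digest(supplied, expected)  # constant-time comparison

def app(environ, start_response):
    if not is_authorized(environ.get("HTTP_AUTHORIZATION")):
        # No valid credentials (e.g. a search engine spider): 401, no content.
        start_response("401 Unauthorized",
                       [("WWW-Authenticate", 'Basic realm="Members"')])
        return [b"Login required"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Members-only page"]

def status_for(environ):
    """Call the app directly and return just the status line."""
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]

token = base64.b64encode(b"member:s3cret").decode("ascii")
print(status_for({}))                                        # 401 Unauthorized
print(status_for({"HTTP_AUTHORIZATION": f"Basic {token}"}))  # 200 OK
```

Note that Basic authentication only base64-encodes the credentials; it does not encrypt them, so protected pages are best served over HTTPS.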

Keep in mind that each of these methods has its complications. The safest and most powerful of them is the htaccess file. The least effective is the nofollow tag, because while the links aren’t followed, the page is still on a server somewhere. If you open that page in your browser, then move on to another page on your website, and you have an analytics program that publishes referrer links, the page’s URL could show up as a referrer, get crawled, and still send you traffic. Not a lot, but some, and you run the risk of someone else linking to it. You have the same problem with noindex tags and robots.txt files, so be careful.

For more information on preventing your pages from being crawled, watch Matt Cutts’ video on that topic. He also discusses how to de-index certain URLs you have mistakenly indexed.

Posted on May 14th, 2008 in SEO Tips, Search Engine

Search Engine Optimization Using Three Way Links

Reciprocal links are, for some search engine optimization experts, still in the too hard basket when it comes to their effect on your rankings. Some believe they help, if only a little, while others believe they hurt: at best they are ignored, and at worst they are penalized by search engines. So what about three way links?

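Three way links are not new; in fact, they have been around for almost as long as search engine optimization itself. In a three way exchange, site A links to site B, and instead of B linking back directly, A receives its return link from a third site, C, usually run by B’s owner. Whichever way you look at them, three way links are still reciprocal links, and if search engines such as Google look down on reciprocal links, you can bet they know when you are using three way links.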
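To see why a three way exchange is still reciprocal, it helps to look at links by owner rather than by site. This toy Python sketch collapses each site to its (hypothetical) owner and then looks for owners who link to each other. The site names, the owners and the method itself are illustrative assumptions, not a description of how Google detects link schemes.

```python
def reciprocal_pairs(links, owner):
    """Return pairs of owners who link to each other.

    links: iterable of (source_site, target_site) pairs
    owner: dict mapping each site to its owner
    """
    edges = {(owner[src], owner[dst]) for src, dst in links if owner[src] != owner[dst]}
    return {tuple(sorted(edge)) for edge in edges if (edge[1], edge[0]) in edges}

# A three way exchange: site_a links to site_b, and the return link comes
# from site_c, which belongs to site_b's owner.
links = [("site_a", "site_b"), ("site_c", "site_a")]
owners = {"site_a": "alice", "site_b": "bob", "site_c": "bob"}
by_site = {site: site for site in owners}  # treat every site as its own owner

print(reciprocal_pairs(links, by_site))  # set()
print(reciprocal_pairs(links, owners))   # {('alice', 'bob')}
```

Site by site there is no direct link swap, but once the sites are grouped by owner the exchange is an ordinary reciprocal link.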

A thread on WebmasterWorld recently discussed the use of three way links, and the responses were quite varied. One particular response appealed to me and is probably closer to my thoughts than many of the others:

Google most probably considers a certain percentage of straightforward reciprocals in your linking profile as part of a natural pattern… and good quality reciprocals, if they’re appropriate for the user and not excessive, will probably help you.

But Google is also big on “intent,” and is less tolerant about obvious attempts to game the algo. That’s what triangular links are. It would be hard to look at them any other way.

Intent is the real key when it comes to links, linking and search engine optimization. There are going to be times when it is mutually beneficial for two sites to exchange links. Certain websites will naturally complement each other, for example, a paint supply website and a paint brush website. It would be natural for these two sites to link to each other.

Sites involved in triangular links would rarely complement each other in that way. I am sure if I thought hard enough I could find an example, but they would be few and far between. Using triangular links as part of your search engine optimization strategy will generally be seen as an attempt to game the system.

If the search engines cannot detect them yet, it won’t be long before they can. Rather than trying to work the system, you are far better off building links where they occur naturally and putting all your search engine optimization efforts into gaining one way links.

Posted on May 14th, 2008 in SEO Indexing, SEO Phenomenon, Search Engine

Google’s Top 10 Search Engine Optimization Factors

Search engine optimization hints and tips are everywhere on the net these days; perform a simple search on the topic and you will get millions of results. Every now and then you come across something that is worth repeating or commenting on. This article from Lorna Li, reporting on SEOmoz’s research, is well worth a read. The content certainly is food for thought.

Lorna has listed the top 10 search engine optimization factors that Google uses (out of an estimated 200) to rank pages. I was struck by the factor in the number 1 spot, although on second thought I could understand its importance. Many of the top ten involve inbound links.

It is interesting to note the many different ways inbound links are assessed by Google. It is not simply a case of: there is a link, it is worth xxx, move on to the next link. See for yourself the top 10 search engine optimization factors according to SEOmoz:

  • Keyword use in title tag. We already knew keywords are important and that putting them into the title tag helps. It seems this may be the top-rated factor of all for search engine optimization.
  • Anchor text of inbound links. Ranked at number two: the anchor text used to link to your page, and how relevant it is to that page. It seems the text adjacent to the link is also checked for relevance.
  • Global link popularity of site. Whilst inbound links are important, I was surprised to see this rated at number three, particularly as there are still dodgy link exchanges in operation.
  • Age of site. This old monkey keeps popping up. Older sites will obviously have developed some authority; however, quality young sites still have to battle with this factor.
  • Internal link popularity. The number and quality of links to particular pages from within your own site. This area is underutilized on many sites, yet it is so easy to manipulate, and it rates at number five.
  • Topical relevance of inbound links. Is the link from a well ranked topically related site?
  • Link popularity of site in topical community. “Link love from the popular hoods in the neighborhood”.
  • Keyword use in body text. All the way down at number eight. I would probably have expected it in the top three, but apparently not. Whilst it may be down at number eight, everything in the top ten is important.
  • Global link popularity of linking sites. Once again, how popular is the site linking to yours?
  • Topical relationship of linking page. Links from topically indexed pages can carry more weight.

As you can see, quite a few of these relate to inbound links. The interesting points from this list are the relatively low position of keywords in content and the higher value of internal links, particularly since internal links are under your control and you can adjust them to get the most benefit from any search engine optimization program.
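Internal links are the part of this list that is entirely under your control. As a rough illustration of the "number of links" half of internal link popularity, this Python sketch counts how many distinct pages on a hypothetical site link to each page. It is only a raw count; how Google actually weighs internal links, including their quality, isn’t public.

```python
from collections import Counter

# Hypothetical internal link map for a small site: page -> pages it links to.
SITE_LINKS = {
    "/": ["/services.html", "/about.html", "/blog/"],
    "/blog/": ["/services.html", "/blog/post-1.html"],
    "/blog/post-1.html": ["/services.html", "/"],
    "/about.html": ["/"],
    "/services.html": ["/"],
}

def internal_inbound_counts(site_links):
    """Count how many distinct internal pages link to each page (self-links ignored)."""
    counts = Counter()
    for source, targets in site_links.items():
        for target in set(targets):  # several links from one page count once
            if target != source:
                counts[target] += 1
    return counts

for page, n in internal_inbound_counts(SITE_LINKS).most_common():
    print(f"{n}  {page}")
```

On this sample map, /services.html and the home page each receive 3 internal links, while every other page receives 1, so adding links from the blog posts is the obvious way to lift a page that matters.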

This list is not definitive and is not an ‘official Google’ list. It has, however, been developed by canvassing 37 leading organic search engine optimization specialists. Lorna’s article is certainly well worth a read, particularly if you are new to search engine optimization.

Posted on May 14th, 2008 in SEO Tips, Search Engine

AdSense For Search Now Powered By Custom Search

If you have AdSense for Search on your web site, you will be interested to know that Google have updated the search facility and it is now powered by Custom Search. As the name implies, you can customize many of the search features for your web site (or blog).

These features include (courtesy of the AdSense blog):

  • Site Search: you can choose to provide just site search so users can find all the information they’re looking for on your site.
  • Improved indexing of your pages: AdSense for search will now index even more pages of your site, as long as we’re able to crawl them, so that your users will see more results from your site in your AdSense for search results.
  • Vertical search: You can also allow your users to search across multiple sites - this could be a network of sites that you own or other related sites that you think your users might find useful.
  • Tuning search results and ads with keywords: Search terms can have different meanings in different contexts, so you can now configure your search engine with relevant keywords.
  • Selecting ad location: Do you want ads to appear at the top and bottom of your search results? Or along the right sidebar as well, just like on Google.com? Now you can make the call on where ads are placed.
  • Quick and easy updates: Just as you use our ad management feature to quickly change the settings for your ad and referral units, you’ll be able to do the same for your search engine within your AdSense account.

This should provide a much better system, and possibly a better return, for publishers using AdSense for Search. Customization is certainly an improvement on the previous search options.

Posted on May 13th, 2008 in Blogging, Earnings, Search Engine, Webmaster
