Search Engines Inside Stories
Let's recap the basic methods of steering and supporting search engine crawling and ranking:
# Provide unique content. A lot of unique content. Add fresh content frequently.
# Acquire valuable inbound links from related pages on other sites, regardless of those pages' search engine ranking. Actively pursue deep inbound links to content pages, but accept home page links too. Do not run massive link campaigns while your site is still new; let the number of relevant inbound links grow smoothly and steadily to avoid red-flagging.
# Put carefully selected outbound links to on-topic authority pages on each content page. Ask for reciprocal links, but do not drop your links if the other site does not link back.
# Implement a surfer-friendly, themed navigation. Use text links to support deep crawling. Provide each page with at least one internal link from a static page, for example from a site map page.
# Encourage other sites to make use of your RSS feeds and the like. To protect the uniqueness of your site's content, do not put text snippets from your pages into feeds or submitted articles; write short summaries in different wording instead.
# Use search engine friendly, short but keyword rich URLs. Hide user tracking from search engine crawlers.
# Log each crawler visit and keep this data permanently. Develop smart reports that query your logs, and study them frequently. Use these logs to improve your internal linking.
# Make use of the robots exclusion protocol to keep spiders away from internal areas. Do not try to hide your CSS files from robots.
# Make use of the robots META tag to ensure that only one version of each page on your server gets indexed. When it comes to pages carrying partial content of other pages, base your decision on common sense, not on any SEO bible.
# Use rel="nofollow" in your links when you cannot vouch for the linked page (user-submitted content in guestbooks, blog comments, ...).
# Make use of Google SiteMaps as a 'robots inclusion protocol'.
# Do not cheat the search engines.
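The rel="nofollow" advice above takes a single attribute in the link markup; a minimal illustration (the URL is a placeholder):

```html
<!-- A link in user-submitted content (guestbook entry, blog comment):
     rel="nofollow" tells search engines not to count it as an editorial vote. -->
<a href="http://www.example.com/" rel="nofollow">commenter's link</a>
```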
http://www.yourdomain.com/robots.txt behaves like any other PHP script. Your file system's directory structure has nothing to do with your linking structure, that is, your site's hierarchy. However, you can store scripts delivering content that is not meant for public access in directories blocked by robots.txt. Since robots.txt only keeps out well-behaved crawlers, add user/password protection to shield this content from all unwanted views.
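To verify what a given policy actually blocks before you rely on it, you can test it with Python's standard-library robots.txt parser; a minimal sketch (the paths and domain are examples). Note that this parser implements the original 1994 exclusion protocol with plain path prefixes only, not the wildcard extensions:

```python
# Check URLs against a robots.txt policy using Python's standard library.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /internal/
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Blocked: falls under the /internal/ prefix.
print(rp.can_fetch("*", "http://www.yourdomain.com/internal/report.php"))  # False
# Allowed: no Disallow rule matches.
print(rp.can_fetch("*", "http://www.yourdomain.com/articles/seo.html"))    # True
```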
"*" matches any sequence of characters, "$" indicates the end of a name.
The first example would disallow all dynamic URLs where the parameter 'affid' (affiliate ID) is part of the query string. The second and third examples disallow URLs containing a session ID or a visitor ID. The fourth example excludes .aspx page scripts without a query string from crawling. The fifth example tells Google's image crawler to fetch all image formats except .gif files. Because not all Web robots understand this syntax, it makes sound sense to add a URL-specific control: the robots META tag.
INDEX|NOINDEX - Tells the SE spider whether the page may be indexed or not
FOLLOW|NOFOLLOW - Tells the SE crawler whether it may follow links provided on the page or not
ALL|NONE - ALL = INDEX, FOLLOW (default), NONE = NOINDEX, NOFOLLOW
NOODP - Tells search engines not to use page titles and descriptions from the ODP (Open Directory Project) on their SERPs
NOYDIR - Tells Yahoo! search not to use page titles and descriptions from the Yahoo! directory on the SERPs
NOARCHIVE - Google-specific, prevents Google from showing a cached copy of the page
NOSNIPPET - Google-specific, prevents Google from displaying text snippets for your page on its SERPs
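Combined in a page's HEAD section, these values might look like this; a sketch for a printer-friendly duplicate that should be crawled but neither indexed nor cached:

```html
<head>
  <title>Printer-friendly version</title>
  <!-- Keep this duplicate out of the index, but let the spider follow its links -->
  <meta name="robots" content="noindex,follow">
  <!-- Suppress cached copies and text snippets on Google's SERPs -->
  <meta name="googlebot" content="noarchive,nosnippet">
</head>
```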