Sam Traveler - A suitable robots.txt file is an important part of search engine optimization. There is plenty of advice around the Internet on creating such files (if you are looking for an introduction to the topic, read “Create a robots.txt file”), but what if, instead of looking at what people say, we looked at what people do?
That is what I did: I collected the robots.txt files of a wide range of blogs and websites, and you will find them below.
Key Takeaways
- Only 2 of the 30 websites I checked were not using a robots.txt file
- Therefore, even if you have no specific requirements for search bots, you should probably use a simple robots.txt file
- Most sites stick to the “User-agent: *” line to address all crawlers at once
- The most commonly disallowed item is the RSS feed
- Google itself mixes closed folders, which end in a slash (e.g., /searchhistory/), with open paths, which do not (e.g., /search). They are treated differently: rules are prefix matches, so /search also covers /search?q=... and /searchhistory/, while /searchhistory/ covers only that folder
- Only a minority of the sites include their sitemap URL in the robots.txt file
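To see what the minimal file actually permits, here is a small check using Python's standard-library urllib.robotparser. It is a sketch rather than a reference implementation: example.com and ExampleBot are placeholder names, and robotparser is only one interpretation of the rules (it does not understand the * and $ wildcards that some of the files below use).

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The minimal file several blogs below use: an empty Disallow blocks nothing.
MINIMAL = """\
User-agent: *
Disallow:
"""

# The same file with a Sitemap line (example.com is a placeholder host).
WITH_SITEMAP = """\
Sitemap: https://example.com/sitemap.xml
User-agent: *
Disallow:
"""

minimal = parse_robots(MINIMAL)
# "ExampleBot" is a placeholder crawler name; any name gets the same answer.
print(minimal.can_fetch("ExampleBot", "https://example.com/any/page"))  # True
# site_maps() needs Python 3.8+ and returns None when no Sitemap line exists.
print(minimal.site_maps())  # None
print(parse_robots(WITH_SITEMAP).site_maps())  # ['https://example.com/sitemap.xml']
```

Parsing from a string keeps the check offline; with a live site you would call set_url() and read() instead of parse().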
The Minimalistic Guys
Problogger.net
User-agent: *
Disallow:
Marketing Pilgrim
User-agent: *
Disallow:
Search Engine Journal
User-agent: *
Disallow:
Matt Cutts
User-agent: *
Allow:

User-agent: *
Disallow: /files/
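Matt Cutts’ file splits its rules for `*` across two groups. RFC 9309, the robots.txt standard, says that groups naming the same user agent are combined, so the empty Allow and the Disallow: /files/ end up in a single rule set. Below is a minimal sketch of that merging step under those assumptions; it is not a complete parser (Sitemap and Crawl-delay lines are simply skipped), and the function name group_rules is my own. Note that Python's urllib.robotparser appears to behave differently here: as far as I can tell it keeps only the first User-agent: * group.

```python
from collections import defaultdict


def group_rules(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each user agent (lower-cased) to its (directive, path) rules.

    Consecutive User-agent lines open one group; groups that name the
    same agent are merged, as RFC 9309 requires.
    """
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    current: list[str] = []  # agents named by the group being read
    in_rules = False         # True once that group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
        elif key in ("allow", "disallow") and current:
            in_rules = True
            for agent in current:
                groups[agent].append((key, value))
    return dict(groups)


MATT_CUTTS = """\
User-agent: *
Allow:

User-agent: *
Disallow: /files/
"""

print(group_rules(MATT_CUTTS))
# {'*': [('allow', ''), ('disallow', '/files/')]}
```

Merging by agent name keeps the sketch short; a fuller parser would also need the matching and precedence rules discussed further down.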
Pronet Advertising
User-agent: *
Disallow: /mt
Disallow: /*.cgi$
TechCrunch
User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/
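Pronet’s /*.cgi$ and TechCrunch’s /*/feed/ rely on two pattern characters that Google and Bing support and that RFC 9309 standardized: * matches any run of characters, and a trailing $ anchors the pattern to the end of the URL path. Python's urllib.robotparser does not implement them, so here is a minimal sketch that turns such a pattern into a regular expression. The helper names rule_to_regex and rule_matches are hypothetical, and the match is made against the path plus query string only.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern, honouring * and a trailing $."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped * back into ".*".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """True if the rule applies to a URL path (including any ?query)."""
    # Rules are prefix matches, so re.match (anchored at the start) is used.
    return rule_to_regex(pattern).match(path) is not None


print(rule_matches("/*.cgi$", "/cgi-bin/mt.cgi"))            # True
print(rule_matches("/*.cgi$", "/cgi-bin/mt.cgi?id=1"))       # False: $ anchors the end
print(rule_matches("/*/feed/", "/2007/05/some-post/feed/"))  # True
print(rule_matches("/*/feed/", "/feed/"))                    # False
```

The last line is worth noticing: /*/feed/ needs a slash before the wildcard and another before feed, so it catches per-post feeds but not the top-level /feed/ itself.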
The Structured Ones
Online Marketing Blog
User-agent: Googlebot
Disallow: */feed/

User-agent: *
Disallow: /Blogger/
Disallow: /wp-admin/
Disallow: /stats/
Disallow: /cgi-bin/
Disallow: /2005x/
Shoemoney
User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
Scoreboard Media
User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: */feed/
Disallow: /2007/
Disallow: /2006/
Disallow: /wp-*
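Online Marketing Blog and Scoreboard Media both give Googlebot a group of its own. A point that is easy to miss: a crawler obeys only the most specific group that names it, so Googlebot ignores the User-agent: * rules entirely, and Scoreboard Media’s /cgi-bin/ block does not apply to it. The sketch below checks this with urllib.robotparser, using a trimmed copy of Scoreboard Media’s file with the wildcard lines left out (robotparser does not understand them); OtherBot and www.example.com are placeholder names.

```python
from urllib.robotparser import RobotFileParser

# Trimmed from Scoreboard Media's file: the wildcard rules are left out.
ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /category/
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

base = "http://www.example.com"  # placeholder host
for agent in ("Googlebot", "OtherBot"):  # "OtherBot" is a placeholder crawler
    for path in ("/cgi-bin/script", "/category/seo/"):
        verdict = "allowed" if parser.can_fetch(agent, base + path) else "blocked"
        print(agent, path, verdict)
```

Googlebot may fetch /cgi-bin/script but not /category/seo/, while OtherBot gets the opposite answers. If you want Googlebot to respect the general rules too, they have to be repeated inside its own group.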
SEOMoz.org
User-agent: *
Disallow: /blogdetail.php?ID=537
Disallow: /blog?page
Disallow: /blog/author/
Disallow: /blog/category/
Disallow: /tracker
Disallow: /ugc?page
Disallow: /ugc/author/
Disallow: /ugc/category/
Wolf-Howl
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /noindex/
Disallow: /privacy-policy/
Disallow: /about/
Disallow: /company-biographies/
Disallow: /press-media-room/
Disallow: /newsletter/
Disallow: /contact-us/
Disallow: /terms-of-service/
Disallow: /terms-of-service/
Disallow: /information/comment-policy/
Disallow: /faq/
Disallow: /contact-form/
Disallow: /advertising/
Disallow: /information/licensing-information/
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2004/
Disallow: /*?*
Disallow: /page/
Disallow: /iframes/
John Chow
sitemap: http://www.johnchow.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: duggmirror
Disallow: /
Smashing Magazine
Sitemap: http://www.smashingmagazine.com/sitemap.xml

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /styles/
Disallow: /inc/
Disallow: /tag/
Disallow: /cc/
Disallow: /category/

User-agent: MSIECrawler
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Fasterfox
Disallow: /

User-agent: Slurp
Crawl-delay: 200
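Smashing Magazine’s last group asks Yahoo’s Slurp crawler to slow down, and several earlier groups shut out specific bots with Disallow: /. Crawl-delay is a non-standard extension: Google ignores it, while crawlers that support it usually read the value as seconds. As a sketch, urllib.robotparser exposes the value through crawl_delay(); the file below is a trimmed copy of Smashing Magazine’s, and www.example.com is a placeholder host.

```python
from urllib.robotparser import RobotFileParser

# Trimmed from Smashing Magazine's file.
ROBOTS = """\
User-agent: *
Disallow: /styles/

User-agent: psbot
Disallow: /

User-agent: Slurp
Crawl-delay: 200
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

print(parser.crawl_delay("Slurp"))      # 200
print(parser.crawl_delay("Googlebot"))  # None: the * group sets no delay
# Disallow: / in psbot's group blocks that bot from the whole site.
print(parser.can_fetch("psbot", "http://www.example.com/"))  # False
```

A polite crawler would sleep for the returned number of seconds between requests; robotparser only reports the value and leaves the waiting to the caller.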
Gizmodo
User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://gizmodo.com/sitemap.xml
Lifehacker
User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://lifehacker.com/sitemap.xml
The Mainstream Media
Wall Street Journal
User-agent: *
Disallow: /article_email/
Disallow: /article_print/
Disallow: /PA2VJBNA4R/
Sitemap: http://online.wsj.com/sitemap.xml
ZDNet
User-agent: *
Disallow: /Ads/
Disallow: /redir/
# Disallow: /i/ is removed per 190723
Disallow: /av/
Disallow: /css/
Disallow: /error/
Disallow: /clear/
Disallow: /mac-ad
Disallow: /adlog/
# URS per bug 239819, these were expanded
Disallow: /1300-
Disallow: /1301-
Disallow: /1302-
Disallow: /1303-
Disallow: /1304-
Disallow: /1305-
Disallow: /1306-
Disallow: /1307-
Disallow: /1308-
Disallow: /1309-
Disallow: /1310-
Disallow: /1311-
Disallow: /1312-
Disallow: /1313-
Disallow: /1314-
Disallow: /1315-
Disallow: /1316-
Disallow: /1317-
NY Times
# robots.txt, www.nytimes.com 6/29/2006
#
User-agent: *
Disallow: /pages/college/
Disallow: /college/
Disallow: /library/
Disallow: /learning/
Disallow: /aponline/
Disallow: /reuters/
Disallow: /cnet/
Disallow: /partners/
Disallow: /archives/
Disallow: /indexes/
Disallow: /thestreet/
Disallow: /nytimes-partners/
Disallow: /financialtimes/
Allow: /pages/
Allow: /2003/
Allow: /2004/
Allow: /2005/
Allow: /top/
Allow: /ref/
Allow: /services/xml/

User-agent: Mediapartners-Google*
Disallow:
YouTube
# robots.txt file for YouTube
User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /profile
Disallow: /results
Disallow: /browse
Disallow: /t/terms
Disallow: /t/privacy
Disallow: /login
Disallow: /watch_ajax
Disallow: /watch_queue_ajax
Bonus
Google
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/search?
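Google’s own file shows that the length of an Allow or Disallow path matters more than its position. Under Google’s documented rule, also adopted by RFC 9309, the matching rule with the longest path wins, and Allow wins a tie. That explains the pairs above: Allow: /searchhistory/ beats the shorter Disallow: /search, and URLs starting with /news?output=xhtml stay open despite Disallow: /news, unless they continue with &, where the longer Disallow: /news?output=xhtml& takes over. The sketch below implements that rule for plain prefix paths only (no * or $); the function is_allowed is hypothetical, and an empty path is treated as matching nothing.

```python
# A few rules from Google's robots.txt above: (directive, path prefix).
GOOGLE_RULES = [
    ("allow", "/searchhistory/"),
    ("disallow", "/news?output=xhtml&"),
    ("allow", "/news?output=xhtml"),
    ("disallow", "/search"),
    ("disallow", "/news"),
]


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest-match evaluation for plain prefix rules (no * or $).

    The matching rule with the longest path decides; on a tie, allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty paths and non-matching rules are ignored
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


for path in ("/searchhistory/", "/search?q=robots", "/news?output=xhtml",
             "/news?output=xhtml&topic=w", "/news/"):
    print(path, is_allowed(GOOGLE_RULES, path))
```

Only /searchhistory/ and /news?output=xhtml come out as allowed; the other three paths are blocked. Because the decision depends on rule length rather than order, shuffling GOOGLE_RULES does not change any of the answers.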