Daily Blog Tips

How to make money from your blog

Collection of Robots.txt Files

By Daniel 62 Comments Reading Time: 3 minutes

A suitable robots.txt file is very important for search engine optimization. There is plenty of advice around the Internet on creating such files (if you are looking for an introduction to the topic, read “Create a robots.txt file”), but what if, instead of looking at what people say, we looked at what people do?

That is what I did, collecting the robots.txt files from a wide range of blogs and websites. Below you will find them.

Key Takeaways

  • Only 2 of the 30 websites I checked were not using a robots.txt file
  • So even if you have no specific requirements for the search bots, you should probably use a simple robots.txt file
  • Most people stick to the “User-agent: *” line to cover all agents
  • The most commonly disallowed item is the RSS feed
  • Google itself uses a combination of paths that end in a slash (e.g., /searchhistory/) and paths that do not (e.g., /search), which probably means the two forms are treated differently
  • A minority of the sites included the sitemap URL in the robots.txt file
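
The trailing-slash point can be checked with Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the rules and the `ExampleBot` agent name are made up for the demonstration and are not taken from any of the sites listed, and real crawlers may handle edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one path without a trailing slash, one with.
RULES = """\
User-agent: *
Disallow: /feed
Disallow: /private/
"""

def allowed_paths(robots_txt, agent, paths):
    """Return {path: True/False} for the given agent (plain prefix matching)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}

result = allowed_paths(
    RULES,
    "ExampleBot",  # hypothetical crawler name
    ["/feed", "/feeds/latest", "/private/notes", "/private", "/about"],
)
for path, ok in result.items():
    print(f"{path:16} {'allowed' if ok else 'blocked'}")
```

Because rules match by prefix, `Disallow: /feed` also blocks `/feeds/latest`, while `Disallow: /private/` leaves the bare `/private` reachable.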

The Minimalistic Guys

Problogger.net

User-agent: *
Disallow:

Marketing Pilgrim

User-agent: *
Disallow:

Search Engine Journal

User-agent: *
Disallow:
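
All three files above use an empty `Disallow:` line, which blocks nothing. The sketch below, using Python's standard `urllib.robotparser`, contrasts it with `Disallow: /`. The `ExampleBot` name and the test path are placeholders I picked for the example.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, path, agent="ExampleBot"):  # agent name is a placeholder
    """Parse the given robots.txt text and check one path for one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # the minimalistic file shown above
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # one extra character, opposite meaning

print(is_allowed(ALLOW_ALL, "/2007/some-post/"))
print(is_allowed(BLOCK_ALL, "/2007/some-post/"))
```

The first check should come back allowed and the second blocked, so a single stray slash is enough to close a whole site to well-behaved crawlers.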

Matt Cutts

User-agent: *
Allow:
User-agent: *
Disallow: /files/

Pronet Advertising

User-agent: *
Disallow: /mt
Disallow: /*.cgi$

TechCrunch

User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/
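
Pronet Advertising and TechCrunch use `*` and a trailing `$` in their paths. These were not part of the original robots.txt convention. They are an extension that major crawlers such as Googlebot support, and as far as I know Python's `urllib.robotparser` does not interpret them. The sketch below translates such a pattern into a regular expression by hand. The helper names `pattern_to_regex` and `matches` and the sample URLs are made up for illustration, and real crawlers may differ in the details.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of
    the URL. Without '$' the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern, path):
    # re.match anchors at the start of the path, as robots.txt rules do.
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*/feed/", "/2007/05/some-post/feed/"))
print(matches("/*/feed/", "/feed/"))
print(matches("/*.cgi$", "/mt/mt-search.cgi"))
print(matches("/*.cgi$", "/mt/mt-search.cgi?search=x"))
```

Under this reading, `/*/feed/` covers per-post feeds but not the top-level `/feed/`, and `/*.cgi$` stops matching as soon as a query string follows the extension.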

The Structured Ones

Online Marketing Blog

User-agent: Googlebot
Disallow: */feed/

User-agent: *
Disallow: /Blogger/
Disallow: /wp-admin/
Disallow: /stats/
Disallow: /cgi-bin/
Disallow: /2005x/

Shoemoney

User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed

Scoreboard Media

User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: */feed/
Disallow: /2007/
Disallow: /2006/
Disallow: /wp-*

SEOMoz.org

User-agent: *
Disallow: /blogdetail.php?ID=537
Disallow: /blog?page
Disallow: /blog/author/
Disallow: /blog/category/
Disallow: /tracker
Disallow: /ugc?page
Disallow: /ugc/author/
Disallow: /ugc/category/

Wolf-Howl

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /noindex/
Disallow: /privacy-policy/
Disallow: /about/
Disallow: /company-biographies/
Disallow: /press-media-room/
Disallow: /newsletter/
Disallow: /contact-us/
Disallow: /terms-of-service/
Disallow: /terms-of-service/
Disallow: /information/comment-policy/
Disallow: /faq/
Disallow: /contact-form/
Disallow: /advertising/
Disallow: /information/licensing-information/
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2004/
Disallow: /*?*
Disallow: /page/
Disallow: /iframes/

John Chow

sitemap: http://www.johnchow.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: duggmirror
Disallow: /
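
John Chow's file shows how separate `User-agent` groups give different crawlers different rules. The sketch below feeds a shortened excerpt of that file to Python's `urllib.robotparser` to see which group applies to which agent. `ExampleBot` is a made-up crawler name, and this parser's reading of the file may not match every real crawler's.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the John Chow file listed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /feed/

User-agent: Mediapartners-Google
Allow: /

User-agent: duggmirror
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# (agent, path) pairs to test. ExampleBot is a hypothetical crawler with
# no group of its own, so it falls back to the '*' group.
checks = [
    ("ExampleBot", "/feed/"),
    ("ExampleBot", "/some-post/"),
    ("Mediapartners-Google", "/feed/"),
    ("duggmirror", "/some-post/"),
]
results = {(agent, path): parser.can_fetch(agent, path) for agent, path in checks}
for (agent, path), ok in results.items():
    print(f"{agent:22} {path:12} {'allowed' if ok else 'blocked'}")
```

A crawler that finds a group with its own name uses that group instead of the `*` rules, which is why the AdSense bot can read the feed while everyone else cannot.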

Smashing Magazine

Sitemap: http://www.smashingmagazine.com/sitemap.xml

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /styles/
Disallow: /inc/
Disallow: /tag/
Disallow: /cc/
Disallow: /category/

User-agent: MSIECrawler
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Fasterfox
Disallow: /

User-agent: Slurp
Crawl-delay: 200
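
Two lines in the Smashing Magazine file are not access rules at all: `Sitemap` and `Crawl-delay`. Python's `urllib.robotparser` exposes both (`crawl_delay()` since Python 3.6, `site_maps()` since 3.8). Here is a minimal sketch using an excerpt of the file above, with `ExampleBot` as a made-up agent name:

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the Smashing Magazine file listed above.
ROBOTS_TXT = """\
Sitemap: http://www.smashingmagazine.com/sitemap.xml

User-agent: *
Disallow: /tag/

User-agent: Slurp
Crawl-delay: 200
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

sitemaps = parser.site_maps()                    # list of Sitemap URLs, or None
slurp_delay = parser.crawl_delay("Slurp")        # delay requested from Yahoo's crawler
other_delay = parser.crawl_delay("ExampleBot")   # hypothetical agent without a delay

print(sitemaps, slurp_delay, other_delay)
```

Whether a crawler honors `Crawl-delay` is up to the crawler; the parser only reports what the file asks for.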

Gizmodo

User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://gizmodo.com/sitemap.xml

Lifehacker

User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://lifehacker.com/sitemap.xml

The Mainstream Media

Wall Street Journal

User-agent: *
Disallow: /article_email/
Disallow: /article_print/
Disallow: /PA2VJBNA4R/
Sitemap: http://online.wsj.com/sitemap.xml

ZDNet

User-agent: *
Disallow: /Ads/
Disallow: /redir/
# Disallow: /i/ is removed per 202323
Disallow: /av/
Disallow: /css/
Disallow: /error/
Disallow: /clear/
Disallow: /mac-ad
Disallow: /adlog/
# URS per bug 239819, these were expanded
Disallow: /1300-
Disallow: /1301-
Disallow: /1302-
Disallow: /1303-
Disallow: /1304-
Disallow: /1305-
Disallow: /1306-
Disallow: /1307-
Disallow: /1308-
Disallow: /1309-
Disallow: /1310-
Disallow: /1311-
Disallow: /1312-
Disallow: /1313-
Disallow: /1314-
Disallow: /1315-
Disallow: /1316-
Disallow: /1317-

NY Times

# robots.txt, www.nytimes.com 6/29/2006
#
User-agent: *
Disallow: /pages/college/
Disallow: /college/
Disallow: /library/
Disallow: /learning/
Disallow: /aponline/
Disallow: /reuters/
Disallow: /cnet/
Disallow: /partners/
Disallow: /archives/
Disallow: /indexes/
Disallow: /thestreet/
Disallow: /nytimes-partners/
Disallow: /financialtimes/
Allow: /pages/
Allow: /2003/
Allow: /2004/
Allow: /2005/
Allow: /top/
Allow: /ref/
Allow: /services/xml/

User-agent: Mediapartners-Google*
Disallow:

YouTube

# robots.txt file for YouTube

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /profile
Disallow: /results
Disallow: /browse
Disallow: /t/terms
Disallow: /t/privacy
Disallow: /login
Disallow: /watch_ajax
Disallow: /watch_queue_ajax

Bonus

Google

User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/search?
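
Google's file mixes `Allow` and `Disallow` lines whose paths overlap, so the outcome depends on which rule takes precedence. The sketch below assumes the longest-match rule later standardized in RFC 9309 (the most specific matching path wins, and `Allow` wins a tie). Crawlers at the time this list was collected may have resolved conflicts differently. The function name `is_allowed` is my own, and wildcards are ignored to keep the example short.

```python
# A few rules copied from the Google file above, as (directive, path) pairs.
RULES = [
    ("allow", "/searchhistory/"),
    ("disallow", "/news?output=xhtml&"),
    ("allow", "/news?output=xhtml"),
    ("disallow", "/search"),
    ("disallow", "/news"),
]

def is_allowed(path, rules=RULES):
    """Longest-match precedence: the longest matching prefix decides."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        # A longer prefix always wins; on equal length, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed

for path in ["/searchhistory/2007", "/search?q=robots",
             "/news?output=xhtml", "/news?output=xhtml&hl=en"]:
    print(f"{path:28} {'allowed' if is_allowed(path) else 'blocked'}")
```

Under that assumption, `/searchhistory/` stays open even though `Disallow: /search` is a prefix of it, and the `&` at the end of the first `/news` rule is what separates the allowed URL from the blocked ones.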

About Daniel

Daniel Scocco is a programmer and entrepreneur located in São Paulo, Brazil. His first company, Online Profits, builds and manages websites in different niches. His second company, Kubic, specializes in developing mobile apps for the iOS and Android platforms.

Comments

  1. Dan from Lawn Mower Reviews says

    Hi,
    I use the All in One SEO plugin on most of my sites. Will this automatically insert a robots.txt file for me? I have heard about this before, but to be honest I kind of overlooked it and didn’t think it was too important. Now I am a little worried that I am missing a vital trick here.

    I would guess that WordPress automatically creates the robots.txt file for you, but I am just double-checking.

    Thanks

  2. shabi says

    Hi, nice to see this. Can anyone tell me what the function of these lines in robots.txt is?
    “User-Agent: Googlebot
    Disallow: /index.xml$
    Disallow: /excerpts.xml$”
    I have seen these lines used by Gizmodo and other bloggers.
    Do these two lines help in removing duplicate content?
    Please, I would like an answer.
    Thanks

  3. Free Web Directory says

    Nice stuff, you gave me ideas for changing my robots.txt.

  4. Elle says

    Should the /wp-content folder be included as well, where all the themes and plugins reside? I’ve not noticed it listed in any robots.txt file anywhere.

    Thanks.

  5. Bang Kritikus says

    But my Blogspot robots.txt is not editable.

  6. SEO Freelancer says

    Nice collection. This will help me and new webmasters and web designers create a robots.txt file the way they want.

  7. John says

    Hello!

    Can anyone give me a list of the websites that archive other websites? Pandora and the Internet Archive’s Wayback Machine are some examples. I want to know all the web archiving websites, please.

  8. Sangesh says

    I got to know more about robots.txt from this article.

    Thanks.

  9. 东莞网站建设 says

    Very good learning

  10. AskApache says

    The real benefit of learning about the robots.txt file and how it works is that it teaches you to think like the web crawlers, especially when you start targeting different user-agents/bots.

    webmasterworld is definitely the coolest, and 2nd is of course askapache.com

  13. Daniel says

    The effect on the ranking of your individual pages should not be huge, so do not expect to go from the tenth page to the first page of Google just because you use a robots.txt file.

    That said, your search engine traffic will probably increase a lot if many of your pages were in supplemental hell, first and foremost because you will now cover many more keywords and terms.

  14. TechZilo says

    I’d like to echo Zath’s question, since my number of indexed pages has gone down too. Will it affect SERPs?

  15. vijay says

    Hmm. I haven’t thought about updating my robots.txt yet.
    It’s as simple as problogger.net’s, no complications 😉
    I actually avoided that part because I am not that aware of robots.txt changes and their effects!
    I will give it some time soon.
    Thanks for the advice anyway.

  17. Zath says

    I recently set up a robots.txt file and have noticed that my supplemental links on Google have gone down from around 2023 pages to about 250.

    I’m thinking that’s pretty good, but like others have said, I’m not quite sure how much of a difference it makes to my site rankings.

    Will this bring more search engine traffic going forward or increase the chances of a better PageRank?

  18. Matt Wardman says

    >Wow, I’m surprised that so many SEO experts don’t include a line for sitemap autodiscovery. It’s not like it’s difficult to implement or anything

    If you have a Google sitemap plugin for WordPress, it pings Google every time you post anyway.

    And:

    The robots.txt for webmasterworld.com has a blog in it. Fun.

  19. CypherHackz says

    I have a list of robots.txt links. You can see it here: Big Websites with Big Robots

  20. Pchere says

    I am also tweaking my robots.txt to remove duplicate content in WordPress. It was very insightful to see how top sites are dealing with the issue.

  21. Jordan McCollum says

    What timing! I was just contemplating roboting out my category and archive pages. Thanks for this!

  22. Patrix says

    I have been tweaking my robots.txt file for quite some time now, mostly to reduce duplicate content (get pages out of supplemental hell), but haven’t noticed any appreciable difference.

    I have been checking a few A-bloggers’ blogs for their robots.txt files, so thanks for doing this.

    BTW, why does Shoemoney disallow some directories both with and without the forward slash? What is the difference?

  23. Daniel says

    Adnan, I will need to tweak mine as well. So far I am getting pretty good results with a minimalistic one though, just excluding feeds, trackbacks and WP files.

  24. Adnan says

    Hey Daniel, thanks for that compilation. It’s very interesting to see how some SEO sites like SearchEngineJournal were minimal, but how SEOMoz has something different.
    Now I need to decide which one to choose 😉

  25. Hugh | A Politically Incorrect Entrepreneur says

    While crawling around the interweb a few days ago, I found the robots.txt file for the White House (whitehouse.gov/robots.txt).

    I just found the things they disallowed interesting.

  26. John Wesley says

    Very interesting post. I actually started using something similar to Chow’s after he published it on his blog last week. It seems to be adding a bit of Google traffic.

  27. Daniel says

    Nia, sorry about that. I just updated the article with a link to an introductory post I wrote some time ago:

  28. Nia says

    This looks valuable except I don’t know how to use it yet. I’ve put it in my RSS shares and when I figure it out I’ll implement the lesson and post about it. Thanks. 😉

  29. Pablo says

    Nice stuff, I already changed my robots.txt.

  30. Daniel says

    Stephen, I don’t think the “autodiscovery” factor is related to how easy it is to implement.

    The question is: will it bring tangible improvements?

  31. Stephen says

    Wow, I’m surprised that so many SEO experts don’t include a line for sitemap autodiscovery. It’s not like it’s difficult to implement or anything…

DBT is an independent website. The views expressed on this site may come from individual contributors and do not necessarily reflect the view of DBT or any other organization. All Content Copyright ©2006-2023. Daily Blog Tips unless otherwise noted or credited.