What is a Robots.txt File?
Robots.txt is a text file webmasters create to instruct web robots (typically search engine crawlers) how to crawl pages on their website. A robots.txt file indicates whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are given by “disallowing” or “allowing” crawling for certain (or all) user agents. The basic format is:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
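For example, a minimal robots.txt that blocks every crawler from a hypothetical /admin/ directory (the directory name is just for illustration) while leaving the rest of the site crawlable would look like this:

```
User-agent: *
Disallow: /admin/
```

The asterisk means the rule applies to all user agents.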
You can use Google’s robots.txt Tester tool to check whether your robots.txt file blocks Google’s web crawlers from specific URLs on your site.
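If you’d rather check locally, Python’s standard-library urllib.robotparser can answer the same question. Here is a quick sketch, using a made-up site and rule set:

```python
from urllib.robotparser import RobotFileParser

# Parse robots.txt rules directly from a list of lines.
# (Call set_url() and read() instead to fetch a live robots.txt file.)
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /page/",
])

# can_fetch(user_agent, url) reports whether that agent may crawl the URL.
print(rp.can_fetch("*", "http://yoursite.com/page/"))     # False: blocked
print(rp.can_fetch("*", "http://yoursite.com/contact/"))  # True: allowed
```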
You can create a directive (or command) to prevent bots from crawling specific pages. After Disallow:, enter the path that comes after your domain, starting with a forward slash. So if you want to tell a bot not to crawl your page http://yoursite.com/page/, you can add a Disallow rule for /page/ to your robots.txt file.
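Using the example URL above, that rule looks like this in robots.txt:

```
User-agent: *
Disallow: /page/
```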
However, a disallowed page can still end up in the index — for example, if other sites link to it. To keep a page out of search results entirely, you need a noindex signal as well. Note that Google no longer supports a noindex directive inside robots.txt, so the noindex belongs on the page itself, as a robots meta tag (or an X-Robots-Tag HTTP header). And don’t disallow a page you want noindexed: if crawlers can’t visit the page, they can’t see the noindex tag. So for any pages you don’t want indexed (like your thank you pages), leave them crawlable and add a noindex robots meta tag to the page.
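A sketch of the meta tag approach — this line goes in the <head> of the (hypothetical) thank you page:

```html
<meta name="robots" content="noindex">
```

Crawlers that fetch the page will see this tag and keep it out of their index.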
Now, that page won’t show up in the search engine results pages.
I hope this trick is helpful!
La Shawn, Small Business Warrior