robots.txt Generator
Visually build a robots.txt file with user-agent rules, sitemaps, and common presets.
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot request. It is placed at the root of your website (e.g., https://example.com/robots.txt) and follows the Robots Exclusion Protocol.
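For example, a minimal robots.txt that lets every crawler access the whole site except one directory (the `/private/` path here is just an illustration) looks like this:

```
User-agent: *
Disallow: /private/
```

The `User-agent: *` line means the rule group applies to all crawlers, and each `Disallow` line lists a path prefix they should not request.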
Does robots.txt guarantee pages won't be indexed?
No. robots.txt tells well-behaved crawlers not to access certain pages, but it does not prevent indexing. If a page is linked from elsewhere, search engines may still index its URL. For true de-indexing, use a noindex meta tag or X-Robots-Tag header.
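To request de-indexing, you can add a noindex meta tag to the page's HTML, or send the equivalent HTTP response header:

```
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">
```

```
X-Robots-Tag: noindex
```

Note that the page must remain crawlable for this to work: if robots.txt blocks the URL, crawlers never fetch the page and so never see the noindex directive.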
What is the Crawl-delay directive?
Crawl-delay tells crawlers how many seconds to wait between successive requests. Google ignores this directive (use Google Search Console instead), but Bing, Yandex, and others respect it. A value of 10 means the crawler waits 10 seconds between requests.
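A Crawl-delay rule is scoped to a User-agent group. For example, to ask Bing's crawler to wait 10 seconds between requests:

```
User-agent: Bingbot
Crawl-delay: 10
```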
Should I include a sitemap in robots.txt?
Yes, it is best practice to include a Sitemap directive in your robots.txt file. This helps search engines discover and crawl all the important pages on your site more efficiently.
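The Sitemap directive takes the absolute URL of your sitemap and can appear anywhere in the file, outside any User-agent group (the URL below is a placeholder):

```
Sitemap: https://example.com/sitemap.xml
```

You can list multiple Sitemap lines if your site has more than one sitemap or a sitemap index.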
Can I block AI crawlers with robots.txt?
Yes. Many AI companies respect robots.txt directives. You can block crawlers like GPTBot (OpenAI), Google-Extended (Gemini), CCBot (Common Crawl), and others by adding specific User-agent rules with Disallow: /.
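For example, the following rules block the three AI crawlers named above from the entire site, while leaving access open for all other crawlers:

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Each crawler gets its own User-agent group, and `Disallow: /` covers every path on the site. Keep in mind this relies on the crawler honoring robots.txt; it is not an access control mechanism.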