The robots.txt file is a plain text file placed at the root of your website that instructs search engine crawlers (bots) on which pages or sections they may crawl. It is one of the oldest and most fundamental technical SEO tools.
For AEC firms, robots.txt is typically used to prevent crawlers from indexing staging environments, admin areas, duplicate content, or internal search result pages. Incorrectly configured robots.txt files are a surprisingly common cause of websites being entirely de-indexed from search engines.
A well-configured robots.txt should allow the major search engine bots (Googlebot, Bingbot) to access all public-facing content, and include a reference to your sitemap.xml location. Example: "Sitemap: https://yourfirm.com/sitemap.xml".
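Putting these pieces together, a typical AEC firm's robots.txt might look like the sketch below. The specific paths (a staging directory, an admin area, an internal search parameter) are illustrative assumptions; your actual directory names and CMS will differ.

```
# Applies to all crawlers
User-agent: *
# Block non-public and duplicate-content areas (example paths)
Disallow: /staging/
Disallow: /admin/
Disallow: /search/

# Point crawlers to the sitemap
Sitemap: https://yourfirm.com/sitemap.xml
```

The blank line before the Sitemap directive is conventional; the directive itself is independent of any User-agent group and can appear anywhere in the file.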
Importantly, robots.txt only controls crawling, not indexing. Pages blocked by robots.txt can still appear in search results if other websites link to them. To prevent indexing entirely, use a "noindex" meta tag in the page's HTML head instead.
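To illustrate the alternative, a noindex directive is a single tag placed inside the page's HTML head:

```
<meta name="robots" content="noindex">
```

Note that for this tag to work, the page must not be blocked in robots.txt: crawlers have to fetch the page to see the directive, so blocking crawling and requesting noindex at the same time is self-defeating.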