🆓 Free SEO Tool, No Account Required

Free Robots.txt Checker

Google doesn't read robots.txt top to bottom and stop at the first match. It picks whichever rule is most specific, so a broad block written first can still lose to a narrower allow written underneath it. Validate your directives and test exactly how a given crawler reads them, in seconds.

🤖 Validate Your Robots.txt
Enter your domain to fetch and analyze its robots.txt file. Optionally test a specific URL path and user-agent to see whether that crawler would be blocked from that path.
We'll automatically fetch /robots.txt from your domain root.

Free to use · No data stored · No account required


A File Most People Read Wrong, Even When They're Looking Right At It

Robots.txt is plain text, rarely more than a few kilobytes, and it acts as the command center for every well-behaved crawler visiting a domain. The part that catches people off guard is how Google decides which rule actually applies: not by scanning top to bottom and stopping at the first match, but by picking whichever matching rule covers the longest, most specific path. A Disallow: / sitting at the very top of the file can still lose to an Allow: /blog/ written several lines below it, because specificity wins, not position.

That means two robots.txt files with the exact same rules in a different order behave identically, while two files that look nearly the same but differ in path specificity can behave completely differently. And none of this announces itself. Google doesn't de-index a page the moment a rule changes; it takes days or weeks, by which point nobody remembers the edit that caused it.
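The precedence rule is easier to see in code. Below is a minimal Python sketch of how a single User-agent group could be resolved, assuming plain prefix matching with no * or $ wildcards. Rule and is_allowed are hypothetical names for illustration, not Google's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    allow: bool  # True for an Allow line, False for a Disallow line
    path: str    # the path written after the colon, e.g. "/blog/"


def is_allowed(rules: list[Rule], url_path: str) -> bool:
    """Decide whether url_path may be crawled under one User-agent group.

    Simplified sketch: plain prefix matching only, no '*' or '$' wildcards.
    """
    # Collect every rule whose path is a prefix of the URL path.
    # An empty path (e.g. a bare "Disallow:") matches nothing.
    matches = [r for r in rules if r.path and url_path.startswith(r.path)]
    if not matches:
        return True  # no matching rule: crawling is allowed
    # Longest path wins regardless of file order; on a length tie,
    # Allow (True) sorts above Disallow (False), so Allow wins.
    winner = max(matches, key=lambda r: (len(r.path), r.allow))
    return winner.allow


rules = [Rule(allow=False, path="/"), Rule(allow=True, path="/blog/")]
print(is_allowed(rules, "/blog/post-1"))                  # True: "/blog/" beats "/"
print(is_allowed(rules, "/pricing"))                      # False: only "/" matches
print(is_allowed(list(reversed(rules)), "/blog/post-1"))  # True: order is irrelevant
```

Python's standard-library urllib.robotparser works differently: it checks rules in file order and returns the first match, so for this same file it would report /blog/post-1 as blocked. It isn't a reliable stand-in for Googlebot on files that mix Allow and Disallow.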

What Gets Checked Here


How to Fix robots.txt Issues

A misconfigured robots.txt can silently kill rankings for weeks before you notice. Here's how to resolve the most common issues, in order of severity.

1
Blocked high-value page: find the most specific rule, not just the first one

Don't stop at the first Disallow you spot. Check every rule matching the path and identify which one is actually the longest match, since that's the one Google obeys. Narrow or remove that specific rule, then re-test.

2
An Allow rule isn't working even though it's written below the Disallow

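Position in the file doesn't matter; only specificity does. If Allow: /blog/ still isn't overriding Disallow: /, first confirm both rules sit in the same User-agent group. Then check that the Allow path really matches the URL (Allow: /blog/ doesn't cover /blog, which lacks the trailing slash) and whether a longer Disallow, wildcards included, also matches, since a shorter Allow path loses to a longer Disallow.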
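Wildcards and trailing slashes change which rules match at all. This sketch applies the longest-match idea to patterns with * (any run of characters) and a trailing $ (end of URL), assuming specificity is measured as the raw pattern length with wildcard characters included. pattern_to_regex, winning_rule, and the /*.pdf$ rule are hypothetical examples.

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a regex matched from the path start.

    '*' matches any run of characters; a trailing '$' pins the end of the URL.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal text between wildcards, then join the pieces with '.*'.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def winning_rule(rules: list[tuple[str, str]], url_path: str) -> tuple[str, str] | None:
    """Return the ("allow" | "disallow", pattern) pair that decides url_path, or None."""
    matches = [
        (directive, pattern)
        for directive, pattern in rules
        if pattern and pattern_to_regex(pattern).match(url_path)
    ]
    if not matches:
        return None  # nothing matches, so the path is crawlable
    # Specificity = raw pattern length (wildcards count); ties go to Allow.
    return max(matches, key=lambda m: (len(m[1]), m[0] == "allow"))


rules = [("disallow", "/"), ("allow", "/blog/"), ("disallow", "/*.pdf$")]
print(winning_rule(rules, "/blog"))            # ('disallow', '/'): no trailing slash
print(winning_rule(rules, "/blog/guide"))      # ('allow', '/blog/')
print(winning_rule(rules, "/blog/guide.pdf"))  # ('disallow', '/*.pdf$'): 7 chars beat 6
```

The last case is the trap described above: the Allow is more specific than Disallow: /, but a longer wildcard Disallow still outranks it for PDF files.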

3
Syntax error in directive: rewrite the malformed rule

Missing colons, unsupported wildcard combinations, and trailing spaces are the usual culprits. Rewrite the rule cleanly and re-run the checker, since crawlers silently ignore an invalid directive rather than flagging it.
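A conservative lint pass can catch these problems before a crawler silently skips them. This is a minimal sketch, assuming the directives in KNOWN_DIRECTIVES are the only ones you care about; lint_robots is a hypothetical helper, and real parsers may be more lenient than this check.

```python
# Directive names this sketch accepts; real crawlers may recognize others.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text: str) -> list[str]:
    """Return human-readable problems found in a robots.txt body."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and outer whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {lineno}: missing colon in {line!r}")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        directive = key.lower()
        if directive not in KNOWN_DIRECTIVES:
            problems.append(f"line {lineno}: unknown directive {key!r}")
        elif directive in {"allow", "disallow"} and value:
            if not value.startswith(("/", "*")):
                problems.append(f"line {lineno}: path {value!r} should start with '/' or '*'")
            if "$" in value[:-1]:
                problems.append(f"line {lineno}: '$' only anchors the end of a pattern in {value!r}")
    return problems


sample = (
    "User-agent: *\n"
    "Disalow: /tmp/\n"
    "Disallow /private/\n"
    "Allow: blog/\n"
    "Disallow: /a$b\n"
)
for problem in lint_robots(sample):
    print(problem)
```

Each of the four bad lines in the sample (a misspelled directive, a missing colon, a relative path, a misplaced $) produces one message; the valid User-agent line produces none.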

4
Disallow on a page with noindex: remove the Disallow, keep the meta tag

If Googlebot can't visit the page, it can never read the noindex tag either, so the page can stay indexed indefinitely. Drop the Disallow and let the crawler reach the page to read the noindex instruction itself.
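To spot this conflict across many URLs, you can check whether a blocked path also carries a robots meta noindex. A minimal sketch, assuming you already have the page HTML and a flat list of Disallow prefixes; Allow overrides and the X-Robots-Tag HTTP header are ignored here, and RobotsMetaFinder, has_noindex, and noindex_conflict are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a page has <meta name="robots"|"googlebot"> containing noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in {"robots", "googlebot"}:
            directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


def noindex_conflict(url_path: str, html: str, disallowed_prefixes: list[str]) -> bool:
    """True when robots.txt blocks url_path, so its noindex tag can never be read.

    Simplification: any matching Disallow counts as blocking (Allow overrides ignored).
    """
    blocked = any(url_path.startswith(p) for p in disallowed_prefixes if p)
    return blocked and has_noindex(html)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_conflict("/old-promo/", page, ["/old-promo/"]))  # True: noindex is unreachable
```

Any URL this flags is a candidate for the fix above: remove the Disallow so the crawler can reach the page and read the tag.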

5
Missing Sitemap declaration: add a Sitemap: directive

A Sitemap: https://yourdomain.com/sitemap.xml line can go anywhere in the file, since it isn't tied to any User-agent group, and it helps crawlers that support it discover your full content inventory. Use a fully qualified URL, and validate it with the Sitemap Validator first.
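Because Sitemap lines aren't tied to a User-agent group, pulling them out only takes a scan for the directive name. The sketch below, with hypothetical helpers sitemap_urls and is_absolute, lists every declared sitemap and flags any value that isn't a fully qualified http(s) URL.

```python
from urllib.parse import urlsplit


def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect every Sitemap: value, wherever it appears in the file."""
    urls = []
    for raw in robots_txt.splitlines():
        key, sep, value = raw.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.split("#", 1)[0].strip())  # drop trailing comments
    return urls


def is_absolute(url: str) -> bool:
    """True for a fully qualified http(s) URL such as https://example.com/sitemap.xml."""
    parts = urlsplit(url)
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


robots = (
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "\n"
    "Sitemap: https://example.com/sitemap.xml\n"
    "sitemap: /sitemap-news.xml\n"
)
for url in sitemap_urls(robots):
    print(url, "ok" if is_absolute(url) else "not fully qualified")
```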


Two Traps, Same Root Cause: Assuming the File Reads Top to Bottom

Disallow and noindex get confused constantly, and the confusion has a specific shape: a page with Disallow can never have its noindex tag read, because Googlebot never visits it to find the tag in the first place. The page can sit in the index indefinitely with no control over how it appears.

The same misunderstanding shows up with rule precedence. Someone assumes that since their Disallow line comes first, it takes priority over an Allow added later. It doesn't. Google resolves the conflict by specificity: the longest matching path wins, and when an Allow and a Disallow match with equal length, the less restrictive Allow wins. Both traps come from treating robots.txt like a sequential script when it actually behaves more like a set of competing claims, where the most precise one wins regardless of where it's written.

Use noindex for pages that should stay crawlable but be excluded from results. Reserve Disallow for paths crawlers should never fetch at all: internal APIs, session URLs, admin sections.

Frequently Asked Questions

Does Google read robots.txt from top to bottom?
No. Google picks whichever matching rule has the longest, most specific path, no matter where it sits in the file. A Disallow: / at the top can lose to an Allow: /blog/ several lines down. Treating the file as a sequential script that stops at the first match leads to rules that don't behave as expected.

What is a robots.txt file?
A plain-text file at the domain root telling crawlers which pages or directories they can access. It's central to crawl budget management, and a misconfigured one can block Googlebot from important pages, effectively de-indexing them over time.

Does blocking a page in robots.txt remove it from Google?
Not reliably. Google stops crawling the page, so it can't see the content and ranking signals decay in the meantime, and many blocked pages eventually drop out. But a blocked URL that other pages link to can stay indexed, usually shown without a description, and Google never gets to read a noindex tag on it. To remove a page, use noindex and leave it crawlable.

What's the difference between Disallow and noindex?
Disallow stops a crawler from visiting the page at all. Noindex tells the crawler not to include the page in the index, but the crawler still has to visit it to read that instruction. Using Disallow on a page you also want noindexed backfires, since the crawler never gets there.

Does the robots.txt on my main domain cover subdomains?
No, each subdomain needs its own file. The robots.txt at example.com doesn't govern blog.example.com or shop.example.com, since each subdomain is treated as a separate host.

Can I block AI crawlers with robots.txt?
Yes, for any crawler that respects the standard. Add a User-agent line for each crawler you want to exclude, such as GPTBot, ChatGPT-User, anthropic-ai, or PerplexityBot, followed by Disallow: /. It only stops crawlers that voluntarily honor robots.txt, so pairing it with IP-level rules gives stronger enforcement against the rest.
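Whether a given AI crawler is actually blocked depends on which group it falls into: a crawler with its own User-agent group follows only that group and ignores the * group, while a crawler with no group of its own falls back to *. The sketch below assumes exact, case-insensitive user-agent matching and merges repeated groups for the same agent; parse_groups and fully_blocked are hypothetical helpers, and "fully blocked" here simply means Disallow: / with no Allow carve-outs.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []  # agents named by the group currently being read
    in_rules = False        # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules opens a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in {"allow", "disallow"} and agents:
            in_rules = True
            for agent in agents:  # repeated groups for one agent are merged
                groups[agent].append((key, value))
    return groups


def fully_blocked(robots_txt: str, agent: str) -> bool:
    """True when the group that applies to `agent` disallows the whole site."""
    groups = parse_groups(robots_txt)
    # An agent with its own group uses only that group; otherwise fall back to '*'.
    rules = groups.get(agent.lower(), groups.get("*", []))
    disallows_root = any(k == "disallow" and v == "/" for k, v in rules)
    has_allow = any(k == "allow" and v for k, v in rules)
    return disallows_root and not has_allow


robots = (
    "User-agent: GPTBot\n"
    "User-agent: PerplexityBot\n"
    "Disallow: /\n"
    "\n"
    "User-agent: *\n"
    "Disallow: /admin/\n"
)
for bot in ["GPTBot", "PerplexityBot", "anthropic-ai"]:
    print(bot, fully_blocked(robots, bot))  # True, True, then False
```

With this file, anthropic-ai falls through to the * group and is only kept out of /admin/, which is why each crawler you want to exclude needs its own User-agent line.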

A Test Catches Today's Problem.
Monitoring Catches the Next One.

Robots.txt files change during plugin installs and migrations, often silently, and on a growing platform a silent change here is an SEO emergency that just hasn't been noticed yet.

Robots.txt Change Alerts: get notified the moment it's modified, before an accidental block reaches Google.
Crawl Budget Management: see how robots.txt impacts actual crawl frequency using Search Console data.
Visual Architecture Mapping: see exactly which sections are hidden from bots in an interactive dashboard.
Conflict Resolution: automatically flag specificity conflicts a manual read would likely miss.

✓ 30-day Premium Trial  ·  ✓ No credit card required  ·  ✓ Full monitoring access

🔔 Robots.txt Change Monitoring
24/7 monitoring with instant alerts the moment your robots.txt file is modified, by a developer, a plugin, or a misconfigured deployment.
🗺️ Visual Architecture Map
See an interactive map of your site structure showing exactly which sections are open to crawlers and which are blocked, colored by bot type.
Crawl Budget Optimizer
Correlate your robots.txt directives with real Google crawl data from Search Console to identify wasted crawl budget and fix it fast.