Free Robots.txt Checker: Validate Directives

🤖 Validate Your Robots.txt

Enter your domain to fetch and analyze its robots.txt file. Optionally test a specific URL path and user-agent to see if they would be blocked.

Want to be alerted if your robots.txt changes? TechySEO monitors your file 24/7 and sends an instant alert if it's modified, before an accidental block tanks your rankings.

Understanding Your Results

Allowed

URL Is Allowed

The selected crawler can reach this URL, since the most specific matching rule permits it. This is the expected state for anything meant to show up in search results.

Blocked

URL Is Blocked

The most specific matching rule for this user-agent disallows it, even if a broader allow rule sits elsewhere in the file. If this is an important page, fix it promptly: Google can't re-crawl a blocked page, so it can lose its snippet and its rankings.

Warning

Syntax Warning Detected

The file contains a malformed directive, such as an invalid wildcard, a missing colon, or a stray space. Malformed directives can be silently ignored, meaning the rule you think is active isn't doing anything.

Sitemap

Sitemap Declared

A Sitemap: line was found. Worth confirming the URL is publicly accessible and points to the right sitemap index for the site.

Fix Priority Order

Blocked high-value page: find the most specific matching rule and narrow or remove it
Syntax error in directive: fix the malformed rule, since crawlers may silently ignore it
Rule order assumed to matter: specificity decides the winner, not which line comes first
Disallow on noindex page: switch to noindex only, since the crawler must visit the page to read the tag

A File Most People Read Wrong, Even When They're Looking Right At It

Robots.txt is plain text, rarely more than a few kilobytes, and it is the first file a well-behaved crawler consults before fetching anything else on a domain. The part that catches people off guard is how Google decides which rule actually applies: not by scanning top to bottom and stopping at the first match, but by picking whichever matching rule has the longest, most specific path. A Disallow: / sitting at the very top of the file can still lose to an Allow: /blog/ written several lines below it, because specificity wins, not position.

That means two robots.txt files with the exact same rules in a different order behave identically, while two files that look nearly the same but differ in path specificity can behave completely differently. And none of this announces itself. The effect of a changed rule shows up in search results over days or weeks rather than the moment the edit is made, by which point nobody remembers the edit that caused it.
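The precedence rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's parser: the function name `resolve` and the `(directive, pattern)` tuple format are assumptions made for the example, and patterns are treated as plain path prefixes with no wildcard handling. When an Allow and a Disallow match with equal length, the sketch picks the Allow, which is how Google documents tie-breaking.

```python
def resolve(rules, path):
    """Return "allow" or "disallow" for path using longest-match precedence.

    rules: list of (directive, pattern) tuples in file order, e.g.
           [("disallow", "/"), ("allow", "/blog/")].
    Simplification: patterns are plain prefixes (no * or $ support).
    """
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if path.startswith(pattern):
            candidate = (len(pattern), directive == "allow")
            # Longer pattern wins; on equal length, allow (True) beats disallow.
            if best is None or candidate > best:
                best = candidate
    if best is None:
        return "allow"  # no rule matched, so crawling is permitted
    return "allow" if best[1] else "disallow"


rules = [("disallow", "/"), ("allow", "/blog/")]
print(resolve(rules, "/blog/post"))                   # allow
print(resolve(list(reversed(rules)), "/blog/post"))   # allow (order is irrelevant)
print(resolve(rules, "/admin"))                       # disallow
```

Reversing the rule list gives the same answers, which is the point: only the length of the matching pattern is compared, never its position.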

What Gets Checked Here

🔍

Syntax Integrity

Allow, Disallow, and Crawl-delay directives get checked for the small mistakes (a missing colon, a stray space) that get a rule silently ignored. Note that Googlebot does not act on Crawl-delay, although some other crawlers do.
🤖

Specificity-Based Rule Resolution

The same logic Google uses: whichever directive matches the longest portion of the path wins, regardless of where it sits in the file. This is what most manual reviews get wrong.
🗺️

Path Verification, Per User-Agent

Enter a path and a crawler, and get a definitive answer on whether it's blocked, useful before a launch or while chasing down a sudden ranking drop.
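A per-user-agent check has two steps: pick the rule group that applies to the crawler, then resolve the path within that group only. The sketch below shows one way to do that. The helper names `parse_groups` and `is_blocked` are hypothetical, the user-agent lookup is simplified to an exact token match with a fallback to `*`, and patterns are treated as plain prefixes.

```python
def parse_groups(text):
    """Map each lowercased user-agent token to its (directive, pattern) rules."""
    groups = {}
    agents = []                # tokens the current group applies to
    collecting_agents = False  # True while reading consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not collecting_agents:
                agents = []  # a new group starts here
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            collecting_agents = True
        elif field in ("allow", "disallow"):
            for agent in agents:
                groups[agent].append((field, value))
            collecting_agents = False
    return groups


def is_blocked(text, agent_token, path):
    """True if the group chosen for agent_token disallows path."""
    groups = parse_groups(text)
    # A group naming the crawler replaces the * group entirely.
    rules = groups.get(agent_token.lower(), groups.get("*", []))
    best = (0, True)  # (pattern length, is_allow); no match means allowed
    for directive, pattern in rules:
        if pattern and path.startswith(pattern):
            best = max(best, (len(pattern), directive == "allow"))
    return not best[1]


robots = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /
Allow: /blog/
"""
print(is_blocked(robots, "Googlebot", "/blog/post"))  # False
print(is_blocked(robots, "Googlebot", "/private/x"))  # True
print(is_blocked(robots, "Bingbot", "/blog/post"))    # False
```

The detail worth noticing is that a crawler with its own group does not inherit rules from the `*` group, so the same path can be open to one bot and closed to another.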
📍

Sitemap Declaration Detection

Robots.txt is one of the standard places to declare an XML sitemap; submitting it in Search Console is another. Every declared URL gets surfaced so it can be checked for accessibility.
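Detecting sitemap declarations amounts to collecting every line whose field name is `Sitemap`, compared case-insensitively. A minimal sketch, with `declared_sitemaps` as a hypothetical helper name and `example.com` as a placeholder domain:

```python
def declared_sitemaps(text):
    """Return the URLs from every Sitemap: line in a robots.txt body."""
    urls = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        # Split on the first colon only, so the colon in "https://" survives.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


robots = """\
User-agent: *
Disallow: /tmp/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml
"""
print(declared_sitemaps(robots))
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```

Sitemap lines are independent of user-agent groups, so the function ignores grouping altogether.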

How to Fix robots.txt Issues

A misconfigured robots.txt can silently kill rankings for weeks before you notice. Here's how to resolve the most common issues, in order of severity.

Blocked high-value page: find the most specific rule, not just the first one

Don't stop at the first Disallow you spot. Check every rule matching the path and identify which one is actually the longest match, since that's the one Google obeys. Narrow or remove that specific rule, then re-test.

An Allow rule isn't working even though it's written below the Disallow

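Position in the file doesn't matter; only specificity does. If Allow: /blog/ doesn't seem to override Disallow: /, look for another Disallow that matches the path with a longer pattern (for example Disallow: /blog/drafts/), and confirm that both rules sit in the group for the user-agent being tested. Compare the actual pattern lengths, including trailing slashes and wildcards, since a shorter Allow path loses to a longer Disallow. When an Allow and a Disallow match with equal length, Google applies the less restrictive one, the Allow.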
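To see which rule is winning once wildcards are involved, it helps to list every matching pattern with its length. The sketch below assumes the two wildcard forms Google documents: `*` for any run of characters and a trailing `$` for end of URL. The function names are hypothetical, and measuring specificity as the raw character count of the pattern is a simplification for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and let each * stand for any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start of the path, like a prefix test.
    return bool(pattern) and pattern_to_regex(pattern).match(path) is not None


def winner(rules, path):
    """Return the (directive, pattern) that decides path, or None if none match."""
    matching = [
        (len(pattern), directive == "allow", directive, pattern)
        for directive, pattern in rules
        if matches(pattern, path)
    ]
    if not matching:
        return None
    # Longest pattern first; on a tie, allow (True) outranks disallow (False).
    _, _, directive, pattern = max(matching)
    return directive, pattern


rules = [("allow", "/docs/"), ("disallow", "/*.pdf$")]
print(winner(rules, "/docs/guide.pdf"))      # ('disallow', '/*.pdf$')
print(winner(rules, "/docs/guide.pdf?v=2"))  # ('allow', '/docs/')
```

In the first call the 7-character Disallow pattern beats the 6-character Allow. In the second, the query string means the `$` anchor no longer matches, so the Allow is the only rule left standing.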

Syntax error in directive: rewrite the malformed rule

Typical culprits are missing colons, unsupported wildcard combinations, and stray spaces. Rewrite the rule cleanly and re-run the checker, since crawlers silently ignore an invalid directive rather than flagging it.
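A basic lint pass over the file can catch the most common of these mistakes. The sketch below is illustrative only: the `lint` function, its list of known fields, and the specific checks are assumptions chosen for the example, not the checks this tool runs.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint(text):
    """Return (line_number, message) warnings for a robots.txt body."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # drop comments
        if not line.strip():
            continue
        if ":" not in line:
            warnings.append((number, "missing colon"))
            continue
        field, value = line.split(":", 1)
        name, value = field.strip().lower(), value.strip()
        if name not in KNOWN_FIELDS:
            # Catches misspellings such as "Disalow".
            warnings.append((number, f"unknown field {field.strip()!r}"))
        if name in ("allow", "disallow"):
            if value and not value.startswith(("/", "*")):
                warnings.append((number, "path should start with '/' or '*'"))
            if "$" in value[:-1]:
                warnings.append((number, "'$' is only meaningful at the end"))
    return warnings


robots = """\
User-agent: *
Disallow /private/
Disalow: /tmp/
Allow: blog/
"""
for number, message in lint(robots):
    print(number, message)
```

Each warning carries the line number so the malformed rule can be found and rewritten directly.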

Disallow on a page with noindex: remove the Disallow, keep the meta tag

If Googlebot can't visit the page, it can never read the noindex tag either, so the page can stay indexed indefinitely. Drop the Disallow and let the crawler reach the page to read the noindex instruction itself.

Missing Sitemap declaration: add a Sitemap: directive

A Sitemap: https://yourdomain.com/sitemap.xml line, anywhere in the file, helps crawlers that support the directive discover the full content inventory. The URL must be absolute. Validate it with the Sitemap Validator first.

Two Traps, Same Root Cause: Assuming the File Reads Top to Bottom

Disallow and noindex get confused constantly, and the confusion has a specific shape: a page with Disallow can never have its noindex tag read, because Googlebot never visits it to find the tag in the first place. The page can sit in the index indefinitely with no control over how it appears.

The same misunderstanding shows up with rule precedence. Someone assumes that since their Disallow line comes first, it takes priority over an Allow added later. It doesn't. Google resolves the conflict by specificity: the longest matching path wins. Both traps come from treating robots.txt like a sequential script when it actually behaves more like a set of competing claims, where the most precise one wins regardless of where it's written.

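Use noindex for pages that should stay crawlable but excluded from results. Reserve Disallow for paths crawlers have no reason to fetch at all: internal APIs, session URLs, admin sections. Disallow is a request that compliant crawlers honor, not access control, so anything sensitive still needs authentication.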
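The Disallow-plus-noindex conflict can be detected mechanically when the page HTML is available. The sketch below uses Python's standard `html.parser` to look for a robots meta tag. The class and function names and the returned status strings are hypothetical, and it only inspects meta tags, not the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "").lower()
            if "noindex" in [part.strip() for part in content.split(",")]:
                self.noindex = True


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def diagnose(html, blocked_by_robots):
    """Classify a page given its HTML and whether robots.txt blocks it."""
    noindex = has_noindex(html)
    if noindex and blocked_by_robots:
        return "conflict"    # the crawler can never read the noindex tag
    if noindex:
        return "noindex-ok"  # crawlable, so the tag can take effect
    if blocked_by_robots:
        return "blocked"
    return "indexable"


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(diagnose(page, blocked_by_robots=True))   # conflict
print(diagnose(page, blocked_by_robots=False))  # noindex-ok
```

A "conflict" result is the case described above: the fix is to drop the Disallow so the crawler can reach the page and read the tag.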

Frequently Asked Questions

Does the order of rules in robots.txt determine which one wins?

No. Google picks whichever matching rule has the longest, most specific path, no matter where it sits in the file. A Disallow: / at the top can lose to an Allow: /blog/ several lines down. Treating the file as a sequential script that stops at the first match leads to rules that don't behave as expected.

What is a robots.txt file and why does it matter for SEO?

It is a plain-text file at the domain root that tells crawlers which pages or directories they may fetch. It's central to crawl budget management, and a misconfigured one can block Googlebot from important pages, cutting them off from crawling and usually costing them their rankings.

Does blocking a page in robots.txt remove it from Google's index?

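Not reliably. Disallow stops crawling, not indexing: a blocked URL that other pages link to can stay in the index, typically shown without a description because Google can't see its content. Ranking signals decay in the meantime, since the content can no longer be evaluated. To remove a page from results, leave it crawlable and use noindex instead.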

What's the difference between Disallow and noindex?

Disallow stops a crawler from visiting the page at all. Noindex tells the crawler not to include the page in the index, but the crawler still has to visit it to read that instruction. Using Disallow on a page you also want noindexed backfires, since the crawler never gets there.

Does robots.txt apply to subdomains?

No, each subdomain needs its own file. The robots.txt at example.com doesn't govern blog.example.com or shop.example.com, since each subdomain is treated as a separate host.
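Because the file is scoped to a host, the robots.txt that governs a page can be derived from the page's own URL. A minimal sketch using the standard library, with `robots_url` as a hypothetical helper name:

```python
from urllib.parse import urlsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # netloc keeps the subdomain (and port, if any), so each host gets its own file.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("https://example.com/pricing"))
# https://example.com/robots.txt
print(robots_url("https://blog.example.com/post/1?ref=home"))
# https://blog.example.com/robots.txt
```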

Can I block AI crawlers like GPTBot in robots.txt?

Yes, for any crawler that respects the standard. Add a User-agent line for GPTBot, ChatGPT-User, anthropic-ai, or PerplexityBot followed by Disallow: /. It only stops crawlers that voluntarily honor robots.txt, so pairing it with IP-level rules gives stronger enforcement against the rest.
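The block itself is short: one User-agent line per crawler, followed by a single Disallow: / that applies to all of them. The sketch below generates it from a list of tokens. The function name `block_crawlers` is hypothetical, and the tokens are the ones named in the answer above; vendors rename their crawlers from time to time, so check each vendor's documentation for its current token.

```python
def block_crawlers(tokens):
    """Build a robots.txt group that disallows everything for the given tokens."""
    # Consecutive User-agent lines share the rules that follow them.
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


print(block_crawlers(["GPTBot", "ChatGPT-User", "anthropic-ai", "PerplexityBot"]))
# User-agent: GPTBot
# User-agent: ChatGPT-User
# User-agent: anthropic-ai
# User-agent: PerplexityBot
# Disallow: /
```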

A Test Catches Today's Problem.
Monitoring Catches the Next One.

Robots.txt files change during plugin installs and migrations, often silently, and on a growing platform a silent change here is an SEO emergency that just hasn't been noticed yet.

Robots.txt Change Alerts: get notified the moment it's modified, before an accidental block reaches Google.

Crawl Budget Management: see how robots.txt impacts actual crawl frequency using Search Console data.

Visual Architecture Mapping: see exactly which sections are hidden from bots in an interactive dashboard.

Conflict Resolution: automatically flag specificity conflicts a manual read would likely miss.

✓ 30-day Premium Trial · ✓ No credit card required · ✓ Full monitoring access

🔔

Robots.txt Change Monitoring

24/7 monitoring with instant alerts the moment your robots.txt file is modified, by a developer, a plugin, or a misconfigured deployment.

🗺️

Visual Architecture Map

See an interactive map of your site structure showing exactly which sections are open to crawlers and which are blocked, colored by bot type.

⚡

Crawl Budget Optimizer

Correlate your robots.txt directives with real Google crawl data from Search Console to identify wasted crawl budget and fix it fast.
