What llms.txt actually is
llms.txt is to AI crawlers what robots.txt is to classical search crawlers: a root-level file that gives the visiting bot a structured hint about how to interpret the site. The format is a Markdown document with:
- An H1 with the site name.
- A blockquote with a short site description.
- Optional H2 sections grouping URLs by category (Docs, Products, Blog, etc.).
- Each URL listed as a Markdown link with a brief annotation.
- An optional "Optional" section listing URLs the crawler can skip.
Why Markdown?
Because LLMs already read Markdown natively. The format is human-readable and machine-parseable without a separate schema. It's the same logic that made Schema.org JSON-LD the dominant structured-data format: pick a notation the consumer already understands.
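To make "machine-parseable without a separate schema" concrete, here is a minimal sketch of how a crawler might turn the structure described above (H1 name, blockquote description, H2 sections, annotated links) into a dictionary. This is an illustration, not any crawler's actual code; a production parser would use a real Markdown library.

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Minimal llms.txt parser sketch: name, description, URL sections."""
    site = {"name": None, "description": None, "sections": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and site["name"] is None:
            site["name"] = line[2:].strip()          # H1: site name
        elif line.startswith("> ") and site["description"] is None:
            site["description"] = line[2:].strip()   # blockquote: description
        elif line.startswith("## "):
            current = line[3:].strip()               # H2: category section
            site["sections"][current] = []
        elif current:
            # Markdown link with a brief annotation: - [title](url): note
            m = re.match(r"-\s*\[([^\]]+)\]\(([^)]+)\)\s*:?\s*(.*)", line)
            if m:
                title, url, note = m.groups()
                site["sections"][current].append(
                    {"title": title, "url": url, "note": note}
                )
    return site
```

About 25 lines of standard-library code recovers the whole index, which is the point: the format costs consumers almost nothing to support.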
Adoption is partial — but growing
Let's be honest about the state of adoption. As of mid-2026:
- No major AI crawler (OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot, Google-Extended) requires llms.txt.
- Several read it opportunistically when present and use it as a prioritization signal.
- The standard is on its way to wider adoption but hasn't reached the universality of robots.txt.
- The cost of shipping one is approximately zero — a static text file at the root.
- The downside risk is also zero — crawlers that don't understand it simply ignore it.
A minimal llms.txt
A working llms.txt for a small business site looks like this. Keep it short: crawlers don't want a sitemap dump; they want a curated index.
- Start with the H1 (site name) and a single-sentence blockquote describing what the business does.
- Group URLs by category (About, Services, Blog, Case Studies, Contact).
- Annotate each URL with one sentence describing what's on the page.
- Skip thin pages — privacy policy, terms, generic landing pages.
- Update when you add cornerstone content. No more than monthly maintenance.
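Putting the steps above together, a file like the following would do the job. The business, URLs, and page titles are invented for illustration; substitute your own.

```markdown
# Acme Plumbing
> Family-run plumbing company serving the Portland metro area since 1998.

## About
- [Our story](https://acmeplumbing.example/about): Company history, licensing, and service area.

## Services
- [Drain cleaning](https://acmeplumbing.example/services/drains): What's included, typical pricing, turnaround time.
- [Water heater installation](https://acmeplumbing.example/services/water-heaters): Brands installed and warranty terms.

## Case Studies
- [Restaurant repipe](https://acmeplumbing.example/case-studies/restaurant): Full repipe of a commercial kitchen over one weekend.

## Optional
- [Blog archive](https://acmeplumbing.example/blog/archive): Older seasonal posts; skip unless relevant.
```

Note what's absent: no privacy policy, no terms page, no thin landing pages. Every listed URL earns its one-sentence annotation.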
Mistakes to avoid
A few things llms.txt is not:
- It is not a substitute for clean, citation-worthy HTML on the pages themselves. An llms.txt pointing at a JS-only SPA still won't get cited.
- It is not a replacement for Schema.org JSON-LD. The two work together: llms.txt directs the crawler; JSON-LD tells it what each page is.
- It is not enforced — no crawler treats it as gospel. It's a hint, not a directive.
- It is not a place to hide. Anything you list in llms.txt should also be linked from your normal nav and sitemap.
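To illustrate the division of labor in the second point: llms.txt lives at the site root and points the crawler at the page; JSON-LD lives on the page and describes it. A sketch of the on-page half, using the real Schema.org `Service` and `LocalBusiness` types but hypothetical business details:

```html
<!-- Embedded in the head or body of the service page itself.
     Business name and URL are illustrative placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "Drain cleaning",
  "provider": {
    "@type": "LocalBusiness",
    "name": "Acme Plumbing",
    "url": "https://acmeplumbing.example/"
  },
  "areaServed": "Portland, OR"
}
</script>
```

The llms.txt entry gets the crawler to this page; the JSON-LD tells it the page describes a drain-cleaning service offered by a local business.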
The case for doing it now
The marginal cost of shipping llms.txt is one to three hours of work. The marginal upside is meaningful if even one major crawler upgrades it from "opportunistic read" to "required input" in the next 12 months — which several signals suggest is plausible.
We also use llms.txt as a forcing function for editorial discipline. Curating the 20-40 most citation-worthy URLs on a site surfaces gaps: if you can't write a single useful sentence about a URL, that URL probably needs work.
