Agents are already visiting your site, and most leave empty-handed. Many cannot execute JavaScript, struggle with complex navigation, and waste tokens parsing HTML boilerplate. llms.txt fixes this by placing a clean, structured document at your site root that contains exactly what they need.
Why It Exists
When an agent visits a typical website, it downloads tens or hundreds of kilobytes of HTML, strips out nav bars, footers, and script tags, then guesses which text actually matters. That process is slow, token-expensive, and error-prone. In 2024, Jeremy Howard of Answer.AI proposed a purpose-built alternative[1]: a Markdown-formatted plain text file at /llms.txt that hands agents the information directly.
The simplest way to think about it is robots.txt in reverse. Where robots.txt restricts access, llms.txt actively provides content.
How It Works
An agent makes a GET request to yourdomain.com/llms.txt. The file comes back as plain text, with no JavaScript to render, HTML to parse, or cookie banners to dismiss. A companion file, /llms-full.txt, can optionally contain the full text of your most important pages for agents that want deeper context without crawling.
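The request is as plain as it sounds. A minimal sketch using only the Python standard library (the function names and the `full` flag are illustrative, not part of any spec):

```python
import urllib.request


def llms_txt_url(domain: str, full: bool = False) -> str:
    """Build the llms.txt URL for a domain; full=True targets /llms-full.txt."""
    path = "/llms-full.txt" if full else "/llms.txt"
    return f"https://{domain}{path}"


def fetch_llms_txt(domain: str, full: bool = False, timeout: int = 10) -> str:
    """Fetch the file as plain text. No JS rendering or HTML parsing needed."""
    req = urllib.request.Request(
        llms_txt_url(domain, full),
        headers={"User-Agent": "example-agent/0.1"},  # identify your agent
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

One GET, one UTF-8 decode, and the agent has the whole document.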
What to Include
The best llms.txt files are concise and factual. Focus on:
- A company overview covering what you do and who you serve
- Product and service summaries an agent can quote directly
- Links to key pages with one-line descriptions
- Contact or support information
Skip marketing fluff. Agents do not respond to adjectives.
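Put together, a file covering those points might look like this. The company and URLs are invented for illustration; the structure (an H1 title, a blockquote summary, H2 sections of one-line links) follows the format proposed in the llms.txt spec:

```markdown
# Acme Analytics

> Acme Analytics provides self-serve reporting dashboards for small
> e-commerce teams. Plans start at $29/month; no sales call required.

## Products

- [Dashboard](https://acme.example/dashboard): Real-time sales and traffic reporting
- [API](https://acme.example/docs/api): REST API for exporting raw event data

## Support

- [Contact](https://acme.example/contact): Email and live-chat support hours
```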
Current Adoption
Anthropic[2], Perplexity[3], Cloudflare[4], and hundreds of other sites now serve llms.txt files. The standard is not yet an IETF RFC, but adoption is growing fast enough that waiting means falling behind competitors who already have one.
How Site Scanner Helps
Site Scanner checks for the presence of /llms.txt as part of its Discoverability audit. If you do not have one, it flags the gap and tells you exactly what to add.
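Site Scanner's internal logic is not shown here, but a check of this kind can be sketched in a few lines. The function name and the specific checks are illustrative assumptions, loosely following the structure proposed by the spec (H1 title, blockquote summary, H2 link sections):

```python
def audit_llms_txt(text: str) -> list:
    """Return a list of structural gaps found in an llms.txt document.

    An empty list means the document has the three elements the
    proposed format calls for. These checks are a sketch, not
    Site Scanner's actual rule set.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    issues = []
    if not any(line.startswith("# ") for line in lines):
        issues.append("missing H1 title")
    if not any(line.startswith("> ") for line in lines):
        issues.append("missing blockquote summary")
    if not any(line.startswith("## ") for line in lines):
        issues.append("missing H2 link sections")
    return issues
```

A well-formed file audits clean; a page of unstructured prose would come back with all three gaps flagged.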