Paste your markdown configuration file below to perform real-time structural checks, link auditing, and token budget analysis.
Large Language Models (LLMs) and AI crawler agents (such as GPTBot, ClaudeBot, and Google-Extended) read text files to build summaries for query responses. Traditional web pages containing design assets, CSS declarations, and JavaScript handlers consume valuable tokens and introduce structural noise. A clean, compliant llms.txt file helps AI bots index your documentation efficiently, preserving crawler bandwidth and improving Generative Engine Optimization (GEO) citation results.
Because many AI parsers expect the structure laid out in the specification, deviations in Markdown formatting can cause parsing failures. A broken link or an unescaped HTML tag might cause a strict parser to abort ingestion entirely. This is why we recommend running a strict syntax audit with our Validator before deployment.
The core standard we validate against stems from the initial Answer.ai specification, which defined how AI models prefer to ingest structured data. The llms.txt standard is intentionally rigid.
When you paste your markdown into our validator, we simulate the tokenization chunking process that an LLM would execute. If your file is a monolithic block of 50,000 words without H2 (##) subheadings, a RAG (Retrieval-Augmented Generation) system cannot easily split the document into vector embeddings. The validator checks for proper semantic chunking markers.
1. The validated llms.txt file is fetched.
2. The parser splits the text at H2 (##) boundaries.
3. Chunks are converted to numerical vectors for RAG retrieval.
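The splitting step can be illustrated with a minimal sketch. It assumes a simple line-based splitter; the function name `split_at_h2` and the tuple layout are hypothetical and are not taken from any particular RAG system or from our validator. Embedding the chunks is out of scope here.

```python
import re
from typing import List, Optional, Tuple

# An H2 line is exactly two '#' characters followed by whitespace,
# so '### Deeper heading' is not treated as a chunk boundary.
H2_PATTERN = re.compile(r"^##\s+(.+)$")


def split_at_h2(markdown: str) -> List[Tuple[Optional[str], str]]:
    """Split markdown into (heading, body) chunks at H2 boundaries.

    Text before the first H2 is returned with a heading of None.
    """
    chunks: List[Tuple[Optional[str], str]] = []
    heading: Optional[str] = None
    body: List[str] = []
    for line in markdown.splitlines():
        match = H2_PATTERN.match(line)
        if match:
            # Close the chunk collected so far and start a new one.
            chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    chunks.append((heading, "\n".join(body).strip()))
    # Drop an empty preamble (a file that starts directly with an H2).
    return [c for c in chunks if c[0] is not None or c[1]]
```

A file with no H2 headings comes back as a single chunk, which is the monolithic case described above.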
Our validator tests your markdown parameters against the official specification. Here is a comprehensive breakdown of the rules evaluated by our scanner:
| Audit Rule | Compliance Standard | Severity |
|---|---|---|
| H1 Title Header | The file must begin with a single Level 1 heading (`# Project Name`) declaring the primary website/project name. | CRITICAL |
| Summary Blockquote | A brief, single-sentence project pitch starting with `>` must immediately follow the H1 title block. | WARNING |
| Absolute Hyperlinks | All resources must link to absolute URLs (e.g. https://yoursite.com/page). Relative paths (`./page`) fail parsing across domains. | CRITICAL |
| H2 Subsections | Use level 2 headings (`## Category`) to organize your hyperlinks into structured tables-of-contents for chunking. | WARNING |
| HTML Injection | The file must not contain raw HTML tags (`<div>`, `<br>`). It bloats tokens and confuses basic markdown parsers. | CRITICAL |
| llms-full.txt Link | Should ideally declare a link pointing to the full-text documentation database at /llms-full.txt. | ADVISORY |
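As a rough illustration of how the rules in this table could be checked with regular expressions, here is a minimal sketch. It is not the validator's actual implementation: the function name `audit`, the patterns, and the return format are all assumptions made for the example, and it deliberately ignores edge cases such as tags inside fenced code blocks.

```python
import re
from typing import List, Tuple

# Inline markdown links: [text](target)
LINK_PATTERN = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# Raw HTML tags such as <div>, </div>, <br />; autolinks like <https://...> do not match.
HTML_TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")
H1_PATTERN = re.compile(r"^#\s+\S")
H2_PATTERN = re.compile(r"^##\s+\S")


def audit(markdown: str) -> List[Tuple[str, str]]:
    """Return (severity, rule) pairs for every rule the text violates."""
    findings: List[Tuple[str, str]] = []
    lines = [ln for ln in markdown.splitlines() if ln.strip()]  # non-blank lines
    urls = [target for _, target in LINK_PATTERN.findall(markdown)]

    # H1 Title Header: first non-blank line is an H1, and it is the only H1.
    h1_count = sum(1 for ln in lines if H1_PATTERN.match(ln))
    if not lines or not H1_PATTERN.match(lines[0]) or h1_count != 1:
        findings.append(("CRITICAL", "H1 Title Header"))

    # Summary Blockquote: the next non-blank line starts with '>'.
    if len(lines) < 2 or not lines[1].startswith(">"):
        findings.append(("WARNING", "Summary Blockquote"))

    # Absolute Hyperlinks: every link target carries an http(s) scheme.
    if any(not u.startswith(("http://", "https://")) for u in urls):
        findings.append(("CRITICAL", "Absolute Hyperlinks"))

    # H2 Subsections: at least one '## ' heading exists.
    if not any(H2_PATTERN.match(ln) for ln in lines):
        findings.append(("WARNING", "H2 Subsections"))

    # HTML Injection: no raw tags anywhere in the file.
    if HTML_TAG_PATTERN.search(markdown):
        findings.append(("CRITICAL", "HTML Injection"))

    # llms-full.txt Link: some link points at /llms-full.txt.
    if not any(u.endswith("/llms-full.txt") for u in urls):
        findings.append(("ADVISORY", "llms-full.txt Link"))

    return findings
```

An empty list means the text passed all six checks in this sketch; each tuple otherwise names the severity and the rule from the table.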
Read more on how to validate llms.txt compliance programmatically.
If your markdown analysis yields a low AI Readiness Score, you are likely violating one of the core token optimization rules. Review these common debugging workflows:
AI crawlers scan your llms.txt independently of browser route hierarchies. Relative paths like [API Reference](/docs/api) cannot be resolved reliably without knowing the page they were served from. This can lead to failed requests, such as 404 errors, during crawler ingestion.
Fix: Update the link to use your full canonical domain: [API Reference](https://yoursite.com/docs/api).
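If a file contains many relative links, the fix can be scripted. The sketch below assumes Python and uses the standard library's `urllib.parse.urljoin`; the function name `absolutize_links` and the link pattern are hypothetical, and the base URL is whatever canonical domain you supply.

```python
import re
from urllib.parse import urljoin

# Inline markdown links: [text](target)
LINK_PATTERN = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def absolutize_links(markdown: str, base_url: str) -> str:
    """Rewrite relative link targets against base_url.

    Targets that are already absolute are left unchanged, because
    urljoin returns an absolute URL as-is.
    """
    def replace(match: "re.Match[str]") -> str:
        text, target = match.group(1), match.group(2)
        return f"[{text}]({urljoin(base_url, target)})"

    return LINK_PATTERN.sub(replace, markdown)


fixed = absolutize_links("[API Reference](/docs/api)", "https://yoursite.com/")
print(fixed)  # [API Reference](https://yoursite.com/docs/api)
```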
Without a blockquote prefix (> ) immediately following the H1, the parser cannot quickly extract the system prompt equivalent. It forces agents to parse the entire file to infer what your site does, which wastes compute cycles.
Fix: Add a descriptive line under your H1: > A developer platform for high-performance edge compute.
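To show why the H1 plus blockquote pairing is cheap to read, here is a minimal sketch of how a consumer might pull the title and summary from the first two non-blank lines. The function name `extract_title_and_summary` is hypothetical, and real agents may parse the file differently.

```python
from typing import Optional, Tuple


def extract_title_and_summary(markdown: str) -> Tuple[Optional[str], Optional[str]]:
    """Read the title from the leading H1 and the summary from the blockquote after it.

    Only the first two non-blank lines are inspected; the rest of the
    file is never scanned.
    """
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    title: Optional[str] = None
    summary: Optional[str] = None
    if lines and lines[0].startswith("# "):
        title = lines[0][2:].strip()
        if len(lines) > 1 and lines[1].startswith(">"):
            summary = lines[1][1:].strip()
    return title, summary
```

When the blockquote is missing, the summary comes back as `None`, and a consumer would have to fall back to reading more of the file.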
Adding inline markup tags like <br>, <strong>, or custom container cards increases the file's token count without providing structural value to LLMs. Some strict parsers may fail to read the file entirely.
Fix: Strip all tags and stick strictly to Markdown syntax. Use plain paragraph text or bullet points instead of HTML tables/breaks.
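A cleanup pass along these lines can be sketched with two regular expressions. This is an illustrative assumption rather than the validator's own logic: the name `strip_html` is hypothetical, line breaks become newlines, and every other tag is removed while its inner text is kept.

```python
import re

# <br>, <br/>, <br /> in any letter case
BREAK_PATTERN = re.compile(r"<br\s*/?>", re.IGNORECASE)
# Any other opening, closing, or self-closing tag. Markdown autolinks
# such as <https://example.com> do not match because of the ':'.
TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")


def strip_html(markdown: str) -> str:
    """Replace <br> with a newline and drop all remaining HTML tags."""
    text = BREAK_PATTERN.sub("\n", markdown)
    return TAG_PATTERN.sub("", text)


print(strip_html('<div class="card">Fast and <strong>small</strong></div>'))
# Fast and small
```

A regex pass like this does not understand fenced code blocks or nested markup, so review the output before publishing.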
An optimal llms.txt file should remain under 10 KB (approx. 1,500 - 2,000 words). The file acts as a clean map or sitemap-equivalent for AI agents, not a bulk database. Move detailed, full-text documentation guides to llms-full.txt to conserve crawler token budgets.
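The size guideline is easy to check before publishing. The sketch below assumes that 10 KB means 10 × 1024 bytes of UTF-8 and that words are whitespace-separated; the function name `size_report` and both of those choices are assumptions for illustration.

```python
from typing import Dict, Union


def size_report(markdown: str, max_bytes: int = 10 * 1024) -> Dict[str, Union[int, bool]]:
    """Report the UTF-8 byte size and whitespace-delimited word count of a file."""
    size = len(markdown.encode("utf-8"))  # bytes on disk, not characters
    words = len(markdown.split())
    return {"bytes": size, "words": words, "within_budget": size <= max_bytes}


print(size_report("# Demo\n"))
# {'bytes': 7, 'words': 2, 'within_budget': True}
```

Measuring bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes.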
Yes. Advisory warnings (like recommending an llms-full.txt link) will not break the parser. However, fixing them improves the overall Generative Engine Optimization (GEO) score of your file.
AI crawl agents read files asynchronously outside of your application frame. If they process a relative path (/about), they won't know the hostname of the server to request the sub-guide. Explicitly providing absolute URLs prevents resolution errors.
Lists are standard for grouping URLs. Plain markdown tables and headers are also permitted. However, avoid embedding images (`![alt](url)`). The primary goal is supplying high-density text files; models cannot "see" images via this index format.
The current text validator checks the syntax format of the links (ensuring they are absolute URLs). It does not initiate HTTP GET requests to ping every URL in your file to see if it is live. You must ensure your links are active before publishing.
The validator will flag this as a critical error. The standard specifies a single H1 (#) at the top of the file to declare the domain/project name. Any subsequent sections must use H2 (##) or H3 (###) headings for hierarchical chunking.
The validation process runs entirely in your browser using local JavaScript regex parsing. We do not transmit your markdown to our servers. For more details on our data handling, refer to our security and privacy guide.
Once you validate and host the file, place it at the root folder of your domain (https://yoursite.com/llms.txt). Popular AI search agents check this specific endpoint automatically before indexing deeper paths on your website.
GitHub Flavored Markdown (GFM) is extremely permissive and allows raw HTML, task lists, and complex table nesting because its goal is visual rendering. The llms.txt standard is strict because its goal is machine ingestion. What looks good visually might be terrible for tokenization.
You can paste snippets of your llms-full.txt here to check for HTML contamination, but the strict structural rules (H1 followed by Blockquote) primarily apply to the index file (llms.txt). The full text file is generally just concatenated content and is more forgiving.