How to Generate llms-full.txt Programmatically

Published: August 24, 2025 | Last Updated: September 27, 2025 | Read Time: 12 mins

Keeping your AI indexing files updated by hand becomes unsustainable as your website grows. Compiling them programmatically from your live pages means crawlers see the latest version of your content each time the file is regenerated.

Key Takeaways

1. Ingestion Pipelines for AI Indexing

An automated pipeline has three stages: discovery (reading page URLs from your sitemap), collection (fetching each page's HTML), and purification (stripping styles, scripts, and other non-content markup).

Automating this flow keeps the file from pointing at stale or removed pages. You can schedule the process as a cron job on a cloud hosting provider such as DigitalOcean. Alternatively, a Markdown conversion service such as Firecrawl can fetch clean Markdown for each page directly.

Automated Ingestion Pipeline

1. Parse Sitemap
2. Fetch Pages
3. HTML to MD
4. Write full.txt
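
The discovery stage can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: `extractSitemapUrls` and `discoverUrls` are hypothetical helper names, the sketch assumes Node 18 or later (for the built-in `fetch`), and it assumes a flat sitemap rather than a sitemap index. A regular expression is used only to keep the example dependency-free; a real XML parser would be more robust.

```javascript
// Hypothetical helper: pull every <loc> value out of a sitemap XML string.
function extractSitemapUrls(xml) {
  const urls = [];
  for (const match of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)) {
    // Sitemaps XML-escape ampersands in query strings; undo that here.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  // De-duplicate while preserving sitemap order.
  return [...new Set(urls)];
}

// Discovery stage: download the sitemap and return its page URLs.
// Assumes Node 18+ (global fetch) and a flat sitemap, not a sitemap index.
async function discoverUrls(sitemapUrl) {
  const res = await fetch(sitemapUrl);
  if (!res.ok) {
    throw new Error(`Sitemap request failed with status ${res.status}`);
  }
  return extractSitemapUrls(await res.text());
}
```

The array returned by `discoverUrls` could then be handed to the collection stage, one URL at a time.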

2. Coding a Node.js Compilation Script

Let's look at a Node.js script that builds your llms-full.txt using three common npm packages: axios, cheerio, and turndown. The script fetches each target URL, strips boilerplate elements, converts the remaining HTML to Markdown, and joins the pages with horizontal rules.

import fs from 'fs';
import axios from 'axios';
import * as cheerio from 'cheerio';
import TurndownService from 'turndown';

const turndown = new TurndownService();

async function compileFullText(urls) {
  let output = `# Project Documentation Corpus\n\n`;

  for (const url of urls) {
    try {
      const res = await axios.get(url);
      const $ = cheerio.load(res.data);
      
      // Clean unnecessary UI elements
      $('nav, footer, script, style, noscript').remove();
      
      const cleanHTML = $('main').html() || $('body').html();
      const markdown = turndown.turndown(cleanHTML);
      
      // Use only the first <title> (inline SVGs can carry their own) and trim whitespace
      output += `## Section: ${$('title').first().text().trim()}\n`;
      output += `Source: ${url}\n\n`;
      output += `${markdown}\n\n`;
      output += `---\n\n`;
    } catch (err) {
      console.error(`Failed to crawl ${url}:`, err.message);
    }
  }

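  // Create the output directory if it does not exist yet, then write the file
  fs.mkdirSync('./public', { recursive: true });
  fs.writeFileSync('./public/llms-full.txt', output);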
}

3. Formatting Rules for llms-full.txt

When compiling the file, keep the output consistently formatted: make every link absolute, give each page an H2 header (##) containing its title, and separate entries with horizontal rules (---).
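
As an illustration of these rules, the sketch below formats a single entry. `absolutizeLinks` and `formatEntry` are hypothetical helpers, not part of any library. The link rewriting relies on Node's built-in `URL` class and a simple regular expression, so it is an approximation and may miss unusual Markdown link syntax. The header mirrors the `## Section:` prefix used by the compilation script.

```javascript
// Hypothetical helper: resolve relative Markdown link targets against the page URL.
function absolutizeLinks(markdown, pageUrl) {
  return markdown.replace(/\]\(([^)\s]+)/g, (match, href) => {
    try {
      return `](${new URL(href, pageUrl).href}`;
    } catch {
      return match; // leave targets that cannot be parsed untouched
    }
  });
}

// Hypothetical helper: build one entry with an H2 header, source line,
// body with absolute links, and a trailing horizontal rule.
function formatEntry({ title, url, markdown }) {
  return [
    `## Section: ${title.trim()}`,
    `Source: ${url}`,
    '',
    absolutizeLinks(markdown.trim(), url),
    '',
    '---',
    '',
  ].join('\n');
}
```

Keeping the formatting in one function makes it easier to change the entry layout later without touching the crawling logic.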

For the differences between a directory-style listing and a full-text file, see llms-full.txt explained. To configure similar paths inside website builders, see our reviews of llms.txt generator tools.

4. Testing Compilation Quality

After your compiler script generates the output, audit it to ensure it contains no broken markdown blocks or unescaped HTML elements. You can run automated tests using our free llms.txt validator to check for compilation issues.
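
A local audit along these lines might look like the sketch below. `auditFullText` is a hypothetical function, and the tag list and link check are illustrative assumptions rather than an official rule set. It flags unclosed code fences, leftover layout or script tags outside code blocks, and links that lack a URL scheme.

```javascript
// Build the fence marker programmatically so this example stays easy to embed.
const FENCE = '`'.repeat(3);

// Hypothetical audit: returns a list of human-readable issues found in the text.
function auditFullText(text) {
  const issues = [];
  let inFence = false;

  text.split('\n').forEach((line, index) => {
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence; // entering or leaving a code block
      return;
    }
    if (inFence) return; // code samples may legitimately contain HTML

    // Leftover layout or script tags suggest HTML slipped past the converter.
    if (/<\/?(script|style|nav|footer|div|span)\b/i.test(line)) {
      issues.push(`line ${index + 1}: leftover HTML tag`);
    }
    // A link target without a scheme (https:, mailto:, ...) is relative.
    if (/\]\((?![a-z][a-z0-9+.-]*:)/i.test(line)) {
      issues.push(`line ${index + 1}: relative link`);
    }
  });

  if (inFence) issues.push('unclosed code fence');
  return issues;
}
```

An empty result means none of these particular checks fired; it does not guarantee the file is free of other problems.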
