feat(public-docsite-v9): add llms docs #34838

Open: wants to merge 29 commits into master

dmytrokirpa
Contributor

@dmytrokirpa dmytrokirpa commented Jul 15, 2025

Previous Behavior

New Behavior

This PR introduces a new CLI tool that extracts documentation from Storybook builds and converts it to LLM-friendly formats following the llmstxt.org specification. The tool processes Storybook production builds to generate comprehensive documentation in plain text format optimized for Large Language Models.

Key Features

  • Component Documentation: Extracts props, descriptions, and type information from React components
  • Story Examples: Captures all story variations with complete source code
  • MDX Support: Processes MDX documentation pages and converts HTML to clean Markdown
  • Subcomponents: Handles complex components with subcomponents and their props
  • LLMs.txt Format: Generates summary files following the llmstxt.org specification
  • Static File Serving: Uses Playwright routing instead of Express for better reliability
  • Flexible Configuration: Supports CLI arguments and config files

Technical Implementation

  • Static File Routing: Uses Playwright's page.route() to serve Storybook files without needing a web server
  • Story Extraction: Accesses Storybook's internal story store (__STORYBOOK_PREVIEW__) for metadata
  • Content Processing: Converts HTML documentation to clean Markdown using Turndown with GitHub Flavored Markdown support
  • Storybook Compatibility: Supports both Storybook 7 (storyStore) and Storybook 8+ (storyStoreValue)
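The Storybook 7 / 8+ compatibility point above could look roughly like the following. This is a minimal sketch, not the PR's actual code: `PreviewLike` and `resolveStoryStore` are hypothetical names, and the only thing taken from the description is that the store lives under `storyStore` in Storybook 7 and under `storyStoreValue` in Storybook 8+.

```typescript
// Hypothetical minimal shape of the preview object; only the two
// properties named in the PR description are modelled here.
interface PreviewLike<Store> {
  storyStore?: Store; // Storybook 7
  storyStoreValue?: Store; // Storybook 8+
}

// Prefer the Storybook 8+ property and fall back to the Storybook 7 one.
function resolveStoryStore<Store>(preview: PreviewLike<Store> | undefined): Store {
  const store = preview?.storyStoreValue ?? preview?.storyStore;
  if (store === undefined) {
    throw new Error('No story store found on the preview object');
  }
  return store;
}
```

In the real tool a helper like this would presumably run inside `page.evaluate()` against `window.__STORYBOOK_PREVIEW__`; keeping it a pure function makes the version check easy to unit test.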

Output Structure

storybook-static/
├── llms.txt                      # Main summary file (llmstxt.org format)
└── llms/
    ├── components-button.txt     # Individual component docs
    ├── components-accordion.txt
    └── concepts-introduction.txt # MDX page docs
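To make the layout above concrete, here is a minimal sketch of how a summary file in the llmstxt.org shape (an H1 title, a blockquote description, then H2 sections of links) might be assembled. Everything named here is an assumption for illustration: `DocEntry`, `SummaryOptions`, `joinUrl`, `buildSummary`, the `Docs` and `Related` section headings, and the idea that each ref links to an `llms.txt` at its own URL. The PR's real implementation may differ.

```typescript
// Hypothetical shapes for this sketch; the real tool's types may differ.
interface DocEntry {
  id: string; // e.g. "components-button"
  title: string; // e.g. "Button"
}

interface Ref {
  title: string;
  url: string;
}

interface SummaryOptions {
  baseUrl: string;
  summaryTitle: string;
  summaryDescription: string;
  docs: DocEntry[];
  refs: Ref[];
}

// Join a base URL and a relative path with exactly one slash between them.
function joinUrl(base: string, path: string): string {
  return `${base.replace(/\/+$/, '')}/${path.replace(/^\/+/, '')}`;
}

// Build the summary text: title, description, local docs, then composed refs.
function buildSummary(options: SummaryOptions): string {
  const lines: string[] = [
    `# ${options.summaryTitle}`,
    '',
    `> ${options.summaryDescription}`,
    '',
    '## Docs', // assumed section name
    '',
  ];
  for (const doc of options.docs) {
    lines.push(`- [${doc.title}](${joinUrl(options.baseUrl, `llms/${doc.id}.txt`)})`);
  }
  if (options.refs.length > 0) {
    lines.push('', '## Related', ''); // assumed section name
    for (const ref of options.refs) {
      // Assumption: each composed Storybook publishes its own llms.txt.
      lines.push(`- [${ref.title}](${joinUrl(ref.url, 'llms.txt')})`);
    }
  }
  return lines.join('\n') + '\n';
}
```

Note that `baseUrl` only affects the emitted link text in this sketch; nothing is fetched from it.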

Usage Examples

Basic Usage:

npx storybook-llms-extractor --distPath "storybook-static" --baseUrl "https://storybook.example.com"

# or with refs

npx storybook-llms-extractor \
  --distPath "storybook-static" \
  --baseUrl "https://main.storybook.dev" \
  --refs '{"title":"Charts","url":"https://charts.storybook.dev"}'

With Configuration File:

// storybook-llms.config.js
// @ts-check

/** @type {import('@fluentui/storybook-llms-extractor').Args} */
module.exports = {
  distPath: 'storybook-static',
  baseUrl: 'https://react.fluentui.dev',
  summaryTitle: 'Fluent UI React v9',
  summaryDescription: 'Fluent UI React components documentation',
  refs: [
    { title: 'Charts v9', url: 'https://charts.fluentui.dev' }
  ]
};

Files Added

  • tools/storybook-llms-extractor/src/cli.ts - CLI entry point and argument processing
  • tools/storybook-llms-extractor/src/utils.ts - Core extraction and conversion logic
  • tools/storybook-llms-extractor/src/types.ts - TypeScript type definitions
  • tools/storybook-llms-extractor/src/index.ts - Package exports
  • tools/storybook-llms-extractor/src/utils.spec.ts - Unit tests
  • tools/storybook-llms-extractor/src/__fixtures__/ - Test fixtures
  • tools/storybook-llms-extractor/README.md - Comprehensive documentation

github-actions bot commented Jul 15, 2025

📊 Bundle size report

✅ No changes found

@github-actions github-actions bot added the CI label Jul 15, 2025

Pull request demo site: URL

@tudorpopams
Contributor

This is awesome! Can we apply the same pattern to composed stories as well? It would be really cool to get this done for charts and contrib as well.

baseUrl: argv.baseUrl,
summaryTitle: argv.summaryTitle,
summaryDescription: argv.summaryDescription,
refs: parseRefs(argv.refs),
Contributor

this whole logic could be simplified as we can guarantee what yargs parse will provide

Contributor Author

it works well with primitive types, but I had some issues with objects, that's why this is needed. argv.refs has string|number[] type
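For illustration, a normalising helper along these lines could cover the cases mentioned above. This is a hypothetical reconstruction, not the `parseRefs` from the PR: it assumes the raw value may be a JSON string, an array of JSON strings, or already-parsed objects, and it silently drops anything that is not a `{ title, url }` pair.

```typescript
interface Ref {
  title: string;
  url: string;
}

// Type guard: accept only objects with string `title` and `url` fields.
function isRef(value: unknown): value is Ref {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return typeof record.title === 'string' && typeof record.url === 'string';
}

// Normalise whatever the CLI parser produced into a clean Ref[].
// Note: JSON.parse throws a SyntaxError on malformed input.
function parseRefs(raw: unknown): Ref[] {
  if (raw === undefined || raw === null) {
    return [];
  }
  const items: unknown[] = Array.isArray(raw) ? raw : [raw];
  const refs: Ref[] = [];
  for (const item of items) {
    const value: unknown = typeof item === 'string' ? JSON.parse(item) : item;
    const candidates: unknown[] = Array.isArray(value) ? value : [value];
    for (const candidate of candidates) {
      if (isRef(candidate)) {
        refs.push(candidate);
      }
    }
  }
  return refs;
}
```

With this shape, both the single-object `--refs` form from the usage example and the array form from the config file end up as the same `Ref[]`.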


const stories: StorybookStoreItem[] = await page.evaluate(async () => {
// @ts-expect-error - Storybook Client API is not typed
await window.__STORYBOOK_CLIENT_API__.storyStore.cacheAllCSFFiles();
Contributor

this won't work starting with SB v8

we will need to add similar feature-flagged behaviour like we did for storywright, which is also official public API

https://github.com/microsoft/storywright/pull/74/files#diff-e056b3ef14d67b65de77e3e846aa3bf75c699e5f4b60c6754c83766a306152afR38

this returns different metadata than the currently used private API, so we need to double-check whether this is feasible

Contributor Author

__STORYBOOK_PREVIEW__.extract() doesn't provide all data we need, but we could use __STORYBOOK_PREVIEW__.storyStore which is available for both SB 7 and 8+

@dmytrokirpa
Contributor Author

This is awesome! Can we apply the same pattern to composed stories as well? It would be really cool to get this done for charts and contrib as well.

https://fluentuipr.z22.web.core.windows.net/pull/34838/public-docsite-v9/storybook/llms.txt - v9 llms.txt
https://fluentuipr.z22.web.core.windows.net/pull/34838/chart-docsite/storybook/llms.txt - charts llms.txt

Contrib isn't ready yet since it's not in the monorepo, and I'm still figuring out the best way to distribute the script if we decide to go with it.

@Hotell
Contributor

Hotell commented Jul 17, 2025

distribution:

i don't see how this could possibly work as an SB addon or bundler plugin because of how it works under the hood.

it's very similar to what storywright does for obtaining screenshots, which is actually the desired behaviour as it makes the tool atomic and re-usable.

While the implementation is tightly coupled to our full source addon, the addon shouldn't be a prerequisite. The tool should degrade gracefully: if full source exists we process that code, otherwise the standard SB code.

  • naming of the CLI package, something like: StorybookLLMextractor feels appropriate
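The graceful fallback described above could be sketched as a small pure function. The parameter names are assumptions: `fullSource` is taken to be what the full source addon attaches to a story's parameters, and `docs.source.originalSource` is taken to be where standard Storybook keeps the story source. `StoryParameters` and `pickStorySource` are hypothetical names.

```typescript
// Hypothetical subset of a story's parameters; field names are assumptions.
interface StoryParameters {
  fullSource?: string; // assumed to be set by the full source addon
  docs?: {
    source?: {
      originalSource?: string; // assumed standard Storybook source field
    };
  };
}

// Prefer full source when present and non-empty, otherwise fall back to the
// standard Storybook source; return undefined when neither is available.
function pickStorySource(parameters: StoryParameters): string | undefined {
  const full = parameters.fullSource?.trim();
  if (full) {
    return full;
  }
  const standard = parameters.docs?.source?.originalSource?.trim();
  return standard ? standard : undefined;
}
```

Returning `undefined` rather than throwing keeps the addon optional: a Storybook without it still produces docs, just with whatever source the standard parameters carry.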

storybook composition:

this approach won't scale outside repo-linked SBs, so it should be the responsibility of each linked (composed) SB to generate the markdown assets as part of its production build

@dmytrokirpa
Contributor Author

dmytrokirpa commented Jul 17, 2025

Thanks for the feedback @Hotell!

distribution:

i don't see how this could possibly work as an SB addon or bundler plugin because of how it works under the hood.

it's very similar to what storywright does for obtaining screenshots, which is actually the desired behaviour as it makes the tool atomic and re-usable.

While the implementation is tightly coupled to our full source addon, the addon shouldn't be a prerequisite. The tool should degrade gracefully: if full source exists we process that code, otherwise the standard SB code.

That makes sense.

  • naming of the CLI package, something like: StorybookLLMextractor feels appropriate

Agree, do you think it should live in the core monorepo or as a standalone repo?

storybook composition:

this approach won't scale outside repo-linked SBs, so it should be the responsibility of each linked (composed) SB to generate the markdown assets as part of its production build

That's exactly how it works atm: we use the refs CLI arg only to include links to external (composed) Storybooks in llms.txt; their assets are generated as part of their own production builds.

@dmytrokirpa dmytrokirpa requested a review from Hotell July 28, 2025 14:53
@dmytrokirpa dmytrokirpa marked this pull request as ready for review July 28, 2025 15:58
@dmytrokirpa dmytrokirpa requested review from a team as code owners July 28, 2025 15:58
@Hotell
Contributor

Hotell commented Jul 29, 2025

Agree, do you think it should live in the core monorepo or as a standalone repo?

let's stick with the core repo for now for logistical and distribution simplicity; in the future it might make sense to create a new fluent-storybook-addons repo or something similar

Contributor

@Hotell Hotell left a comment

looking great!

  • added some comments/actionables (mainly the SB API simplification / encapsulation)

Food for thought:

  • with this approach the output is a black box, so what actually gets deployed might come as a surprise. maybe we should consider storing the generated .txt files in git and forcing a re-generate if content changes (similar to what we have for JSXIntrinsicElement in react-utilities)

demandOption: true,
describe: 'Relative path to the Storybook distribution folder',
})
.option('baseUrl', {
Contributor

this is a bit confusing: I thought this was used to actually fetch metadata from that origin, but its only purpose is to emit text in the .txt file

can we improve the docs or rename the property to something more meaningful?

Comment on lines 82 to 86
.option('baseUrl', {
type: 'string',
default: '/',
describe: 'Base URL for the Storybook docs',
})
Contributor

Suggested change
-.option('baseUrl', {
-  type: 'string',
-  default: '/',
-  describe: 'Base URL for the Storybook docs',
-})
+.option('summaryBaseUrl', {
+  type: 'string',
+  default: '/',
+  describe: 'Storybook deployed URL for the summary docs',
+})

## Requirements

- Node.js 16+
- Storybook 7+ (supports both Storybook 7 and 8)
Contributor

the new approach should also cover v9, correct?

Contributor Author

I’ve tested this with v7 and v8 for Fluent core and extensions, and will check v9 next.

@dmytrokirpa
Contributor Author

  • with this approach the output is a black box, so what actually gets deployed might come as a surprise. maybe we should consider storing the generated .txt files in git and forcing a re-generate if content changes (similar to what we have for JSXIntrinsicElement in react-utilities)

That's a valid point about controlling the output, but it would mean core devs need to build a full docsite locally with every component story update PR, right?

@dmytrokirpa dmytrokirpa requested a review from Hotell July 29, 2025 15:38
4 participants