Building an AI Chat for the Nexus UI Docs

8 min read

Jun 2026

For the past couple of months, I've been building Nexus UI, an open-source component library for AI interfaces. The docs site was already in good shape. Every component has install steps and examples, and the search bar works well when you know what you're looking for.

What search can't do is answer questions. It can't tell you which components fit together for a specific pattern, or walk you through a follow-up like "okay, how does this component relate to that one?"

So I built Ask AI: a chat panel inside the docs where you can talk to the documentation directly. Press ⌘/ on any page, ask in plain language, and get a streamed answer with links back to the real docs.

Media: Screen recording of the docs page → hit ⌘/ → panel opens → ask "How do I add Prompt Input?" → answer streams in. This is the hook for the whole post.

Starting with search and a model

My initial approach was simple: run FlexSearch over the docs, pipe the user's question to a model through OpenRouter, and stream back a reply. I knew the AI SDK from earlier work, and FlexSearch was already powering the site's search. So I figured the chat UI would be the hard part and the backend would be much quicker.

It wasn't. The model's responses sounded great but were often wrong: it would invent npm packages, guess import paths, and describe props that don't exist. The model knows nothing about Nexus UI from its training, so it could only guess, filling the gaps around the little the search turned up. That was a big problem, and I realized I needed a more robust system.

That's when I started reading about retrieval-augmented generation (RAG). Instead of relying on what the model learned in training, you retrieve relevant passages from your own content and hand them to the model alongside the question, so it answers from your docs rather than from memory. I spent a lot of time in Cursor going back and forth, trying things and breaking things, until the responses actually matched what I'd written in the documentation.

The retrieval pipeline

The RAG pipeline I ended up with has three steps: chunk the docs into a searchable corpus, find the right chunks for each question, and inject them into the prompt before the model answers.

Building the corpus

On server startup, I pull every docs page, split it by section heading, and chunk each section into ~1,000 character pieces. I also index component source files and a small hand-written facts block with the correct install command and import path.
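A minimal sketch of the heading-then-size split, assuming the pages are Markdown/MDX with `##`/`###` section headings. The `Chunk` shape, the `chunkPage` and `splitToSize` names, and the paragraph-first splitting strategy are my own illustration, not the actual Nexus UI code:

```typescript
// Hypothetical shape of one searchable piece of the corpus.
interface Chunk {
  id: string;
  page: string; // e.g. a /docs URL
  section: string; // the heading the text sits under
  text: string;
}

const MAX_CHUNK_CHARS = 1000;

// Pack paragraphs into pieces of at most `max` characters, hard-slicing any
// paragraph that is too long on its own.
function splitToSize(text: string, max: number): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    const parts: string[] = [];
    for (let i = 0; i < para.length; i += max) parts.push(para.slice(i, i + max));
    for (const part of parts) {
      if (current && current.length + 2 + part.length > max) {
        pieces.push(current);
        current = "";
      }
      current = current ? `${current}\n\n${part}` : part;
    }
  }
  if (current) pieces.push(current);
  return pieces;
}

// Split one page on ## / ### headings, then size-limit each section.
function chunkPage(page: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let section = "Overview"; // hypothetical label for text before the first heading
  let lines: string[] = [];

  const flush = () => {
    for (const text of splitToSize(lines.join("\n").trim(), MAX_CHUNK_CHARS)) {
      chunks.push({ id: `${page}#${chunks.length}`, page, section, text });
    }
    lines = [];
  };

  for (const line of markdown.split("\n")) {
    const heading = /^#{2,3}\s+(.*)$/.exec(line);
    if (heading) {
      flush(); // close out the previous section before switching headings
      section = heading[1].trim();
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

Splitting on headings first keeps each chunk inside one topic, so a hit on "Installation" doesn't drag in half of "Usage" with it.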

Source code chunks help the model but shouldn't become links in answers. So each chunk has a citeable flag. Docs and facts are citeable. Source files are internal only. I also told the model in the system prompt to link only to /docs URLs, never to raw source paths.
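One way to model the flag, sketched under the assumption that every chunk carries a `kind` and a URL. `CorpusChunk`, `makeChunk`, `citationUrls`, and the example paths are hypothetical. The extra `/docs` prefix check is a belt-and-braces guard next to the system prompt rule:

```typescript
type ChunkKind = "docs" | "facts" | "source";

interface CorpusChunk {
  id: string;
  kind: ChunkKind;
  url: string; // a /docs URL for docs and facts chunks, a repo path for source chunks
  text: string;
  citeable: boolean; // only citeable chunks may surface as links in an answer
}

// Docs pages and the hand-written facts block are citeable; component source is context only.
function makeChunk(id: string, kind: ChunkKind, url: string, text: string): CorpusChunk {
  return { id, kind, url, text, citeable: kind !== "source" };
}

// Every retrieved chunk can go into the prompt, but only citeable /docs URLs
// are returned as sources for the UI, deduplicated and in retrieval order.
function citationUrls(retrieved: CorpusChunk[]): string[] {
  const urls: string[] = [];
  for (const chunk of retrieved) {
    if (chunk.citeable && chunk.url.startsWith("/docs") && !urls.includes(chunk.url)) {
      urls.push(chunk.url);
    }
  }
  return urls;
}
```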

Finding the right chunks

FlexSearch indexes the corpus in memory. For a component library where people literally ask about "Prompt Input" and there's a page called Prompt Input, keyword search gets you further than I expected. I'm planning to add embeddings for vaguer questions, but keyword search was a fine starting point.
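To show why exact component names work so well, here is a tiny dependency-free keyword index standing in for FlexSearch (it is not FlexSearch's API or scoring). The `KeywordIndex` class and the extra weight on section titles are assumptions for illustration:

```typescript
interface IndexedChunk {
  id: string;
  section: string;
  text: string;
}

// Lowercase alphanumeric tokens; "Prompt Input" becomes ["prompt", "input"].
const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

class KeywordIndex {
  // token -> (chunk id -> accumulated weight)
  private postings = new Map<string, Map<string, number>>();

  add(chunk: IndexedChunk): void {
    // Section titles count 3x, so a query naming a component favors that component's page.
    const weighted: [string, number][] = [
      ...tokenize(chunk.section).map((t): [string, number] => [t, 3]),
      ...tokenize(chunk.text).map((t): [string, number] => [t, 1]),
    ];
    for (const [token, w] of weighted) {
      const byId = this.postings.get(token) ?? new Map<string, number>();
      byId.set(chunk.id, (byId.get(chunk.id) ?? 0) + w);
      this.postings.set(token, byId);
    }
  }

  search(query: string, limit = 8): { id: string; score: number }[] {
    const scores = new Map<string, number>();
    for (const token of new Set(tokenize(query))) {
      const byId = this.postings.get(token);
      if (!byId) continue;
      for (const [id, w] of byId) scores.set(id, (scores.get(id) ?? 0) + w);
    }
    return [...scores.entries()]
      .map(([id, score]) => ({ id, score }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
```

A question like "How do I add Prompt Input?" shares two heavily weighted tokens with the Prompt Input section and ranks it first, with no semantic understanding involved.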

One bug I ran into early: I was only searching on the latest message. Follow-ups like "show me an example" retrieved nothing, because the query was just that phrase, with no mention of the component we'd been discussing. I fixed this by concatenating all the user messages in the thread into a single search query.
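A minimal sketch of that query, assuming messages have already been flattened to `{ role, content }` strings (recent AI SDK versions keep message text in a `parts` array, so a small conversion step may come first). `ChatMessage` and `buildSearchQuery` are illustrative names:

```typescript
// Simplified message shape; the AI SDK's own message types carry more fields.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Join every user turn so follow-ups like "show me an example" still carry
// the component name mentioned earlier in the thread.
function buildSearchQuery(messages: ChatMessage[]): string {
  return messages
    .filter((m) => m.role === "user")
    .map((m) => m.content.trim())
    .filter(Boolean)
    .join(" ");
}
```

Assistant turns are left out on purpose: they're long, and they'd pull in whatever the model mentioned, right or wrong.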

After search, I rank hits and cap how many chunks any single page section can contribute.
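The post doesn't say how hits are ranked, so this sketch simply sorts by search score and then applies a per-section cap and a total cap. `selectChunks` and both limits are hypothetical values chosen for illustration:

```typescript
interface Hit {
  id: string;
  page: string;
  section: string;
  score: number;
}

// Hypothetical limits; tune against real questions.
const MAX_CHUNKS_TOTAL = 8;
const MAX_PER_SECTION = 2;

// Keep the best-scoring hits, but never let one page section crowd out the rest.
function selectChunks(hits: Hit[]): Hit[] {
  const perSection = new Map<string, number>();
  const picked: Hit[] = [];
  for (const hit of [...hits].sort((a, b) => b.score - a.score)) {
    const key = `${hit.page}#${hit.section}`;
    const used = perSection.get(key) ?? 0;
    if (used >= MAX_PER_SECTION) continue; // this section already contributed enough
    perSection.set(key, used + 1);
    picked.push(hit);
    if (picked.length === MAX_CHUNKS_TOTAL) break;
  }
  return picked;
}
```

Without the per-section cap, one long section that repeats a keyword can fill every slot and hide a more relevant neighboring page.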

Getting it in front of the model

What mattered most here was that retrieval happens on every message, before the model runs. Not as an optional tool the model calls only when it decides to. Every time.
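Here is a sketch of the per-request prompt assembly, assuming the retrieved chunks are formatted straight into the system prompt. The rule wording, the `<context>` wrapper, and `buildSystemPrompt` are hypothetical. The resulting string would then be passed as the `system` option of the AI SDK's `streamText` call, alongside the conversation messages and the fallback search tool:

```typescript
interface ContextChunk {
  url: string;
  section: string;
  text: string;
}

// Fixed rules first, then the chunks retrieved for this message, each labeled
// with its /docs URL so the model has something real to cite.
function buildSystemPrompt(chunks: ContextChunk[]): string {
  const rules = [
    "You answer questions about the Nexus UI docs.",
    "Base your answer on the documentation context below. If it does not cover the question, search again or say you don't know.",
    "Link to /docs URLs only, never to source file paths.",
  ].join("\n");

  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.url} (${c.section})\n${c.text}`)
    .join("\n\n");

  return `${rules}\n\n<context>\n${context}\n</context>`;
}
```

Because this runs before the model is called, the model never gets the chance to skip retrieval and answer from memory.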

I'd initially given the model a search tool and hoped it would use it. It often didn't. It would answer from memory, sound confident, and be wrong. Injecting context upfront fixed that. The search tool still exists, but as a backup for when the first pass misses something or the conversation pivots. Most of the time pre-retrieval was enough and the answer just streamed in.

Media: Screenshot of an answer with citation badges / source links. Optional: short clip showing a search step appear mid-answer (ChainOfThought), then the reply continuing.

The UI, built from the library

I'm a frontend-leaning engineer, so this was the fun part. The chat is built from Nexus UI components: Thread and Message for the conversation, PromptInput for the input, Suggestions for starter prompts, Citation for source links, ChainOfThought when a search runs, and Toaster for feedback. I also used shadcn/ui underneath for buttons, tooltips, and the chat shell (a Sheet on desktop, a full-screen dialog on mobile).

The docs chat is a working example of what Nexus UI components look like when used in a real product. Building this revealed a few rough edges I wouldn't have caught from writing the doc examples alone.

Media: Side-by-side screenshot: docs page with panel open on desktop. Empty state with suggestion chips. Close-up of the prompt input showing the 12/30 usage counter.

Rate limiting on a public site

The docs chat has no authentication, so there's no user account to attach limits to. Any visitor could hammer it, and every request costs me money. Once the chat itself worked, guardrails were the next job.

I store daily limits per IP in Upstash Redis, resetting at midnight UTC. On Vercel, I read the real client IP from a forwarded header. When I can't resolve an IP, the client falls back to a random ID stored in localStorage, with a lower cap since that ID is easy to reset.
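A sketch of the counter, assuming `x-forwarded-for` is the header in question (on Vercel it carries the client's public IP; the post only says "a forwarded header"). The key format, the 30/10 caps, and all function names are hypothetical; 30 mirrors the "12/30" counter mentioned above. `CounterStore` matches the `incr`/`expire` methods on the `@upstash/redis` client, and an in-memory store stands in so the sketch runs on its own:

```typescript
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

// In-memory stand-in for Redis; TTLs are ignored here.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string) {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  async expire() {
    return 1;
  }
}

const IP_DAILY_LIMIT = 30; // hypothetical
const CLIENT_ID_DAILY_LIMIT = 10; // hypothetical, lower because the ID is easy to reset

// First entry in x-forwarded-for is the original client.
function clientIp(headers: Headers): string | null {
  const ip = headers.get("x-forwarded-for")?.split(",")[0]?.trim();
  return ip ? ip : null;
}

function secondsUntilUtcMidnight(now: Date): number {
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.ceil((next - now.getTime()) / 1000);
}

// clientId is the random localStorage ID, however the client sends it.
async function checkDailyLimit(store: CounterStore, headers: Headers, clientId: string, now = new Date()) {
  const ip = clientIp(headers);
  const day = now.toISOString().slice(0, 10); // UTC date, so keys roll over at midnight UTC
  const key = ip ? `chat:ip:${ip}:${day}` : `chat:cid:${clientId}:${day}`;
  const limit = ip ? IP_DAILY_LIMIT : CLIENT_ID_DAILY_LIMIT;

  const count = await store.incr(key);
  if (count === 1) await store.expire(key, secondsUntilUtcMidnight(now)); // first hit of the day sets the TTL
  return { allowed: count <= limit, remaining: Math.max(0, limit - count), limit };
}
```

The `remaining` value is what a UI counter like "12/30" would display.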

If Redis isn't configured in production, I return 503 instead of silently skipping limits. I'd rather the chat be briefly down than run unmetered.
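The fail-closed check is small. This sketch assumes the standard `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` variables that `Redis.fromEnv()` reads; `limiterUnavailable` is a hypothetical helper name:

```typescript
type Env = Record<string, string | undefined>;

function redisConfigured(env: Env): boolean {
  return Boolean(env.UPSTASH_REDIS_REST_URL && env.UPSTASH_REDIS_REST_TOKEN);
}

// Fail closed: in production, no limiter means no chat. Outside production
// this returns null, so local development works without Redis.
function limiterUnavailable(env: Env): Response | null {
  if (env.NODE_ENV === "production" && !redisConfigured(env)) {
    return new Response(JSON.stringify({ error: "Chat is temporarily unavailable." }), {
      status: 503,
      headers: { "content-type": "application/json" },
    });
  }
  return null;
}
```

In a route handler this would run first: `const blocked = limiterUnavailable(process.env); if (blocked) return blocked;`.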

I also show the number of remaining messages in the prompt input so the limit isn't a surprise, and a warning toast appears when someone hits the cap. Messages are capped at 1,000 characters on the client, but a direct request to the API skips that check, so it still needs to be enforced server-side.
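Enforcing that cap on the server could look like the sketch below, assuming the route receives messages as `{ role, content }` objects. `validateLatestMessage` and the 400 responses are my own choices, not existing Nexus UI code:

```typescript
const MAX_MESSAGE_CHARS = 1000; // same cap the client enforces

interface IncomingMessage {
  role: string;
  content: string;
}

type Validation = { ok: true } | { ok: false; status: number; error: string };

// Re-check the newest user message; the client-side limit is easy to bypass.
function validateLatestMessage(messages: IncomingMessage[]): Validation {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "user") {
    return { ok: false, status: 400, error: "Expected a user message." };
  }
  const text = last.content.trim();
  if (!text) return { ok: false, status: 400, error: "Message is empty." };
  if (text.length > MAX_MESSAGE_CHARS) {
    return { ok: false, status: 400, error: `Messages are limited to ${MAX_MESSAGE_CHARS} characters.` };
  }
  return { ok: true };
}
```

This only checks the latest message; a stricter version would also cap the total size of the history the client sends.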

What's next

There are a few things that could be improved: embeddings for better semantic search, trimming conversation history so long threads don't get expensive, and server-side input validation.

The chat is live at nexus-ui.dev/docs. Open it, press ⌘/, and ask something. The source code is available on GitHub.

Keep building!