Building a RAG pipeline on Synoppy

Map a site, read every page as markdown, chunk, embed — always-fresh retrieval in a handful of calls.

The Synoppy team
Jun 16, 2026 · 8 min read

Retrieval is only as good as the corpus behind it. If your knowledge base is a stale PDF dump, your agent answers from stale facts. Synoppy lets you build retrieval on top of the live web and keep it fresh on a schedule — here's the whole shape of it.

1. Discover the URLs

Start with Map to get every URL on a domain without reading each page. At 1 credit, it is the cheapest way to scope what you'll ingest.

typescript
import { Synoppy } from "@synoppy/sdk";
const client = new Synoppy({ apiKey: process.env.SYNOPPY_API_KEY! });

const { urls } = await client.map("https://docs.yourtarget.com");

2. Read each page as markdown

Pipe the URLs through Read (or use Crawl to do discovery and reading in one call). Markdown chunks cleanly on headings, which gives you tidy, semantically coherent chunks instead of arbitrary character windows.

typescript
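// Read the first 50 URLs concurrently, asking for markdown only.
const pages = await Promise.all(
  urls.slice(0, 50).map((u) => client.read(u, { formats: ["markdown"] }))
);

// Split before each "## " heading; the lookahead keeps the heading in its chunk.
const chunks = pages.flatMap((p) =>
  p.markdown
    .split(/\n(?=## )/)
    .map((c) => ({ url: p.metadata.sourceUrl, text: c.trim() }))
    .filter((c) => c.text.length > 0)
);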

3. Embed and store

Embed the chunks with your model of choice and upsert them into your vector store with the source URL as metadata, so every answer can cite where it came from.
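As a rough sketch of this step, the code below embeds chunks in batches and upserts them with the source URL attached. The `Embedder` and `VectorStore` types, the `assignIds` and `embedAndStore` helpers, and the `url#index` id scheme are all assumptions made for illustration; they are not part of the Synoppy SDK, and you would replace them with the clients for your own embedding model and vector database.

```typescript
// Hypothetical shapes for illustration; none of these come from the Synoppy SDK.
type Chunk = { url: string; text: string };
type VectorRecord = {
  id: string;
  vector: number[];
  metadata: { url: string; text: string };
};
// Stand-ins for your embedding model and vector database clients.
type Embedder = (texts: string[]) => Promise<number[][]>;
type VectorStore = { upsert(records: VectorRecord[]): Promise<void> };

// Give each chunk a stable id: its source URL plus its position on that page.
// Re-running the pipeline then overwrites old records instead of duplicating them.
function assignIds(chunks: Chunk[]): Array<Chunk & { id: string }> {
  const perUrl = new Map<string, number>();
  return chunks.map((chunk) => {
    const index = perUrl.get(chunk.url) ?? 0;
    perUrl.set(chunk.url, index + 1);
    return { ...chunk, id: `${chunk.url}#${index}` };
  });
}

// Embed in batches and upsert each batch with the source URL as metadata.
async function embedAndStore(
  chunks: Chunk[],
  embed: Embedder,
  store: VectorStore,
  batchSize = 64
): Promise<number> {
  const identified = assignIds(chunks);
  let written = 0;
  for (let i = 0; i < identified.length; i += batchSize) {
    const batch = identified.slice(i, i + batchSize);
    const vectors = await embed(batch.map((c) => c.text));
    if (vectors.length !== batch.length) {
      throw new Error("Embedder returned a different number of vectors than inputs");
    }
    const records = batch.map((c, j) => ({
      id: c.id,
      vector: vectors[j],
      metadata: { url: c.url, text: c.text },
    }));
    await store.upsert(records);
    written += records.length;
  }
  return written;
}
```

Position-based ids keep upserts idempotent, but if a page loses sections between runs its trailing records would linger. One option is to delete a page's existing records before upserting its new chunks.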

4. Keep it fresh

Re-run the whole flow on a cron. Because the input is the live web, not a one-time export, your retrieval stays as current as your last run. Diff the new markdown against the previous run and re-embed only what changed, so embedding cost tracks how much the site changed rather than how large it is.
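One way to do that diff is to store a fingerprint of each page's markdown and compare it on the next run. The sketch below is an assumption about how you might structure it, not something Synoppy provides: `fingerprint`, `diffPages`, and the `Page` and `Snapshot` types are hypothetical names, and the hash is a 32-bit FNV-1a computed over UTF-16 code units. Swap in SHA-256 if you need stronger collision resistance.

```typescript
// Hypothetical types: a page as read this run, and url -> fingerprint from the last run.
type Page = { url: string; markdown: string };
type Snapshot = Map<string, string>;

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fingerprint(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Compare this run's pages with the previous snapshot.
// changed: new or modified pages that need re-chunking and re-embedding.
// removed: URLs that disappeared and whose records can be deleted.
// next:    the snapshot to persist for the following run.
function diffPages(previous: Snapshot, pages: Page[]) {
  const next: Snapshot = new Map();
  const changed: Page[] = [];
  for (const page of pages) {
    const fp = fingerprint(page.markdown);
    next.set(page.url, fp);
    if (previous.get(page.url) !== fp) changed.push(page);
  }
  const removed = Array.from(previous.keys()).filter((url) => !next.has(url));
  return { changed, removed, next };
}
```

On the first run the snapshot is empty, so every page counts as changed. After that, persist `next` (for example as JSON via `Object.fromEntries`) and pass it back in as `previous`.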

Scope with Map before you Crawl. Crawl bills per page it actually reads, so checking the URL count first keeps a 10,000-page domain from turning into a surprise bill.
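A small guard like the following can make that check explicit before any Crawl is started. It is only a sketch: `planCrawl`, `CrawlPlan`, the URL-prefix filter, and the page budget are hypothetical helpers, and the page count is an estimate of what Crawl would read rather than a figure returned by Synoppy.

```typescript
// Hypothetical result shape for a pre-crawl check.
type CrawlPlan = {
  selected: string[]; // URLs under the prefix you intend to ingest
  skipped: number; // unique URLs that fell outside the prefix
  withinBudget: boolean; // true when selected.length <= maxPages
};

// Take the URL list from Map, keep only the section you care about,
// and compare the page count with a budget you set yourself.
function planCrawl(urls: string[], urlPrefix: string, maxPages: number): CrawlPlan {
  const unique = Array.from(new Set(urls));
  const selected = unique.filter((u) => u.startsWith(urlPrefix));
  return {
    selected,
    skipped: unique.length - selected.length,
    withinBudget: selected.length <= maxPages,
  };
}
```

If `withinBudget` is false, you could narrow the prefix, raise the budget deliberately, or read only a slice of `selected` rather than crawling the whole domain.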
