Retrieval is only as good as the corpus behind it. If your knowledge base is a stale PDF dump, your agent answers from stale facts. Synoppy lets you build retrieval on top of the live web and keep it fresh on a schedule — here's the whole shape of it.
1. Discover the URLs
Start with Map to get every URL on a domain without reading each page. It's the cheapest way (1 credit) to scope what you'll ingest.
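import { Synoppy } from "@synoppy/sdk";
// Read the API key from the environment rather than hard-coding it.
const client = new Synoppy({ apiKey: process.env.SYNOPPY_API_KEY! });
// Map returns the domain's URLs without fetching page content.
const { urls } = await client.map("https://docs.yourtarget.com");
2. Read each page as markdown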
Pipe the URLs through Read (or use Crawl to do discovery and reading in one call). Markdown chunks cleanly on headings, which gives you tidy, semantically coherent chunks instead of arbitrary character windows.
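// Read the first 50 URLs in parallel, asking for markdown output.
const pages = await Promise.all(
  urls.slice(0, 50).map((u) => client.read(u, { formats: ["markdown"] }))
);
// Split each page on level-2 headings; the "## " marker itself is consumed by the split.
const chunks = pages.flatMap((p) =>
  p.markdown.split(/\n## /).map((c) => ({ url: p.metadata.sourceUrl, text: c }))
);
3. Embed and store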
Embed the chunks with your model of choice and upsert them into your vector store with the source URL as metadata, so every answer can cite where it came from.
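One way this step could look, as a minimal sketch rather than a reference implementation. Everything here is a hypothetical stand-in: `Embedder` is a placeholder for whichever embedding model you call, `InMemoryVectorStore` stands in for your real vector store's upsert API, and the `url#position` id scheme is just one assumption about how to key records. None of these names come from the Synoppy SDK.

```typescript
// Hypothetical shapes for illustration; not part of the Synoppy SDK.
type Chunk = { url: string; text: string };
type VectorRecord = {
  id: string;
  vector: number[];
  metadata: { url: string; text: string };
};

// Placeholder for your embedding model: takes texts, returns one vector per text.
type Embedder = (texts: string[]) => Promise<number[][]>;

// In-memory stand-in for a vector store; a real store exposes its own upsert call.
class InMemoryVectorStore {
  private records = new Map<string, VectorRecord>();

  // Insert new records and overwrite existing ones that share an id.
  upsert(batch: VectorRecord[]): void {
    for (const record of batch) this.records.set(record.id, record);
  }

  get(id: string): VectorRecord | undefined {
    return this.records.get(id);
  }

  get size(): number {
    return this.records.size;
  }
}

// Pair each chunk with its vector. The id is built from the URL and the chunk's
// position on that page, so re-ingesting a page overwrites its earlier records.
function toRecords(chunks: Chunk[], vectors: number[][]): VectorRecord[] {
  if (chunks.length !== vectors.length) {
    throw new Error("Expected one vector per chunk");
  }
  const seenPerUrl = new Map<string, number>();
  return chunks.map((chunk, i) => {
    const position = seenPerUrl.get(chunk.url) ?? 0;
    seenPerUrl.set(chunk.url, position + 1);
    return {
      id: `${chunk.url}#${position}`,
      vector: vectors[i],
      // The source URL travels with the vector so answers can cite it.
      metadata: { url: chunk.url, text: chunk.text },
    };
  });
}

// Embed all chunk texts in one call, then upsert the resulting records.
async function embedAndStore(
  chunks: Chunk[],
  embed: Embedder,
  store: InMemoryVectorStore
): Promise<void> {
  const vectors = await embed(chunks.map((c) => c.text));
  store.upsert(toRecords(chunks, vectors));
}
```

Deterministic ids are what make the upsert idempotent across runs. One caveat with this scheme: if a page shrinks, its higher-numbered chunk ids are left behind, so you may want to delete a URL's records before upserting if your store supports that.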
4. Keep it fresh
Re-run the whole flow on a cron. Because the input is the live web — not a one-time export — your retrieval never quietly drifts out of date. Diff the new markdown against the last run to re-embed only what changed and keep costs flat.
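The diffing step might be sketched like this, assuming you persist each page's markdown from the previous run keyed by URL. `PageSnapshot`, `diffRuns`, and `toSnapshot` are hypothetical names for illustration, not Synoppy APIs, and how you persist the snapshot between cron runs is left open.

```typescript
// Hypothetical page shape: the source URL plus the markdown returned for it.
type PageSnapshot = { url: string; markdown: string };

// Compare this run's pages with the markdown stored from the previous run.
// `changed` holds pages that are new or whose markdown differs;
// `removed` holds URLs that were present last run but are missing now.
function diffRuns(
  previous: Map<string, string>,
  current: PageSnapshot[]
): { changed: PageSnapshot[]; removed: string[] } {
  const changed = current.filter(
    (page) => previous.get(page.url) !== page.markdown
  );
  const currentUrls = new Set(current.map((page) => page.url));
  const removed = Array.from(previous.keys()).filter(
    (url) => !currentUrls.has(url)
  );
  return { changed, removed };
}

// Build the snapshot to persist for the next run.
function toSnapshot(current: PageSnapshot[]): Map<string, string> {
  return new Map(
    current.map((page) => [page.url, page.markdown] as [string, string])
  );
}
```

Only the pages in `changed` need to be re-chunked and re-embedded, and the URLs in `removed` can be deleted from the vector store. Storing a content hash per URL instead of the full markdown would shrink the snapshot with the same comparison logic.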