/* WHY MULTIMODAL: the pitch, driven by real MMLongBench results in
   /why-multimodal.json. Each card is a question whose answer lives in a figure:
   text_pages   = the text retriever's top hits (gold page absent);
   router_pages = the router's top hits (gold page present). */
function PageChips({ pages, gold }) {
  return (
    {pages.length === 0 ? nothing on-topic : pages.map((p, i) => ( p{p} ))}
  );
}

function WhyView({ setTab, routingAvailable }) {
  const [data, setData] = useState(null);
  const [active, setActive] = useState(0);

  useEffect(() => {
    fetch("/why-multimodal.json")
      .then((r) => (r.ok ? r.json() : { cards: [] }))
      .then(setData)
      .catch(() => setData({ cards: [] }));
  }, []);

  const cards = (data && data.cards) || [];
  const card = cards[active];
  const inList = (g, list) => g.some((x) => list.includes(x));
  const textHit = cards.filter((c) => inList(c.gold_pages, c.text_pages)).length;
  const routerHit = cards.filter((c) => inList(c.gold_pages, c.router_pages)).length;
  const pct = (n, d) => (d ? Math.round((n / d) * 100) : 0);

  return (
A large share of a document's answers live where a text chunker never looks — leaderboard tables, architecture diagrams, values printed inside charts. Embed only the body text and those answers are simply not in the index. SpectraRAG indexes the page images too and routes each question to the store that actually holds the answer. Every example below is a real MMLongBench question whose answer sits in a figure.
Gold page p{card.gold_pages[0]} ({card.figure_label}) is not in the text retriever's top hits — the answer is printed in the figure, which never enters the text index. The model has no grounding for it.
The router flags a figure-bound query, searches the visual store, and pulls gold page p{card.gold_pages[0]}. The model reads the answer off the {card.figure_label}.
Across these {cards.length} figure-bound examples, the gold page — the one where the answer actually appears — lands in the text retriever's top-10 {textHit}/{cards.length} times. The router recovers it {routerHit}/{cards.length}. Same query, same corpus, different store.
A lightweight router reads the turn (plus prior context) and predicts whether the answer is likely text-bound, figure-bound, or both.
Text-bound queries route to a dense bge-m3 passage index; figure-bound queries also pull the page images so the answer's page is in context.
A cross-encoder reranks the candidates, and the answer cites the exact chunk or page each claim came from.
This deployment runs text-side (the router needs a GPU), but every stage (retrieval, reranking, evidence) is traced in real time. Figure questions still read the page images.
Ask a figure-bound question and watch the retrieval panel pick the page.