/* WHY MULTIMODAL: the pitch, driven by real MMLongBench results in
   /why-multimodal.json. Each card is a question whose answer lives in a figure:
   text_pages = the text retriever's top hits (gold page absent);
   router_pages = the router's top hits (gold page present). */
function PageChips({ pages, gold }) {
  return (
    {pages.length === 0 ? nothing on-topic : pages.map((p, i) => ( p{p} ))}
  );
}

function WhyView({ setTab, routingAvailable }) {
  const [data, setData] = useState(null);
  const [active, setActive] = useState(0);
  useEffect(() => {
    fetch("/why-multimodal.json")
      .then((r) => (r.ok ? r.json() : { cards: [] }))
      .then(setData)
      .catch(() => setData({ cards: [] }));
  }, []);
  const cards = (data && data.cards) || [];
  const card = cards[active];
  const inList = (g, list) => g.some((x) => list.includes(x));
  const textHit = cards.filter((c) => inList(c.gold_pages, c.text_pages)).length;
  const routerHit = cards.filter((c) => inList(c.gold_pages, c.router_pages)).length;
  const pct = (n, d) => (d ? Math.round((n / d) * 100) : 0);
  return (
THE PROBLEM

Most RAG can't read the figure.

A large share of a document's answers live where a text chunker never looks — leaderboard tables, architecture diagrams, values printed inside charts. Embed only the body text and those answers are simply not in the index. SpectraRAG indexes the page images too and routes each question to the store that actually holds the answer. Every example below is a real MMLongBench question whose answer sits in a figure.

{card && (
QUESTION “{card.question}”
{cards.length > 1 && (
{cards.map((c, i) => ( ))}
)}
Text-only retrieval
top-10 pages:

Gold page p{card.gold_pages[0]} ({card.figure_label}) is not in the text retriever's top 10: the answer is printed in the figure, which never enters the text index, so the model has nothing to ground it on.

gold page missed
SpectraRAG router
{card.figure_label}
top-10 pages:

The router flags a figure-bound query, searches the visual store, and pulls gold page p{card.gold_pages[0]}. The model reads the answer off the {card.figure_label}.

grounded · answer: {card.answer}
)}
{/* Numbers come from why-multimodal.json: render nothing rather than "0/0 examples" while it loads or if the fetch fails. */}
{cards.length > 0 &&
MMLONGBENCH · FIGURE-BOUND ITEMS

Routing recovers the gold page that text-only retrieval drops.

Across these {cards.length} figure-bound examples, the gold page (the one where the answer actually appears) lands in the text retriever's top 10 in {textHit} of {cards.length} cases. With the router, it lands there in {routerHit} of {cards.length}. Same query, same corpus, different store.

Text-only · {textHit}/{cards.length}
+ router · {routerHit}/{cards.length}
gold page present in top-10 · {cards.length} figure-bound MMLongBench items
}
HOW ROUTING WORKS

One gate, two stores, per turn.

01

Classify the query

A lightweight router reads the turn (plus prior context) and predicts whether the answer is likely text-bound, figure-bound, or both.

02

Retrieve from the right store

Text-bound queries go to a dense bge-m3 passage index; figure-bound queries also pull the page images, so the page holding the answer is in context.

03

Rerank & cite

A cross-encoder reranks the candidates, and the answer cites the exact chunk or page each claim came from.

{routingAvailable === false ? (

See the retrieval pipeline live.

This deployment runs the text side only (the router needs a GPU), but every stage (retrieval, reranking, evidence) is traced in real time. Figure questions still read the page images.

) : (

See it route in real time.

Ask a figure-bound question and watch the retrieval panel pick the page.

)}
  );
}

window.WhyView = WhyView;
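The three steps under "HOW ROUTING WORKS" can be read as a small control-flow sketch. This is an illustration under stated assumptions, not SpectraRAG's actual code: `routeAndRetrieve`, `classify`, `searchText`, `searchVisual`, `rerank`, and the `{ page, store }` hit shape are all hypothetical names introduced here. It assumes that figure-bound queries search the text index and the page-image store together, which is one reading of "also pull the page images".

```javascript
// Hypothetical sketch: these names are illustrative, not SpectraRAG's API.
// classify, searchText, searchVisual and rerank are injected stand-ins for the
// router model, the text index, the page-image store and the cross-encoder.
// Kept synchronous for brevity; real stores would almost certainly be async.
function routeAndRetrieve(query, history, deps, topK = 10) {
  const { classify, searchText, searchVisual, rerank } = deps;

  // 01: predict where the answer lives: "text", "figure" or "both".
  const route = classify(query, history);

  // 02: always search the text index; figure-bound routes also pull page images.
  // (Assumption: this is one reading of "also pull the page images".)
  const candidates = [...searchText(query, topK)];
  if (route === "figure" || route === "both") {
    candidates.push(...searchVisual(query, topK));
  }

  // 03: rerank the merged pool. Each hit keeps its store and page so the
  // answer can cite the exact chunk or page it came from.
  const hits = rerank(query, candidates).slice(0, topK);
  return { route, hits };
}
```

Injecting the four stages keeps the gate itself readable and lets it run against stubs; a text-only deployment could pass a `classify` that always returns `"text"`, so the visual store is never searched.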