The pain point: a confident lie is worse than “I don’t know”
If you’ve ever used an LLM for research, you’ve been burned by the same thing I have: a beautifully written paragraph with a citation that looks perfect—author, year, page number—and is completely fabricated. The model didn’t lie on purpose; it pattern-matched what a citation looks like. But the result is the same. A tool that invents references isn’t a flawed research assistant, it’s an actively dangerous one, because the failure is invisible until someone checks. And the whole point of using the tool was to not have to check.
And this isn’t a hypothetical that careful people avoid; it’s already in the published literature. A recent audit, “Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025,” found 100 AI-generated hallucinated citations spread across 53 papers accepted at NeurIPS, roughly 1% of all acceptances. Every one of them slipped past 3–5 expert reviewers, and two-thirds were total fabrications: the cited work simply does not exist. If the most prestigious review process in AI can’t catch these, “just check your citations” is not a solution. The paper’s own conclusion points the same way mine did: verification has to be automated and structural, not left to vigilance.
That observation is the seed of OpenMyst. The goal was never “a better chatbot for research.” It was one stubborn requirement: the system must be structurally incapable of citing something that doesn’t exist. The aim is not to catch fabricated citations after the fact, but to make them unrepresentable in the first place.
The idea: anchor every claim to a verbatim line
OpenMyst’s answer is to invert the usual flow. Instead of generating text and then attaching citations after the fact, it ingests your sources first and breaks each one into typed anchors—verbatim snippets (a claim, a statistic, a finding, a quote) each tagged with its exact source and line number. Those anchors are the only things the drafter is allowed to cite. No anchor on disk means the claim cannot be written. Hallucinated references aren’t filtered out after generation—they’re impossible by construction.
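To make the anchor idea concrete, here is a minimal sketch in Python. It is not OpenMyst's actual code: the `Anchor` fields, `make_anchor`, and `dangling_citations` are hypothetical names chosen for illustration, and the sources are assumed to be plain text held in a dictionary. It shows the two checks the paragraph above implies: an anchor can only be created if its snippet appears verbatim on the stated line, and a draft can only cite numbers that map to an existing anchor.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    kind: str    # e.g. "claim", "statistic", "finding", or "quote"
    source: str  # hypothetical source identifier, such as a file name
    line: int    # 1-indexed line number within the source
    text: str    # snippet copied verbatim from that line


def make_anchor(kind, source, line, text, sources):
    """Create an anchor only if `text` appears verbatim on that source line."""
    lines = sources[source].splitlines()
    if not (1 <= line <= len(lines)) or text not in lines[line - 1]:
        raise ValueError(f"{source}:{line} does not contain the snippet verbatim")
    return Anchor(kind, source, line, text)


def dangling_citations(draft, anchors):
    """Return the [n] citation numbers in `draft` that map to no anchor."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", draft)}
    return sorted(cited - set(anchors))


# Illustrative data only; the source text below is invented for the example.
sources = {"paper.md": "Intro\nAccuracy rose from 71% to 84% on the held-out set.\n"}
anchors = {1: make_anchor("statistic", "paper.md", 2, "from 71% to 84%", sources)}

print(dangling_citations("Accuracy improved markedly [1].", anchors))  # []
print(dangling_citations("It also generalises [2].", anchors))         # [2]
```

In this sketch a drafter would refuse to emit any sentence for which `dangling_citations` returns a non-empty list, which is one simple way to turn "no anchor, no claim" into a hard gate rather than a post-hoc filter.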
The nice property here is that “no hallucinated citations” stops being a thing you hope the model does and becomes a structural guarantee. Every [n] in the output dereferences to a real line you can hover over and read in the original. It’s the same conviction that shows up in my other work: I’d rather have a deterministic, checkable source of truth than a confident model I have to babysit.
Two surfaces: the MCP connector and the full app
OpenMyst ships in two forms, for two different ways people actually work.
The MCP connector (the main way it’s used now)
The anchored-research engine is exposed as an MCP connector, so any MCP-capable client (Claude, or any frontier model with tool access) can call a single research tool and get back numbered, verbatim, line-cited evidence. The orchestration logic is built on LangGraph: the research loop (propose queries, fetch, filter, digest into anchors, return) is a graph of steps the host model can drive, making small, frequent calls rather than dumping one giant batch.
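The shape of that loop can be sketched in plain Python. This is an illustration under stated assumptions, not the connector's real implementation: it deliberately avoids LangGraph so it runs with the standard library alone, and `research_step`, `propose`, `fetch`, and `keep` are hypothetical names standing in for the real query proposer, fetcher, and relevance filter.

```python
from __future__ import annotations

from typing import Callable, Iterable


def research_step(
    question: str,
    propose: Callable[[str], list[str]],
    fetch: Callable[[str], Iterable[tuple[str, str]]],
    keep: Callable[[str, str], bool],
) -> list[dict]:
    """One small pass: propose queries, fetch, filter, digest into anchors."""
    evidence: list[dict] = []
    for query in propose(question):
        for source, text in fetch(query):
            for line_no, line in enumerate(text.splitlines(), start=1):
                if line.strip() and keep(question, line):
                    evidence.append(
                        {
                            "n": len(evidence) + 1,  # citation number, [n]
                            "source": source,
                            "line": line_no,         # 1-indexed source line
                            "text": line.strip(),    # snippet from that line
                        }
                    )
    return evidence


# Stub components and invented data, purely for demonstration.
corpus = {"notes.md": "Background.\nAnchors are verbatim snippets.\nUnrelated line."}

result = research_step(
    "anchors",
    propose=lambda q: [q],                           # one query per question
    fetch=lambda query: corpus.items(),              # pretend web fetch
    keep=lambda q, line: q.lower() in line.lower(),  # naive relevance filter
)
print(result)
# [{'n': 1, 'source': 'notes.md', 'line': 2, 'text': 'Anchors are verbatim snippets.'}]
```

Because each call returns a small numbered list, a host model can invoke a step like this repeatedly with narrower questions instead of requesting one large batch up front.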
This turned out to be the most useful form. With frontier models as strong as they are, the host already supplies the reasoning, the drafting, and the agent loop. What it doesn’t have is a disciplined, anchored view of the literature. So the durable value isn’t the wrapper around the model; it’s the retrieval layer underneath. The MCP is exactly that layer, and nothing else. (In fact, this very blog post was researched through it.)
Web fetching and page-to-markdown conversion run through our own in-house pipeline—no third-party reader dependency—so the whole path from “a URL” to “an anchored snippet” is ours end to end.
The full application (the 11-agent extension)
The desktop app is the richer, opinionated surface—for when you want the tool to help you think, not just retrieve. Its heart is Deep Plan, a structured pre-writing loop driven by a panel of 11 role-specific agents—an Explorer, a Skeptic, a Steelman, an Architect, an Adversary, an Audience, and others—each interrogating your work through its own lens.
The clever part is the orchestration. Running 11 personas every round would be noise, so a strong-model Chair sits above them, selects the two or three sharpest prompts each round, and surfaces them with attribution (“the Skeptic asks…”). The panel even cross-examines you against your own sources (“your wiki says X—does that change your stance?”). The output of all this deliberation is a dense vision.md—the intellectual spine the drafter then turns into a one-shot, fully-anchored draft. It’s a real multi-agent system, shipped in a production app with auth, billing, and crash-safe state—not a demo.
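As a rough illustration of the Chair's selection step, here is a small Python sketch. It assumes each persona's prompt arrives with a numeric `score`, which is a stand-in for the strong model's judgement; in the real app that judgement comes from the Chair model, not a number. `PanelPrompt` and `chair_select` are hypothetical names, and the sample questions are invented.

```python
from dataclasses import dataclass


@dataclass
class PanelPrompt:
    agent: str      # persona that raised the prompt, e.g. "Skeptic"
    question: str   # the prompt text itself
    score: float    # stand-in for the Chair's assessment of sharpness


def chair_select(prompts, k=3):
    """Keep the k highest-scoring prompts and surface them with attribution."""
    top = sorted(prompts, key=lambda p: p.score, reverse=True)[:k]
    return [f"the {p.agent} asks: {p.question}" for p in top]


# Invented example round with four of the personas.
round_prompts = [
    PanelPrompt("Explorer", "What adjacent field has solved this?", 0.4),
    PanelPrompt("Skeptic", "Which claim has the weakest evidence?", 0.9),
    PanelPrompt("Steelman", "What is the best case against you?", 0.7),
    PanelPrompt("Audience", "Who is this written for?", 0.2),
]

for line in chair_select(round_prompts, k=2):
    print(line)
# the Skeptic asks: Which claim has the weakest evidence?
# the Steelman asks: What is the best case against you?
```

The point of the design is the cap: every persona still runs, but only a couple of attributed prompts reach the writer in any one round.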
Why it’s free
I’m not trying to make money from OpenMyst. It exists because hallucinated citations are a genuine pain point in research, and I wanted the thing to exist—for me, and for anyone else who’d find it useful. It’s open-sourced to find collaborators rather than customers. If the anchored-research idea is good, the right outcome is that it spreads, not that it’s locked behind a paywall.
Try it
The app and the MCP connector both live at openmyst.ai, and the code is on GitHub. If you do research with LLMs and you’re tired of checking every citation by hand, the MCP connector is the fastest way to feel the difference—point your client at it and watch every claim come back with a line number attached.
