I don't have any background as an analyst or anything like that. ACH is a real tool, really used by the CIA, and the existing versions are basically crappy spreadsheets, or not free, or both.
I don't doubt someone with coding skills could do it better, it's just that no one else has stepped up. Probably because there's no profit angle, but that's conjecture on my part.
I didn't think a video would be very exciting. It did feel like deep magic when I tested it, though. For the scenario in the screenshots, I provided the question "Did we really land a man on the moon?", the null hypothesis "We landed on the moon in 1969", and one low-value piece of evidence: "My dad told me he saw Stanley Kubrick's moon landing set one time and he never lies." Literally everything else the LLM generated on demand, offline, from its existing training data. It gave me hypotheses, challenges, evidence, filled out the matrix, did the calculations, everything.
It would be enough to drive most local LLMs crazy if they tried to generate it all at once, or if it were all part of one long session, but it's set up so the LLM doesn't have to produce much at a time. Requests are batched in small groups (e.g., only 3 suggestions per request), the session is refreshed between calls, and the output is force-structured to fit the expected format. You can, however, ask for new batches of suggestions, conflicts, or evidence more than once. Hallucinations can happen with any LLM use, of course, but if they break the expected structure the output is generally thrown out.

Even the matrix scoring suggestion works that way: it fills in a whole row, but behind the scenes the LLM is asked for one response in one "chat" session per column, and the results are only entered once all of them have been individually returned. That way, if the LLM does hallucinate a score, that cell gets a neutral response and none of the neighboring cells are corrupted.
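If it helps to see the shape of it, here's a minimal sketch of that pattern in Python against LM Studio's OpenAI-compatible endpoint. The function names and prompt wording here are illustrative, not the app's actual code:

```python
# Minimal sketch of the per-column scoring pattern described above.
# Assumes LM Studio's OpenAI-compatible server on localhost:1234; names
# and prompts are illustrative, not ArkhamMirror's real internals.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

VALID = {"CC", "C", "N", "I", "II"}  # classic ACH consistency ratings
NEUTRAL = "N"

def score_cell(hypothesis: str, evidence: str) -> str:
    """One fresh chat session per matrix cell (one column at a time)."""
    resp = client.chat.completions.create(
        model="local-model",  # whatever model LM Studio has loaded
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": "Rate the evidence against the hypothesis. "
                        "Reply with exactly one token: CC, C, N, I, or II."},
            {"role": "user",
             "content": f"Hypothesis: {hypothesis}\nEvidence: {evidence}"},
        ],
    )
    answer = resp.choices[0].message.content.strip().upper()
    # Force-structured output: anything off-format counts as a hallucination
    # and falls back to a neutral score for that cell only.
    return answer if answer in VALID else NEUTRAL

def score_row(evidence: str, hypotheses: list[str]) -> list[str]:
    # Each column gets its own isolated call; the row is only committed
    # once every cell has come back.
    return [score_cell(h, evidence) for h in hypotheses]
```

Isolating each cell like that costs a few extra calls, but it keeps one bad generation from poisoning the rest of the row.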
If you use a smaller model with a smaller context window, it might be more prone to hallucinations and provide less nuanced suggestions, but the default model seems to handle the jobs pretty well without having to regenerate output very often (it does happen sometimes, but that just means you run it again). Also, depending on the model, you might get less variety or creativity in the suggestions. It's definitely not perfect, and it definitely shouldn't be trusted to replace human judgement.
Well, based on the evidence provided against our competing hypotheses, the least problematic hypothesis is that we landed on the moon in 1969. The second least problematic was "The Apollo 11 mission was a hoax staged by NASA and the U.S. government for public relations and Cold War propaganda, but the moon landing itself was real — only the public narrative was fabricated." Third was "The Apollo 11 mission was a real event, but the moon landing was not achieved by humans — it was an automated robotic mission that was misinterpreted or falsely attributed to astronauts due to technical errors or media misreporting." The winning hypothesis had a score of 0 (lower is better), second place scored 6 (out of a possible 10 for our evidence set), and third place scored 8. There was also a tie for 4th place between "It was just a government coverup to protect the firmament. There is no "outer space."" and "The Apollo 11 mission never occurred; all evidence — including photos, video, and lunar rocks — was fabricated in secret laboratories using early 20th-century special effects and staged experiments, possibly by a small group of scientists and engineers working under government contract." Both of these scored 10 out of 10, making them the most problematic. Sorry guys.
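For anyone curious about the arithmetic: "out of a possible 10" falls out of a five-evidence matrix if you weight inconsistencies the standard ACH way (I = 1 point, II = 2 points, lower total wins). Here's a toy reconstruction; the individual cell ratings are invented, only the totals match the run I described:

```python
# Toy ACH scoring, assuming the usual inconsistency weights:
# II (strongly inconsistent) = 2, I (inconsistent) = 1, everything else = 0.
# Five pieces of evidence at up to 2 points each = max score of 10.
WEIGHTS = {"II": 2, "I": 1, "N": 0, "C": 0, "CC": 0}

def inconsistency_score(ratings):
    return sum(WEIGHTS[r] for r in ratings)

# Columns are the five evidence items; these cell ratings are made up,
# but the totals reproduce the moon-landing run above.
matrix = {
    "We landed on the moon in 1969":  ["C", "C", "CC", "C", "N"],    # 0
    "Real landing, fabricated PR":    ["I", "II", "N", "I", "II"],   # 6
    "Robotic mission, misattributed": ["II", "II", "I", "I", "II"],  # 8
}

for name in sorted(matrix, key=lambda h: inconsistency_score(matrix[h])):
    print(f"{inconsistency_score(matrix[name]):>2}  {name}")
```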
I'm sure if the right evidence were submitted and run against the right hypotheses, a different frontrunner could emerge. Remember: this is a tool to help you investigate better and figure out what to look for, not a tool that tells you the answer. It helps you eliminate unlikely answers more than it ever points at the "right" answer, and even the most unlikely answers can still be the "right" ones! Hang in there.
Probably - LLMs definitely benefit from having decision-making frameworks. ACH is useful across a super wide range of problems, so I don't see why you couldn't tune an AI with it too.
The cheesy noir persona is for the AI-assisted install and that's it. Inside the app, the prompts are strictly business. (They still have roles, but not "characters" or "personas".)
I got tired of expensive SaaS tools that want my sensitive documents in their cloud. I built ArkhamMirror to do forensic document analysis 100% locally, free and open source.
What makes this different:
Air-gapped: Zero cloud dependencies. Uses local LLMs via LM Studio (Qwen, etc.)
ACH Methodology: Implements the CIA's "Analysis of Competing Hypotheses" technique, which forces you to look for evidence that disproves your theories instead of confirming them
Corpus Integration: Import evidence directly from your documents with source links
Sensitivity Analysis: Shows which evidence is critical, i.e., if that one item turned out to be wrong, would your conclusion change? (There's a sketch of the idea right after this list.)
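The sensitivity check is conceptually a leave-one-out pass: drop each piece of evidence in turn and see whether the winning hypothesis changes. A simplified sketch, not the literal app code:

```python
# Simplified leave-one-out sensitivity check: remove each piece of
# evidence in turn and see whether the winning hypothesis changes.
WEIGHTS = {"II": 2, "I": 1, "N": 0, "C": 0, "CC": 0}

def top_hypothesis(matrix):
    """Hypothesis with the lowest total inconsistency score wins."""
    return min(matrix, key=lambda h: sum(WEIGHTS[r] for r in matrix[h]))

def critical_evidence(matrix, evidence_names):
    """Evidence items whose removal would flip the conclusion."""
    baseline = top_hypothesis(matrix)
    critical = []
    for i, name in enumerate(evidence_names):
        # Rebuild the matrix without evidence row i.
        reduced = {h: rows[:i] + rows[i + 1:] for h, rows in matrix.items()}
        if top_hypothesis(reduced) != baseline:
            critical.append(name)  # the conclusion hinges on this item
    return critical
```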
The ACH feature just dropped with an 8-step guided workflow, AI assistance at every stage, and PDF/Markdown/JSON export with AI disclosure flags. It's better than what any given three-letter agency uses.
Instead of going through locally hosted Docker and local LLMs, you could reroute it wherever you like, but I don't have a cloud option set up at this time.
I'm focused on developing the local, private applications myself, but nothing is stopping someone from hooking it up to stronger cloud-based stuff if they want.
The good news is that my plans for this include making it more modular, so people have better options for what it does and how powerful it is.
There is now a standalone ACH tool you can try online in-browser. You can download it and run it locally with a local LLM, or use it in the browser and plug in a Groq or OpenAI API key.
It's not just for people doing interesting things. It just helps people answer questions about stuff. The stuff can be interesting or boring or dangerous or silly. The last question I tested the ACH tool on was "Did William Shakespeare really author all of the works he was credited for?" - You can use this stuff to research whatever you want. That's the point of it - it's no one's business what you are interested in getting to the bottom of.
From a business perspective, I can say I've needed similar methodologies (though nowhere near air-gap requirements, and relying heavily on web search) to evaluate potentially fraudulent transactions and relationships between parties.
What are the competing hypotheses, other than fraud, when a person makes a massive luxury purchase, but with red-flag-adjacent inconsistencies in other information provided? If we need to identify whether there's a weird or competitive ownership relationship behind a potential opportunity, how do we determine if an initial hypothesis about relationships is correct?
If ArkhamMirror has an online mode with web search as a tool call, I'd be curious to try it out to automate some of these ACH-adjacent workflows.
It doesn't have an online mode yet - although there's a lot of stuff in the works.
However, since Docker and LM Studio are already included in the setup, you can turn on the MCP Toolkit in Docker and add the Docker MCP to LM Studio.
With the MCP Toolkit on, you get access to over 300 different MCP servers for your local LLM, including web search via DuckDuckGo or Brave Search, automation tools like n8n, web manipulation stuff with Playwright, and all sorts of potentially useful things (not a sponsor :P). Then your "local" LLM can suddenly do all sorts of agentic stuff.
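I haven't verified the exact config keys recently, so double-check the Docker MCP Toolkit docs, but hooking LM Studio up to the gateway is typically just a short entry in LM Studio's mcp.json along these lines:

```json
{
  "mcpServers": {
    "docker-mcp-toolkit": {
      "command": "docker",
      "args": ["mcp", "gateway", "run"]
    }
  }
}
```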
This isn't out-of-the-box capability, since I'm only building offline, local, privacy-focused features at the moment, but turning it on isn't a huge undertaking.
If you're up for messing with some prompts in the files, you could even specify which tool the LLM should use for which task, in case it isn't automatically using them when the need arises.
The description on the repo says it's for journalism, but I build similar rigs that I use for research into companies that have entered bankruptcy proceedings.
Commonly there's a lot of information, and it might as well be unstructured, and I need answers quickly because my clients aren't going to pay me to go about it slowly.
It's mainly useful for journalism purposes, yes. Audit and compliance uses were also a consideration. It's a unified tool right now, but I'm working on turning the base into a frame and adding individual shards for specialized applications.
Let me tell you this: this version of the toolkit is pretty monolithic, and Reflex is kind of a pain for me to work with. This version will keep getting polish from here, but I hesitate to add more features to it since it already has like 35 pages of features.
I'm about to release another version of the tool that's focused on modularity, so anyone can mix and match the features they want instead of having to take the whole thing or nothing. ACH is going to be the first add-on, followed by the rest of the features.
Notice the "Knowledge Graph" feature that lets you "Visualize hidden connections between People, Orgs, and Places" just like the cork board meme.
This is the essence of what good "conspiracy theorists" do. Whenever investigative journalists uncover a conspiracy among the elite, they are talked down to and dismissed as "conspiracy theorists". But that is what good conspiracy theorists are: investigative journalists.
For sure - "conspiracy theorists" are just another group of people trying to find truth and patterns in the world and connect the dots.
The cork board feel was very much intentional in some of the visualizations.
Specifically, the "lie web" visualization that uses "red yarn" visuals to connect detected contradictions across different entities and documents.
If I had the skills, I would totally map that onto a cork board.
GitHub Discussions are open for questions, bug reports, and “here’s how I used it on a real investigation” stories.
Early testers: drag the tutorial files into the app, ask "Who is Captain Silver?", and watch it link a handwritten note to an email payment. No setup hell; it just works, offline.