Really appreciate the detailed feedback. There are a bunch of great features you're pointing out that are on our roadmap (we'll add what's missing). The agent can set up tasks on a schedule and help manage them. You can try a prompt like 'Can you schedule a background task xyz to run every morning ...'. Background tasks show up in the UI once the assistant schedules them. However, you might have to connect the necessary MCP tools in your case.
On Gmail actions - we currently don't take write actions on inboxes like archiving or categorizing emails. The Google connection is read-only and used purely to build the knowledge graph. We're working on adding write actions, but we're being careful about how we implement them. That's probably also why the agent got confused and went looking for an MCP to accomplish the same job.
On noise in the knowledge graph — this is something we're actively tuning. We currently have different note-strictness levels, auto-inferred from inbox volume (configurable in ~/.rowboat/config/note-creation.json), that control what qualifies as a new node. Higher strictness prevents most emails from creating new entities and instead only updates existing ones. That said, this needs to be surfaced in the product and better calibrated. Using “people I send emails to” as a proxy for importance is a really good idea.
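The actual schema of note-creation.json isn't shown in this thread, but the gating behavior described above (strictness deciding whether an email creates a new node or only updates existing ones) could look roughly like this. All names and thresholds here are hypothetical:

```python
# Hypothetical sketch only — not Rowboat's real note-creation.json schema.
# Higher strictness raises the bar for an entity to get its own node.
def should_create_node(mentions: int, strictness: str) -> bool:
    """Return True if an entity seen `mentions` times qualifies for a new node."""
    thresholds = {"low": 1, "medium": 3, "high": 5}  # illustrative values
    return mentions >= thresholds[strictness]

print(should_create_node(2, "high"))  # False: only update existing notes
print(should_create_node(6, "high"))  # True: mentioned often enough
```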
Good question. We don’t pass the entire graph into the model. The graph acts as an index over structured notes. The assistant retrieves only the relevant notes by following the graph. That keeps context size bounded and avoids dumping raw history into the model.
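The "graph as an index over notes" idea could be sketched as a bounded traversal: start at the entity relevant to the query, follow edges a hop or two, and collect only those note files. All names and data structures below are illustrative, not Rowboat's actual code:

```python
from collections import deque

# Hypothetical graph: entity names -> related entities; each entity has one
# Markdown note on disk. Illustrative only.
EDGES = {
    "Project Apollo": ["Alice", "Acme Corp"],
    "Alice": ["Project Apollo", "Weekly Sync"],
    "Acme Corp": ["Project Apollo"],
    "Weekly Sync": ["Alice"],
}
NOTES = {name: f"notes/{name.replace(' ', '-').lower()}.md" for name in EDGES}

def retrieve_notes(seed: str, max_hops: int = 1, max_notes: int = 5) -> list[str]:
    """Breadth-first walk from the seed entity, collecting note paths.

    Hop and count limits keep the context passed to the model bounded
    instead of dumping the whole graph or raw history.
    """
    seen, queue, out = {seed}, deque([(seed, 0)]), []
    while queue and len(out) < max_notes:
        node, hops = queue.popleft()
        out.append(NOTES[node])
        if hops < max_hops:
            for nbr in EDGES.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, hops + 1))
    return out

print(retrieve_notes("Project Apollo"))
# -> ['notes/project-apollo.md', 'notes/alice.md', 'notes/acme-corp.md']
```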
For contradictory or stale information: since notes are built from emails and conversations, we use the conversation's timestamp to determine which information is latest when updating the corresponding note. The agent then operates on that current state.
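That timestamp-based "latest wins" rule could be sketched like this. Field names are hypothetical, not Rowboat's actual note schema:

```python
from datetime import datetime, timezone

# Hypothetical sketch: each fact on a note keeps the timestamp of the
# conversation it came from, and an update only lands if it is newer.
def apply_update(note: dict, field: str, value: str, seen_at: datetime) -> dict:
    current = note.get(field)
    if current is None or seen_at >= current["seen_at"]:
        note[field] = {"value": value, "seen_at": seen_at}
    return note

note = {}
t1 = datetime(2024, 5, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)
apply_update(note, "deadline", "June 15", seen_at=t2)
apply_update(note, "deadline", "May 30", seen_at=t1)  # older email, ignored
print(note["deadline"]["value"])  # -> June 15
```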
That said, handling contradictions more explicitly is something we’re thinking about. For example, flagging conflicting updates for the user to manually review and resolve. Appreciate you raising it.
> That said, handling contradictions more explicitly is something we’re thinking about.
That's a great idea. The inconsistencies in a given graph are just where attention is needed. Like an internal semantic diff. If you aim it at values it becomes a hypocrisy or moral complexity detector.
Interesting framing! We’ve mostly been thinking of inconsistencies as signals that something was missed by the system, but treating them as attention points makes sense and could actually help build trust.
As a corporate drone, keeping track of various internal contradictions in emails is the name of the game (one that my former boss mastered, but in a very manual way). In a very boring way, he was able to say: today you are saying X, but on date Y you actually said Z.
His manual approach won't work if applied directly (or more specifically, it will, but it would be unnecessarily labor intensive, and on a big enough set prohibitively so), because it would require constantly filtering and re-evaluating all emails. It can still be done, though.
As for the exact approach, it's a slightly longer answer, because it is a mix of small things.
First, I try to track which LLM excels at which task (and assign tasks based on those tracking scores). It may seem irrelevant at first, but small things like a 'can it handle structured JSON' rubric will make a difference.
Then we get to the personas that process the request, and those can make a difference in a corporate environment. As silly as it sounds, you effectively want a Dwight and a Jim (yes, it is an Office reference) looking at those emails (more if you have a use case that requires more complex lens crafting), as both will be looking for different things. Jim and Dwight each add their comments noting the sender, what they seem to be trying to do, and any issues they noted.
The notes from Jim and Dwight for a given message are passed to a third persona, which attempts to reconcile them, noting discrepancies between Jim and Dwight and checking against other similar notes.
...and so it goes.
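The Jim/Dwight/reconciler flow above could be sketched roughly like this, with the actual model calls stubbed out. Persona names and the simulated outputs are purely illustrative:

```python
# Sketch of a two-persona review plus reconciliation step. In a real
# pipeline, review() would call an LLM with the persona prompt and ask
# for structured JSON; here it is stubbed with fixed output.
def review(persona: str, message: str) -> dict:
    # Stub standing in for an LLM call with a persona-specific prompt.
    return {"persona": persona, "sender": "alice@example.com",
            "intent": "shift deadline", "issues": []}

def reconcile(notes: list[dict]) -> dict:
    """Third persona: compare reviewer notes and surface disagreements."""
    intents = {n["intent"] for n in notes}
    return {"agreed": len(intents) == 1, "intents": sorted(intents)}

msg = "Per my last email, the deadline moved to June 15."
jim = review("skeptical generalist", msg)
dwight = review("rule-focused stickler", msg)
print(reconcile([jim, dwight]))
# -> {'agreed': True, 'intents': ['shift deadline']}
```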
As for flagging itself, that is a huge topic all by itself. That said, at least in its current iteration, I am not trying to do anything fancy. Right now it is almost literally: if you see something contradictory (X said Y then, X says Z now), show it in a summary. It doesn't solve for multiple email accounts, personas, or anything like that.
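A minimal version of that "show contradictions in a summary" rule could look like this: group statements by sender and topic, and flag whenever a later claim differs from an earlier one. Field names are hypothetical:

```python
# Minimal "X said Y then, X says Z now" flagging. Statements are assumed to
# already be extracted into (sender, topic, claim, date) records.
def flag_contradictions(statements: list[dict]) -> list[str]:
    latest: dict[tuple, dict] = {}
    flags = []
    for s in sorted(statements, key=lambda s: s["date"]):
        key = (s["sender"], s["topic"])
        prev = latest.get(key)
        if prev and prev["claim"] != s["claim"]:
            flags.append(f"{s['sender']} on {s['topic']}: "
                         f"said {prev['claim']!r} on {prev['date']}, "
                         f"now says {s['claim']!r}")
        latest[key] = s
    return flags

stmts = [
    {"sender": "boss", "topic": "budget", "claim": "frozen", "date": "2024-05-01"},
    {"sender": "boss", "topic": "budget", "claim": "approved", "date": "2024-06-10"},
]
print(flag_contradictions(stmts))
```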
This was a really interesting read. Thanks for the detailed breakdown and the Office references. The multi-persona approach is interesting, almost like a mixture of experts. The corporate email contradiction use case is not something we had in mind, but I can see how flagging those inconsistencies could be valuable!
Graphiti is primarily focused on extracting and organizing structured facts into a knowledge graph. Rowboat is more focused on day-to-day work. We organize the graph around people, projects, organizations, and topics.
One design choice we made was to make each node human-readable and editable. For example, a project note contains a clear summary of its current state derived from conversations and tasks across tools like Gmail or Granola. It’s stored as plain Markdown with Obsidian-style backlinks so the user can read, understand, and edit it directly.
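Since the notes are plain Markdown with Obsidian-style [[backlinks]], extracting the links that form graph edges is straightforward. The regex and note text below are illustrative, not Rowboat's implementation:

```python
import re

# Matches Obsidian-style [[Target]] and aliased [[Target|display text]]
# links, capturing only the target note name.
BACKLINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]+)?\]\]")

note = """# Project Apollo
Current state: waiting on contract review by [[Alice]] ([[Acme Corp]]).
Next sync scheduled per [[Weekly Sync|last meeting]].
"""

links = BACKLINK.findall(note)
print(links)  # -> ['Alice', 'Acme Corp', 'Weekly Sync']
```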
Thanks. Obsidian and Logseq were definitely an inspiration while building this. What we’re trying to explore is pushing that a bit further. Instead of manually curating the graph and then querying it, the system continuously updates the graph as work happens and lets the agent operate directly on that structure.
Would love to know what kind of scripts or plugins you’re using in Logseq, and what you’re primarily using it for.
Thanks! Agent capabilities are getting commoditized fast. The differentiator is context. If you had a human assistant, you'd want them sitting in on all your meetings and reading your emails before they could actually be useful. That's what we're trying to build.
All the knowledge is stored in Markdown files on disk. You can edit them through the Rowboat UI (including the backlinks) or any editor of your choice. You can use the built-in AI to edit them as well.
On background tasks - there is an assistant skill that lets it schedule and manage background tasks. For now, background tasks cannot execute shell commands on the system. They can use the built-in file-handling tools and MCP tools if connected. We are adding an approval system for background tasks as well.
There are three types of schedules - (a) cron, (b) schedule within a window (run every morning at most once between 8-10am), and (c) run once at time x. There is also a manual enable/disable (kill switch) in the UI.
That’s great to know. I’ve come to the same conclusion. I’ve found that things work best when they happen right where I’m already working. Uploading files or recreating context in a web service adds friction, especially when everything is already available locally.
Will check out Grok Code Fast - thanks for the pointer. In my experience, coding agents can swing a lot in quality depending on the model’s reasoning power. When the model starts making small but avoidable mistakes, the overhead tends to cancel out the benefit. Curious to see how Grok performs on multi-step coding tasks.
True. I'm working with Python CRUD apps, which every model is fluent in. And I'm personally generating 100-line changes, not letting it run while I'm AFK.
That's what I love most about Claude. I love Django and I love React (the richness of building UIs with React is insane) and sure enough Claude Code (and other models I'm sure) is insanely good at both.