I was frustrated with the Claude Code scrollback bug, the one where your whole chat history keeps rewinding, flashing and scrolling from the beginning, often multiple times per second and sometimes for over a minute, so I built bukowski, a soft terminal wrapper that acts as a pager interface to Claude Code, Codex and Gemini.
After capturing and double-buffering CC's output reduced the flicker by about 97.5%, I created a FIPA ACL messaging MCP bridge for all three CLI tools and wrapped it in an IRC-like chat interface. Now all three tools can communicate with each other, and this works surprisingly well if you give them roles or parts of tasks.
It's all local to the terminal interface, no remote servers, no API keys, just one wrapper for local terminal multiplexing and inter agent communication.
I did try using tmux in VS Code's terminal, but still managed to hit this issue with very large scroll buffers. Bumping tmux's internal scrollback capacity to 50-100k lines would delay the issue, but it would still eventually occur. In the last few CC versions they did get rid of the old buffer after e.g. conversation compaction, so maybe that's why you don't experience the issue anymore. An Anthropic engineer even opened a PR on tmux's GitHub repo to implement the OSC10/11 shenanigans, so it seems tmux is not immune to it at all.
If you check out the GitHub issues on the Claude Code repo and sort by comments, you'll see that the first page is littered with the same issue you're talking about.
The answer seems to be in the vendored ink.js, at least in Claude Code's case. I actually asked Claude Code to look into its own minified and obfuscated source code and pointed it to ink.js's repo to try to deduce how and why the bug occurs. The easiest way to reproduce it is to already have a few hundred to a couple of thousand lines of chat output and then enter enough newlines that the height of the input text exceeds the height of the current terminal. And it's not just the input box: other ink.js "widgets" have the same problem. Everything gets redrawn over and over again (each color change of the flabbergasting... status re-triggers it, forcing terminals that don't handle OSC10/11 codes properly to rescroll from the beginning, or at least that's my interpretation).
Anyway, after a few flabbergastings of its own, I suggested wrapping its output via xterm.js to always render a sliding fixed window of terminal_width x terminal_height, manage scrolling internally, and map the mouse wheel to scrolling / panning this virtual viewport. The result was a basic, scroll-bug-free, 650 LOC node.js app that circumvents the issue completely and also works in Windows Terminal and the VS Code terminal. I've been using it as a drop-in replacement for about a week now; the only thing missing is a bit more draw-call debouncing on terminal resize, since resizing triggers a reflow with visible scroll, though even that is much, much faster than the annoying bug (sub-100 ms).
As always, the feature creep is strong: there are some vim bindings now, and Claude, Gemini and Codex are currently working together (I have multiple terminal panes inside the app a la tmux) on implementing FIPA ACL inter-agent communication via a built-in MCP chat server. You can check it out at github.com/vmitro/bukowski, and don't worry about the license, it's satire; it's basically free for any use (and besides, Paddle won't let me set up the $1M "Premium Luxury Enterprise License for Companies valued > $1B that Make non-open-source AI Tools"...)
Thank you for the tool suggestion, it sounds great, I'll take a look!
At this point I was thinking about setting up https://opencode.ai/ - I think Claude Pro (aka $20 per month) allows it?
I don't want API access (I think), because that's priced per actual usage and I want a capped subscription cost, and I'm fairly sure OpenCode needs API access.
I really want to use the official clients but they're just SO bad.
Which, if you think about it, is the perfect anti-advertisement for AI coding. If the prestige AI shops can't create almost perfect clients for their own tools, then how good really is AI for coding???
Don't laugh, but I think in the (near) future more and more emphasis will be put on the HITL concept as private or self-hosted AI workflows gain interest; it's hard not to hope for the emergence of a movement similar to what GNU was for software itself, where freely available tooling allows for collaborative, federated, HITL-powered fine-tuning of ML models.
Since I also work on a similar concept, where HITL is a first-class citizen, could you tell us a bit more about the underlying technology stack? Is it possible for users to host their own models for inference and fine-tuning, and how are pipelines defined?
1. Pipelines are defined on your end. I want to build another option, but for now it is still just queried as an API endpoint.
2. Same as 1, so yes, you can definitely use your own models; you can send just the outputs, you don't have to send prompts.
I'm a bit curious what you're working on, and if there might be some interesting connections there. Would you like to speak? You can just book in my calendar through the site.
Sorry for the late reply, I'm juggling family / working as a full-time senior resident / final-year specialty trainee in a German hospital / maintaining three side projects. I've looked at your calendar and the timezones are a huge problem: either I get up at 4 AM or book it after the late shift (11 PM here)...
Anyway:
It's an open-source licensed, distributed data orchestration framework designed from the ground up for HITL workflows where correctness matters more than speed (primary field is medicine, but law, etc. could also benefit). It sounds like we're attacking the same problem from complementary angles. You're building the human routing API, I'm building the pipeline infrastructure that defines when and how humans get routed into the loop.
The core idea: pipelines are YAML-defined state machines with explicit correction steps. When a worker (e.g. your LLM endpoint) produces output, the pipeline can pause, send results to a human reviewer, and wait for either approval or corrected data; all as first-class custom protocol messages (based on the Majordomo protocol). The correction protocol has timeout handling, strike counting for repeated failures, and an audit trail that captures every decision point. The YAML can also define how to "steer" the pipeline in case of a correction: it can continue, store the correction, route to a specific step, fail, etc. (combinations are also possible, e.g. store the correction, then continue or jump to another step). One bit of feature creep that's currently itching is a parser for a heavily reduced syntax subset of Lucid (the dataflow language), transpiled into YAML pipeline definitions.
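As a rough illustration (all field and service names here are invented for the example, not the framework's actual schema), a step with correction steering might look like:

```yaml
steps:
  - name: extract_keywords
    service: bert_keyword_extract      # hypothetical worker service
    review: human                      # pause and wait for approval/correction
    on_correction:
      action: [store, continue]        # store the correction, then continue
    on_rejection:
      action: route
      target: transcribe               # jump back to an earlier step
    on_abort:
      action: fail
```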
What might interest you: every message in a pipeline shares a UUID, and each correction creates an immutable record of what was changed and why. This is essentially your "structured training data" as a (useful) byproduct of the architecture: you don't extract it after the fact, it's the communication protocol itself. The intended workflow philosophy is iterative fine-tuning, I guess, with training data accumulating as a byproduct of each correction cycle.
The framework uses ZeroMQ for binary messaging (sub-millisecond routing overhead) and can run on anything from edge devices to datacenters. If it speaks TCP/IP and can run Python 3.11+, you can plug it in. Workers are pluggable: your existing model endpoints could be wrapped as the framework's workers with about 20 lines of Python, receiving tasks and returning results through the same correction-aware pipeline. All components of the framework have lifecycle-aware "hooks": when you design your workers in Python, for example, you define them as a class and decorate their methods with @hook("async_init") or @hook("process_message"), and those hooks get executed at each lifecycle event.
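A minimal sketch of what such a hook-decorated worker could look like. Only the decorator names @hook("async_init") and @hook("process_message") come from the description above; the `hook` implementation, the `WhisperWorker` class, its service name and the `hooks_for` helper are illustrative stand-ins, not the framework's actual API.

```python
import asyncio

def hook(event):
    """Illustrative stand-in: tag a method with the lifecycle event it handles."""
    def wrap(fn):
        fn._hook_event = event
        return fn
    return wrap

class WhisperWorker:
    """Hypothetical worker wrapping an existing model endpoint."""
    service = "whisper_cpp_infer"

    @hook("async_init")
    async def setup(self):
        # e.g. load the model or open the endpoint connection here
        self.ready = True

    @hook("process_message")
    async def transcribe(self, payload: bytes) -> str:
        # call your real inference endpoint here; echoed for illustration
        return f"transcript of {len(payload)} bytes"

def hooks_for(worker, event):
    """Collect all methods on `worker` registered for a lifecycle event."""
    return [m for m in vars(type(worker)).values()
            if getattr(m, "_hook_event", None) == event]
```

The runtime would then call `hooks_for(worker, "async_init")` at startup and `hooks_for(worker, "process_message")` for each incoming task, so the worker class stays plain Python with no inheritance required.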
So in your project, instead of clients defining pipelines on their end and querying your API, the framework could provide the orchestration layer that routes between your clients' models, your human review queue, and back—with the pipeline definition living in a YAML file rather than scattered across client code. Your humans would interact with a well-defined correction protocol rather than ad-hoc intervention.
No HTTP endpoint (yet), you'd need to implement a worker that relays e.g. REST API calls and translates them into the framework's messages.
It's LGPL-licensed, intended for federated machine learning and self-hosted scenarios, and the (initial, now fairly more complex) "spartanic philosophy" means the core stays minimal while complexity lives in pluggable workers.
But it's not MVP-ready; some things are still broken and I'm trying to hit version 0.1.0 with a simple demo that takes a WAV file and transcribes it into text; another model then extracts keywords from the text, including intent and basic context; that goes to another model that generates a TinkerPop/Gremlin query from it; the client executes the query, and the results get sent along to a final worker that summarizes the (reduced) knowledge graph. That'd show a multi-modal pipeline in action.
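The demo chain above could be sketched as a pipeline definition roughly like this (step and service names invented for the example; only "whisper_cpp_infer" appears elsewhere in this thread):

```yaml
pipeline: wav_to_graph_summary
steps:
  - { name: transcribe,    service: whisper_cpp_infer }
  - { name: extract,       service: keyword_intent_extract }
  - { name: query_gen,     service: gremlin_query_generate }
  - { name: run_query,     service: client_side }        # client executes the query
  - { name: summarize,     service: graph_summarize }
```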
If you're interested, find me on github, the username is the same.
A message-driven orchestration framework envisioned from the ground up for Human-in-the-Loop workflows. Think accelerated, distributed/federated machine learning where fast iterations and continuous fine-tuning are in the foreground; where you want humans validating, correcting, and steering the data pipelines rather than fire-and-forget inference or bulk-data-in, bulk-model-out training.
The architecture is deliberately minimal: a ZeroMQ-based broker coordinating worker nodes through a rather spartan protocol that extends MajorDomo. Messages carry UUIDs for correlation, sender/receiver routing, type codes for context-dependent semantics and optional (but very much used) payloads. Pipeline definitions live in YAML files (as do worker and client configs), describing multi-step workflows with conditional routing, parallel execution, and wait conditions based on worker responses. Python is the language of the logic layer.
I am trying to follow the "functional core, imperative shell" philosophy, where each message is essentially an immutable, auditable block in a temporal chain of state transformations. This should enable audit trails, event sourcing, and potentially lossless crash recovery. Built-in blockchain-like verification is something I'm currently researching and could add to the whole pipeline processing.
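The message shape described above can be sketched as a frozen dataclass; the field names and the `correct` helper are assumptions for illustration, only the ideas (correlation UUID, routing, type code, optional payload, corrections as new immutable records) come from the text:

```python
import uuid
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Message:
    """Hypothetical protocol message: immutable, UUID-correlated."""
    sender: str
    receiver: str
    type_code: int
    payload: bytes = b""
    correlation_id: uuid.UUID = field(default_factory=uuid.uuid4)

def correct(msg: Message, new_payload: bytes) -> Message:
    # A correction never mutates; it appends a new message that shares
    # the original correlation UUID, preserving the audit trail.
    return replace(msg, payload=new_payload)
```

Because every transformation yields a new record tied to the same correlation UUID, event sourcing and replay fall out naturally: the chain of messages *is* the audit log.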
The hook system provides composable extensibility for all main user-facing "submodules" through mixin classes, so you only add complexity for features you actually need. The main pillars of functionality, the broker, the worker and the client, as well as some others, are designed as self-contained monolithic classes (often breaking the DRY principle...) whose additional functionality is composed rather than inherited, through mixins that add behaviour while minimizing the amount of added "state capital" (emphasis on behaviour rather than state management). The user-definable @hook("process_message"), @hook("async_init"), @hook("cleanup") etc. cut across the lifecycle of each submodule and allow for simple functionality extension.
I'm also implementing a very simple distributed virtual file system with unixoid command patterns (ls, cd, cp, mv etc.) supporting multiple backends for storage and transfer; i.e. you can simply have your data worker store files it subscribes to in a local folder and have it use its SSH, HTTPS or FTPS backend to serve them on demand. The data transfers employ per-file-operation ephemeral credentials; the broker only orchestrates the metadata message flow between the sender and receiver of the file(s), while the transfer happens between the nodes themselves. The broker is the ultimate and only source of truth when it comes to keeping tabs on file tables; the rest sync, in part or in toto, the actual physical files themselves. The VFS also features rather rudimentary permission control.
So where's the ML part, you might ask? The framework treats ML models as workers that consume messages and produce outputs, making it trivial to chain preprocessing, inference, postprocessing, fine-tuning, and validation steps into declarative YAML pipelines with human checkpoints at critical decision points. Each pipeline can be client-controlled to run continuously, step by step, or be interrupted at any point in its lifecycle. So each step, or rather each message, is client-verifiable, and clients can modify them and propagate the pipeline with the corrected message content; the pipelines can define "on_correction", "on_rejection" and "on_abort" steps for each step along the way, where the endpoints are all "services" that workers need to register. The workers provide services like "whisper_cpp_infer", "bert_foo_finetune_lora", "clean_whitespaces", "openeye_gpt5_validate_local_model_summary", etc.; the broker makes sure the messages flow to the right workers, the workers make sure the messages' content is correctly processed, and the client (can) make(s) sure the workers did a good job.
Sorry for the wall of text and disclaimer: I'm not a dev, I'm an MD who does a little programming as a hobby (thanks to gen-AI it's easier than ever to build software).