Built this over a weekend because every C++ project I worked on needed LLM features and the options were either "wrap the Python SDK somehow" or "write the curl SSE parsing yourself again."
26 libraries across four categories: Core, Data, Ops, App. Each is a single .hpp file. Drop in the ones you need, define one implementation macro, and ship. No Python runtime, no package manager, and no build-system changes beyond linking libcurl where needed.
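For anyone who hasn't written it before: the "curl SSE parsing" mentioned above mostly reduces to reading `data:` lines and stopping at the `[DONE]` sentinel. A minimal sketch of the idea in Python (the libraries themselves are C++; the function name here is mine, not from the suite):

```python
import json

def parse_sse_lines(lines):
    """Parse OpenAI-style SSE lines: each event is 'data: <json>',
    and the stream ends with 'data: [DONE]'."""
    for line in lines:
        line = line.strip()
        if not line or not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Example stream as it would arrive over chunked transfer:
raw = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    '',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
text = "".join(ev["choices"][0]["delta"]["content"] for ev in parse_sse_lines(raw))
print(text)  # Hello
```

The real parser also has to handle events split across network chunks, which is where most hand-rolled versions go wrong.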
llm-stream has 83 unique clones already with zero promotion. Figured the rest of the suite was ready to share.
Happy to answer questions on design decisions, the single-header pattern, or anything else.
The previous documented record for simultaneous Claude Code agents is 16, set by Anthropic's own engineering team building a C compiler.
Tonight I ran 24.
What made this possible: I built a Tokio-native LLM orchestration pipeline in Rust that routes inference through a local Mistral 7B model running on an RTX 4070. The agents coordinate through governance docs (CLAUDE.md, AGENTS.md) that define module ownership and test ratios. They don't communicate directly; they communicate through the codebase.
The recursive part: the orchestrator was built using the same multi-agent workflow it enables, as was the MCP server that lets Claude call the pipeline natively.
Technical details:
683 tests, 0 failures
1.53:1 test-to-production ratio enforced by pre-commit hooks
9-stage deduplication pipeline with circuit breakers
Sub-millisecond pipeline overhead (inference dominates at 3.6s on Mistral 7B)
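The ratio gate is simple to sketch. A hypothetical Python pre-commit check, illustrating the idea only (the paths, threshold placement, and helper names are my assumptions, not the actual hook):

```python
from pathlib import Path

def line_count(root: str) -> int:
    """Total non-blank lines across .rs files under a directory."""
    return sum(
        1
        for f in Path(root).rglob("*.rs")
        for line in f.read_text().splitlines()
        if line.strip()
    )

def check_test_ratio(test_lines: int, prod_lines: int, minimum: float = 1.53) -> bool:
    """Fail the commit if tests/production falls below the enforced ratio."""
    if prod_lines == 0:
        return True  # nothing to enforce against
    return test_lines / prod_lines >= minimum

# A hook would call line_count("tests") and line_count("src"),
# then sys.exit(1) when check_test_ratio(...) returns False.
```

Enforcing the ratio at commit time, rather than in CI, means an agent can't merge production code without tests and discover the failure later.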
I launched a solo financial intelligence tool 2 weeks ago at $9.99/mo. Caught 6 major market moves in the first week (including a $9.9B acquisition and a +137% move). Got users in 46+ countries organically.
Then I tripled the price to $29.99 Pro / $149.99 Ultra.
Conversion behavior didn't change. If anything, traffic to the pricing page increased.
I think the problem was that $9.99 signaled "toy" to options traders who routinely risk thousands per trade. The higher price actually built trust.
Anyone else found that raising prices early improved conversion? Curious if this is specific to finance/trading products or a broader SaaS pattern.
Product: rot.trading (165K lines of Python, built solo in 9 days, $5/mo infrastructure)
Week 1 caught $14.1B in acquisitions, MASI +34%, ZIM +25%, OLB +137%, CMPS +39%, all with UTC timestamps before the moves.
165K lines of Python, custom NLP engine (no spaCy/NLTK), GradientBoosting over 32 features, IV-aware options strategy selection, and a closed-loop suppression gate that improves win rate between retrains. It's also the first financial intelligence MCP server: it integrates directly into Claude as a native tool.
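For the curious, the suppression-gate concept fits in a few lines: track outcomes per signal pattern and stop surfacing patterns whose observed win rate falls below a floor. A hedged Python illustration (the class, thresholds, and pattern names are my assumptions, not the shipped code):

```python
from collections import defaultdict

class SuppressionGate:
    """Track win/loss outcomes per signal pattern and suppress
    patterns whose win rate drops below a floor. A sketch of the
    idea, not the production implementation."""

    def __init__(self, min_win_rate: float = 0.5, min_samples: int = 10):
        self.min_win_rate = min_win_rate
        self.min_samples = min_samples
        self.outcomes = defaultdict(lambda: [0, 0])  # pattern -> [wins, total]

    def record(self, pattern: str, won: bool) -> None:
        wins, total = self.outcomes[pattern]
        self.outcomes[pattern] = [wins + won, total + 1]

    def allows(self, pattern: str) -> bool:
        wins, total = self.outcomes[pattern]
        if total < self.min_samples:
            return True  # not enough evidence to suppress yet
        return wins / total >= self.min_win_rate

gate = SuppressionGate(min_win_rate=0.5, min_samples=4)
for won in [False, False, False, True]:
    gate.record("low-karma-rumor", won)
print(gate.allows("low-karma-rumor"))  # False: 1/4 win rate is below the floor
```

The "closed-loop" part is that recorded outcomes feed the gate continuously between model retrains, so bad signal classes get muted without waiting for a new model.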
Full technical spec in the repo.
ROT is a financial intelligence platform I built solo in Python. It reads Reddit and 15+ other sources in real time, scores signals with AI, and outputs structured trade ideas.
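The read → score → output shape looks roughly like this. A toy Python illustration only: the keyword weights stand in for the real AI scorer, and every name here is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    source: str      # e.g. "reddit/wallstreetbets"
    ticker: str
    text: str
    score: float = 0.0

# Stand-in for the model: keyword weights instead of learned features.
KEYWORDS = {"acquisition": 0.9, "buyout": 0.8, "partnership": 0.6, "catalyst": 0.5}

def score_signal(sig: Signal) -> Signal:
    sig.score = max(
        (w for k, w in KEYWORDS.items() if k in sig.text.lower()),
        default=0.0,
    )
    return sig

def trade_ideas(signals, threshold=0.7):
    """Structured output: only signals above the threshold become ideas."""
    return [
        {"ticker": s.ticker, "score": s.score, "source": s.source}
        for s in map(score_signal, signals)
        if s.score >= threshold
    ]

ideas = trade_ideas([
    Signal("reddit/wallstreetbets", "MASI", "Rumored acquisition at a premium"),
    Signal("news", "XYZ", "Quarterly report released"),
])
print(ideas)  # [{'ticker': 'MASI', 'score': 0.9, 'source': 'reddit/wallstreetbets'}]
```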
Week one results:
- $9.9B Masimo acquisition — caught from a Sunday night Reddit post. +34%
- $4.2B ZIM buyout — flagged over a holiday weekend. +25%
- PayPal partnership — detected same day. +137%
- Biotech catalyst — picked up 3 days early from WSB. +39%
235% combined moves. All public information. The system just reads faster than you.
"If your data is so good why aren't you trading on it" is a question you could ask Bloomberg, RavenPack, Dataminr, or any financial data company that's ever existed. Bloomberg makes $12B/year selling the terminal, not trading on it.
Selling the pickaxes is a better business than mining.
I’m truly astounded that you were able to predict these moves with such confidence; surely you’ve made lots of money off it in addition to selling a $5 subscription to a few people.
Oh, I thought you were being sarcastic lol!! I'm sorry lol, I'm still early on this, it's only been out for a few weeks, so the money hasn't quite hit yet.
I built a CLI tool in Rust that intercepts OpenAI’s streaming API and transforms every other token in real-time. You can reverse, uppercase, mock, or add noise, all live, as the model streams.
Why?
> Most LLM work assumes prompt → full response.
> But what happens when you break the stream mid-flight?
This tool lets you:
- Intervene at the token level while the model responds
- Study how LLMs degrade semantically with corrupted output
- Do real-time interpretability research (token dependency, causal flow)
- Play with creative transformations in generative workflows
Tech:
- Written in Rust
- Streams directly from OpenAI’s chat API
- Fully async, low-latency, ~10k tokens/sec
- Works with any OpenAI model (e.g. GPT-3.5, GPT-4)
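The core every-other-token transform is tiny; here is the idea as a Python sketch (the tool itself is Rust and async, and these names are mine):

```python
def transform_stream(tokens, fn):
    """Apply fn to every other token as the stream arrives,
    passing the rest through untouched."""
    for i, tok in enumerate(tokens):
        yield fn(tok) if i % 2 == 1 else tok

# Tokens arriving one at a time, as from a streaming API:
stream = iter(["The", " model", " is", " typing"])
out = "".join(transform_stream(stream, str.upper))
print(out)  # The MODEL is TYPING
```

Because the transform is a generator over a generator, nothing is buffered: each token is rewritten and forwarded the moment it lands.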
I’ve built a high-performance, open-source C++ simulation engine designed to ingest streams of tokens (e.g., from large language models) and convert them into live trade signals with sub-10-microsecond latency.
The engine performs semantic mapping of tokens like “crash,” “bullish,” or “panic” into weighted market bias and volatility signals — which then dynamically adjust trading strategy parameters at fractional time scales.
It supports lock-free concurrency, zero-copy streaming, configurable sensitivities, and detailed logging for research and experimentation.
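The semantic-mapping step can be illustrated like this: each token carries a signed sentiment weight that gets blended into a running market bias. A hedged Python sketch of the concept (the engine is C++; the weights, sensitivity, and names here are invented):

```python
# Hypothetical sentiment weights: positive = bullish bias, negative = bearish.
TOKEN_BIAS = {"bullish": 0.8, "rally": 0.5, "crash": -0.9, "panic": -0.7}

def update_bias(bias: float, token: str, sensitivity: float = 0.1) -> float:
    """Exponentially blend each token's weight into a running market bias,
    so recent tokens dominate and old ones decay."""
    weight = TOKEN_BIAS.get(token.lower(), 0.0)
    return (1 - sensitivity) * bias + sensitivity * weight

bias = 0.0
for tok in ["markets", "crash", "panic"]:
    bias = update_bias(bias, tok)
print(round(bias, 4))  # -0.151: two bearish tokens drag the bias negative
```

The `sensitivity` knob is the configurable part: higher values make the bias whip around with each token, lower values smooth it out.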
I made it as a pet project exploring the fusion of NLP and quantitative trading at low latency.
Would love to get feedback from the HN community, especially those with expertise in quant finance, low-latency systems, or AI-driven market strategies.
Here’s the repo and a sample token stream for quick testing.