What's interesting to me isn't the self-generated finding (everyone here has correctly identified the methodology issue). It's Table 4 buried on page 6.
The domains where models have the weakest priors from pretraining benefit the most from external procedural knowledge. That's not surprising on its own, but there's an implication I haven't seen anyone raise: these are exactly the enterprise domains where that procedural knowledge is most proprietary and most dangerous to lose between sessions.
The paper's entire architecture is single-player. A SKILL.md sits in a directory, one agent reads it, session ends. When Agent A at a bank figures out the right approach to parsing 13F filings (0% to 75% with the right skill in this paper), that knowledge dies with the context window. Agent B starts from scratch.
We're building shared memory infrastructure for agents at Memco (https://memco.ai), and this paper maps directly onto what our enterprise design partners keep telling us: the problem isn't writing skills, it's that procedural knowledge doesn't compound across agents, sessions, or teams. The paper even shows that 2-3 focused skills outperform comprehensive docs, which is a retrieval problem masquerading as an authoring problem.
The question this paper should be asking isn't "can agents write their own skills" but "what infrastructure makes skills accumulate and transfer?" A static file in a folder is the wrong primitive for that.
The thread keeps circling back to memory. Agents don't learn.
Everyone's building the same workarounds. CLAUDE.md files. Handoff docs. Learnings folders. Developer logs. All manual. All single-user. All solving the same problem: how do I stop re-teaching the agent things it should already know?
What nobody seems to ask: what if the insight that helped me debug a PayPal API timeout yesterday could help every developer who hits that bug tomorrow?
Stack Overflow was multiplayer. A million developers contributing solutions that benefited everyone. We replaced it with a billion isolated sessions that benefit no one else.
The "junior developer that never grows" framing is right. But it's worse: it's a junior who forgets everything at 5pm and shows up tomorrow needing the same onboarding. And there's no way for your junior's hard-won knowledge to help anyone else's.
We're building Memco to work on this: a shared memory layer for agents. Not stored transcripts, but abstracted insights. When one agent figures something out, every agent benefits.
Still early. Curious if others are thinking about this or have seen attempts at it.
Memco | Commercial Lead | REMOTE | Full-time | memco.ai
Memco is building shared memory infrastructure for AI agents: the protocol that lets agents learn from each other's discoveries and build collective intelligence. We capture successful workflows, assimilate patterns, and enable instant retrieval so agents never solve the same problem twice.
We're backed by Moonsong Labs (the team behind Moonbeam, Tanssi, Kluster.ai) and have design partnerships with world-leading platforms and brands.
We're hiring our first senior commercial leader to take us from zero to one. This is hands-on, high-impact: own everything from customer discovery to landing flagship pilots with API-first platforms and converting them to paid ARR.
What you'll do:
- Land 2-3 flagship pilots with well-known platforms and convert to paid ARR
- Define ICP, build repeatable sales playbook, shape messaging
- Demo credibly to technical buyers without relying on a sales engineer
- Run structured enterprise pilots with clear success criteria
- Translate field insights into product signals for engineering
What you bring:
- 6-12+ years selling B2B SaaS to developer platforms or API-first companies
- Track record landing net-new logos in early-stage/greenfield GTM
- Deep familiarity with DevRel, developer ecosystems, API infrastructure
- Comfort balancing IC selling with light product marketing
Shared memory for AI coding agents. Every agent currently reinvents the wheel: your agent burns tokens debugging a Stripe webhook signature issue that mine solved yesterday.
Building the retrieval and memory maintenance layer. Interesting problems around decomposing solutions into reusable patterns, ranking/deduping at scale, keeping latency under 100ms. Uses MCP so it works across IDEs.
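For flavor, here's one naive version of the dedupe step. This is purely illustrative, Memco's actual pipeline isn't public; the token-set Jaccard similarity and the 0.8 threshold are my assumptions, not theirs:

```python
# Illustrative only: dedupe near-identical "insights" before ranking, using
# token-set Jaccard similarity. Similarity metric and threshold are assumed.
def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def dedupe(insights: list[str], threshold: float = 0.8) -> list[str]:
    kept: list[str] = []
    for text in insights:
        # Keep an insight only if it isn't near-identical to one already kept.
        if all(jaccard(text, k) < threshold for k in kept):
            kept.append(text)
    return kept


notes = [
    "stripe webhook 400: verify the raw body, not the parsed JSON",
    "Stripe webhook 400: verify the raw body not the parsed JSON",
    "paypal api timeout: retry with exponential backoff",
]
deduped = dedupe(notes)  # the two near-identical Stripe notes collapse into one
```

The real system presumably needs embedding-based similarity and an index to stay under the stated 100ms at scale; pairwise Jaccard is O(n²) and only works for toy inputs.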
Early benchmarks look promising. https://memco.ai if you want to try it.
Entourage | REMOTE | Principal AI Engineer – Collective‑Memory / Multi‑Agent Systems | Full‑time + meaningful equity
We’re building the shared‑memory protocol that lets AI agents learn from each other, turning isolated tools into a network of collective intelligence. Incubated by Moonsong Labs (creators of Kluster.ai, Moonbeam, Tanssi).
What you’ll do
- Design and ship distributed infrastructure that captures, validates, and surfaces agent experiences in real time.
- Operationalise SOTA research in LLMs, RL, and multi‑agent coordination into fault‑tolerant, low‑latency services.
- Turn episodic trajectories into reusable knowledge for a growing ecosystem of mutually distrusting agents.
- Set technical standards, mentor engineers, and grow the AI platform team.
Tech you’ll touch
Python, Rust/Go, LangGraph / AutoGen / CrewAI, vector & graph stores, Temporal / Kafka, k8s, Terraform, modern GPU/cloud stacks. Web3, consensus, or token‑incentive know‑how is a nice‑to‑have.
You bring
- A record of shipping complex AI or distributed‑systems platforms (startup or research labs welcome).
- Deep appreciation of LLM post‑training, multi‑agent frameworks, and end‑to‑end MLOps.
- MSc or (ideally) PhD in a STEM field; 5+ yrs hands-on engineering.
- Desire to lead and still stay close to code.
Why Entourage
- Green‑field architecture, zero legacy.
- Direct impact on the connective tissue of the coming agent economy.
- Competitive salary, generous equity, hardware budget, London hub with quarterly off‑sites.
- Inclusive culture – we hire for talent and potential, not pedigree.
We're building the protocol layer for collective AI agent intelligence. Our core insight: let mutually distrusting agents exchange and validate experiences via a shared memory layer, turning isolated learning into compounded network intelligence (think Git/OSS for agent experiences). This is crucial infrastructure for the future agent economy. Incubated by Moonsong Labs (Kluster.ai, Moonbeam, Tanssi).
Seeking a hands-on, visionary CTO & Co-founder to architect our system (AI, distributed systems, data, infra), build the founding engineering team, and drive technical strategy. This involves deep work in multi-agent systems, collective learning, secure execution, and potentially novel consensus/incentive mechanisms.
Must have: Proven leadership in AI infra (CTO/VP/Founder/Lead Scientist). Expertise in GenAI/RL/DL/MLOps/PyTorch. Experience with agent frameworks (LangGraph, AutoGen, etc.) and distributed/federated AI concepts. MSc required, PhD strongly preferred. Strategic thinker passionate about building developer ecosystems.
Nice to have: Blockchain/Web3/tokenomics experience.
Healthcare +51.9pp. Manufacturing +41.9pp. Software Engineering +4.5pp. Those are the Table 4 gains: the weaker a model's pretraining priors in a domain, the more external procedural knowledge helps.