btown's comments | Hacker News

One approach, if the problem is "dang it, someone (or I) had to use kubectl during the outage; how do we get gitops (or poor man's gitops) back in place to match reality?", is to loop, either agentically or artisanally, trying simple gitops configurations (or diffs against your current ones) until a dry-run diff against your live configuration produces no changes.

For instance, with Helm, I've had success using Helmfile's diffs (which in turn use https://github.com/databus23/helm-diff) to do this.
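
In sketch form, the loop is tiny (a minimal Python sketch; assumes helmfile and the helm-diff plugin are installed and that helmfile.yaml points at your live cluster -- the --detailed-exitcode behavior comes from helm-diff, so check your plugin version's docs):

    import subprocess

    def gitops_matches_cluster(helmfile_dir: str) -> bool:
        """Treat 'helmfile diff exits 0 with no output' as convergence."""
        result = subprocess.run(
            ["helmfile", "diff", "--detailed-exitcode"],
            cwd=helmfile_dir,
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return True       # git config matches live state
        print(result.stdout)  # surface the drift so a human/agent can iterate
        return False

    # Edit values/templates, re-run, stop when the diff is empty, then commit.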

There's more of a spectrum between these than you think, in a way that can be agile for small teams without dedicated investment in gitops. Even with the messes that can occur, I'd take it over the Heroku CLI any day.



With great love to your comment, this has the same vibes as the infamous 2007 Dropbox comment: https://news.ycombinator.com/item?id=9224

I'd also argue that the context for an agent message is not the commit/release for the codebase on which it was run, but often a commit/release that is yet to be set up. So there's a bit of apples-to-oranges in terms of release tagging for the log/trace.

It's a really interesting problem to solve, because you could in theory try to retroactively find which LLM session, potentially from days prior, matches a commit that just hit a central repository. You could automatically connect the LLM session to the PR that incorporated the resulting code.
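
A naive matcher could get surprisingly far on textual overlap alone. A toy sketch (all names hypothetical; session_output is whatever your agent log captured, commit_patch is the git patch text):

    def session_commit_similarity(session_output: str, commit_patch: str) -> float:
        """Fraction of a commit's added lines that appear in the session log."""
        session_lines = {l.strip() for l in session_output.splitlines() if l.strip()}
        added = {
            l[1:].strip() for l in commit_patch.splitlines()
            if l.startswith("+") and not l.startswith("+++")
        }
        if not added:
            return 0.0
        return len(session_lines & added) / len(added)

    # best = max(sessions, key=lambda s: session_commit_similarity(s, patch))

Real systems would want hunk-level matching plus timestamps to prune candidates, but even this would catch most "vibe-coded days ago, merged today" cases.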

Though, might this discourage developers from openly iterating with their LLM agent, if there's a panopticon around their whole back-and-forth with the agent?

Someone can, and should, create a plug-and-play system here with the right permission model that empowers everyone, including the Programmer-Archaeologists (to borrow shamelessly from Vernor Vinge) who are brought in to "un-vibe the vibe code" and benefit from understanding the context and evolution.

But I don't think that "just dump it in clickhouse" is a viable solution for most folks out there, even if they have the infrastructure and experience with OTel stacks.


I get where you're coming from, having wrestled with Codex/CC to get it to actually emit everything needed to even do proper evals.

From a "correct solution" standpoint having one source of truth for evals, agent memory, prompt history, etc is the right path. We already have the infra to do it well, we just need to smooth out the path. The thing that bugs me is people inventing half solutions that seem rooted in ignorance or the desire to "capture" users, and seeing those solutions get traction/mindshare.


Something that’s under-emphasized and vital to understand about Skills is that, by the spec, there’s no RAG on the content of Skill code or markdown - the names and descriptions in every skill’s front-matter are included verbatim in your prompt, and that’s all that’s used to choose a skill.

So if you have subtle logic in a Skill that’s not mentioned in a description, or you use the skill body to describe use-cases not obvious from the front-matter, it may never be discovered or used.

Additionally, skill descriptions are all essentially prompt injections, whether relevant/vector-adjacent to your current task or not; if they nudge towards a certain tone, that may apply to your general experience with the LLM. And, of course, they add to your input tokens on every agentic turn. (This feature was proudly brought to you by Big Token.) So be thoughtful about what you load in what context.

See e.g. https://github.com/openai/codex/blob/a6974087e5c04fc711af68f...
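
Conceptually, the whole discovery mechanism is just this (a hand-rolled sketch, not the actual Codex/Claude code; it assumes the published convention of a SKILL.md file with YAML front-matter carrying name and description):

    from pathlib import Path
    import yaml  # pip install pyyaml

    def skills_preamble(skills_dir: str) -> str:
        """Inline every skill's name + description into the prompt, verbatim."""
        entries = []
        for skill_md in Path(skills_dir).glob("*/SKILL.md"):
            front_matter = skill_md.read_text().split("---")[1]
            meta = yaml.safe_load(front_matter)
            entries.append(f"- {meta['name']}: {meta['description']}")
        return "Available skills:\n" + "\n".join(entries)

    # Note what's absent: no embeddings, no retrieval, no reading of the body.
    # If the description doesn't sell the skill, the model never opens it.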


Some agentic systems do apply RAG to skills; there's nothing about skills that requires blind insertion into prompts.

This is really an agentic harness issue, not an LLM issue per se.

In 2026, I think we'll see agentic harnesses much more tightly integrated with their respective LLMs. You're already starting to see this, e.g. with Google's "Interactions" API and how different LLMs expect CoT to be maintained.

There's a lot of alpha in co-optimizing your agentic harness with how the LLM is RL-trained on tool use and reasoning traces.
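
That co-design question is also where skill retrieval lands. The RAG variant is similarly small; a minimal sketch, where embed() stands in for whatever embedding model the harness already ships:

    import numpy as np

    def top_k_skills(task: str, skills: dict[str, str], embed, k: int = 3):
        """Rank {name: description} skills by cosine similarity to the task."""
        t = embed(task)
        def score(desc: str) -> float:
            v = embed(desc)
            return float(np.dot(t, v) / (np.linalg.norm(t) * np.linalg.norm(v)))
        return sorted(skills.items(), key=lambda kv: score(kv[1]), reverse=True)[:k]

Whether that retrieval step is worth the extra moving parts is exactly the kind of thing that gets settled during RL training, not after.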


Honestly, the index seems as much a liability as a boon. Keeping the context clean and focused is one of the most important things for getting the best out of LLMs. For now, I prefer just adding my md files to the context whenever I deem them relevant.

Skills are much simpler than MCPs, which are hopelessly overengineered, but even Skills seem unnecessarily overengineered. You could fix the skill index taking up space in the context by just making it a tool available to the agent (but not an MCP!).
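
Something like this, shape-wise (hypothetical; not any particular framework's API):

    # In-memory registry for illustration; in practice you'd scan skill files.
    SKILLS = {"pdf-forms": "Fill and flatten PDF forms"}
    SKILL_BODIES = {"pdf-forms": "...full skill body..."}

    def list_skills() -> list[dict]:
        """Costs zero context until the agent actually calls it."""
        return [{"name": n, "description": d} for n, d in SKILLS.items()]

    def load_skill(name: str) -> str:
        """Pull the full skill body into context only when it's needed."""
        return SKILL_BODIES[name]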


but that's the same for MCP and tools, no?

Yes. In fact, you can serve each Skill as a tool exposed via MCP if you want. I did exactly that to make Skills work with Gemini CLI (or any other tool that supports MCP) while creating open-skills [1].

1. Open-Skills: https://github.com/BandarLabs/open-skills
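
The wrapping is pretty mechanical. With the official MCP Python SDK it looks roughly like this (a sketch assuming FastMCP from the mcp package; open-skills itself may structure it differently):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("skills")

    @mcp.tool()
    def pdf_forms() -> str:
        """Fill and flatten PDF forms."""  # docstring doubles as the tool description
        return open("skills/pdf-forms/SKILL.md").read()  # full body, on demand

    if __name__ == "__main__":
        mcp.run()  # any MCP client (Gemini CLI, etc.) can now discover it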


Interesting. Skills on MCP makes a lot of sense in some contexts.

A consultant started recommending the Azure DevOps MCP, and my context window would start out around 25% full. It's really easy to accidentally explode your token usage and destroy your context window. Before, I'd make az cli calls as needed and tell the agent to use the same, which used significantly less context and was more targeted.

If the drives continue to have power, but the OS has crashed, will the drives persist the data once a certain amount of time has passed? Are datacenters set up to take advantage of this?

> will the drives persist the data once a certain amount of time has passed

Yes, otherwise those drives wouldn't work at all and would have a 100% warranty return rate. The reason they get away with it is that the misbehavior is only a problem in a specific edge case (forgetting data written shortly before a power loss).


Yes, the drives are unaware of the OS state.
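
And if you can't count on that grace period, the standard belt-and-suspenders is to request the flush explicitly. A Python sketch:

    import os

    def durable_write(path: str, data: bytes) -> None:
        """Push a write through OS and drive caches before returning."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)  # kernel issues a cache-flush to the device
        finally:
            os.close(fd)

    # Whether the drive honors the flush immediately is firmware-dependent,
    # hence enterprise drives with power-loss-protection capacitors.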

With gravitational lensing, this is actually viable! Just send a signal toward a gravity sink, then travel at sublight speed to position yourself where the signal will eventually be redirected along a longer path, and you can intercept your own signal! You just have to be really, really lucky.
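
(For a sense of scale, the weak-field deflection angle is 4GM/(c^2 b). Back-of-envelope, with the Sun as the lens:

    G = 6.674e-11   # m^3 kg^-1 s^-2
    M = 1.989e30    # kg, one solar mass
    c = 2.998e8     # m/s
    b = 6.963e8     # m, impact parameter ~ one solar radius

    theta = 4 * G * M / (c**2 * b)       # ~8.5e-6 rad
    print(theta * 206265, "arcseconds")  # ~1.75 arcsec

About 1.75 arcseconds of bend, hence the "really, really lucky" part.)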

Would you mind sharing what hardware/card(s) you're using? And is https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... one of the ones you've tested?

Yes, I run it locally on 3 different AMD Strix Halo machines (a Framework Desktop and 2 GMKtec machines; 128GB x 2, 96GB x 1) and a Mac Studio M2 Ultra with 128GB of unified memory.

I’ve used several runtimes, including vLLM. Works great! Speedy. Best results with Ubuntu after trying a few different distributions and Vulkan and ROCm drivers.


Support for this landed in llama.cpp recently if anyone is interested in running it locally.
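
For anyone who'd rather drive it from Python than the llama.cpp CLI, a minimal sketch with the llama-cpp-python bindings (the GGUF filename is illustrative; use whichever quantization fits your RAM):

    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="./nemotron-3-nano-30b-a3b.Q4_K_M.gguf",  # illustrative name
        n_ctx=8192,
        n_gpu_layers=-1,  # offload all layers to Vulkan/ROCm/Metal if possible
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello from local inference"}]
    )
    print(out["choices"][0]["message"]["content"])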

I can say, from a business perspective, that I've needed to use similar methodologies to evaluate potentially fraudulent transactions and relationships between parties (though far from needing air-gap requirements, and relying heavily on web search).

What are the competing hypotheses, other than fraud, when a person makes a massive luxury purchase, but with red-flag-adjacent inconsistencies in other information provided? If we need to identify whether there's a weird or competitive ownership relationship behind a potential opportunity, how do we determine if an initial hypothesis about relationships is correct?

If ArkhamMirror has an online mode with web search as a tool call, I'd be curious to try it out to automate some of these ACH-adjacent workflows.
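
For reference, the core of ACH (Analysis of Competing Hypotheses, the Heuer methodology this kind of tool draws on) is just a consistency matrix: score each piece of evidence against each hypothesis, then prefer the hypothesis with the least inconsistent evidence, not the most consistent. A toy sketch with made-up evidence labels:

    CONSISTENT, NEUTRAL, INCONSISTENT = 1, 0, -1

    evidence = {
        "purchase far above stated income":  {"fraud": CONSISTENT,   "windfall": NEUTRAL},
        "address mismatch across documents": {"fraud": CONSISTENT,   "windfall": INCONSISTENT},
        "long clean account history":        {"fraud": INCONSISTENT, "windfall": CONSISTENT},
    }

    def inconsistencies(hypothesis: str) -> int:
        return sum(1 for scores in evidence.values()
                   if scores[hypothesis] == INCONSISTENT)

    for h in ("fraud", "windfall"):
        print(h, "->", inconsistencies(h), "inconsistent items")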


It doesn't have an online mode yet, although there's a lot of stuff in the works. However, since Docker and LM Studio are already included in the setup, you can turn on MCP Toolkit in Docker and add the Docker MCP to LM Studio. With the toolkit on, you get access to over 300 different MCPs for your local LLM, including web search via DuckDuckGo or Brave Search, automation tools like n8n, web manipulation with Playwright, and all sorts of potentially useful stuff. (Not a sponsor :P) Then your "local" LLM can suddenly do all sorts of agentic stuff.

This isn't out-of-the-box capability, since I'm only building offline, local, privacy-focused features at the moment, but turning it on isn't a huge undertaking. If you're up for messing with some prompts in the files, you can even specify to the LLM which tool you want it to use for which task, if it isn't automatically using them when the need arises.

> call Brent so he can fix it again

Not sure if it's a Phoenix Project reference, but if it is, it's certainly in keeping with GitHub being as fragile as the company in the book!


It is xD On the outside it feels like a product held together with duct tape, wood glue and prayers.

Hey, don't insult wood glue like that.

Indeed, wood glue is amazing. Such slander is totally uncalled for.

I don't know, maybe it's a compliment. Wood glue can form bonds stronger than the material it's bonding. So, the wood glue in this case is better than the service it's holding together :)

or prayers

> an American company with American employees

While technically true, these articles give context about the level of decision-making control and data access from ByteDance, as of the time of their publication.

https://restofworld.org/2024/tiktok-chinese-us-ban/ (2024)

https://www.buzzfeednews.com/article/emilybakerwhite/tiktok-... (2022)

