Key highlights from the blog post (https://www.kimi.com/blog/kimi-k2-5.html):
- Agent Swarm: K2.5 can spawn up to 100 sub-agents autonomously, executing 1,500+ parallel tool calls with a 4.5x speedup over a single agent.
- Video-to-code: Generate frontend code directly from a screen recording, with autonomous visual debugging (it "looks" at its own output and iterates).
- Open-source model weights on HuggingFace, plus a CLI tool (Kimi Code) for VSCode/Cursor/Zed.
Curious to see benchmarks against Claude Opus or GPT-5.2 on real-world agentic tasks.
The R1C1 normalization is smart. Treating 5k copied formulas as one "finding" is the only way to avoid alert fatigue.
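To make that concrete, here's a minimal sketch (my own illustration with a naive regex-based reference parser, not the tool's actual code) of why R1C1 normalization collapses a column of copied formulas into one finding: rewrite every A1 reference as an offset from its host cell, and all the drag-filled copies hash to the same key.

```python
import re
from collections import defaultdict

# Naive A1 reference matcher; real code would have to skip function names
# (e.g. LOG10) and string literals, but this is enough to show the idea.
A1_REF = re.compile(r"(\$?)([A-Z]{1,3})(\$?)(\d+)")

def col_to_num(col: str) -> int:
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def to_r1c1(formula: str, host_row: int, host_col: int) -> str:
    """Rewrite A1 refs relative to the host cell, e.g. B2 inside C2 -> RC[-1]."""
    def repl(m):
        col_abs, col, row_abs, row = m.group(1), m.group(2), m.group(3), int(m.group(4))
        c = col_to_num(col)
        r = f"R{row}" if row_abs else ("R" if row == host_row else f"R[{row - host_row}]")
        k = f"C{c}" if col_abs else ("C" if c == host_col else f"C[{c - host_col}]")
        return r + k
    return A1_REF.sub(repl, formula)

# Group cells by normalized formula: one key == one finding.
findings = defaultdict(list)
for addr, row, col, formula in [("C2", 2, 3, "=B2*1.2"), ("C500", 500, 3, "=B500*1.2")]:
    findings[to_r1c1(formula, row, col)].append(addr)

print(dict(findings))  # {'=RC[-1]*1.2': ['C2', 'C500']}
```

Five thousand copies of that formula all land under the same normalized key, so the audit can report one finding with a count instead of five thousand rows.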
Re: magic numbers, have you considered checking column headers as a signal? E.g., if a header contains "Rate" or "Months", a hardcoded number is likely a valid constant. If it's just "Total", a bare * 1.2 is probably a hidden risk. How do you handle cases where the context is ambiguous?
Great question! I am using column headers as context signals. If a column is named 'Rate', 'Price', 'Percentage', or 'Count', I'm more lenient with constants in formulas referencing it.
For ambiguous cases like 'Total', I currently flag them and let the user decide, which isn't ideal. I've been considering a confidence-score system:
- High-confidence whitelist: 24, 60, 7, 365 (time conversions)
- Context-dependent: numbers near column headers with semantic meaning
- Always flag: arbitrary numbers like 1.2, 847, etc., unless they're in a 'Constants' or 'Assumptions' section
The hardest edge case is something like Revenue * 0.15 where 0.15 might be a legitimate tax rate OR a hardcoded assumption that should be in a named cell. Right now I flag it as medium priority.
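Roughly, the scoring I have in mind looks like this (illustrative sketch only, not the shipped logic; the in_assumptions_block flag and the 0-1 "rate-shaped" check are simplifications):

```python
# Illustrative sketch of the tiered scoring, not the shipped logic.
WHITELIST = {7, 24, 60, 365}                      # time conversions: always OK
LENIENT_HEADERS = ("rate", "price", "percentage", "count")

def classify_literal(value, header=None, in_assumptions_block=False):
    """Return 'ok' | 'medium' | 'high' for a hardcoded number in a formula."""
    if value in WHITELIST:
        return "ok"                               # high-confidence whitelist
    if in_assumptions_block:
        return "ok"                               # lives in a Constants/Assumptions section
    if header and any(k in header.lower() for k in LENIENT_HEADERS):
        return "ok"                               # header gives semantic cover
    if 0 < value < 1:
        return "medium"                           # rate-shaped: flag, let the user decide
    return "high"                                 # arbitrary literal: always flag

print(classify_literal(60))                       # 'ok'     (time conversion)
print(classify_literal(0.15, header="Total"))     # 'medium' (tax rate or hidden assumption?)
print(classify_literal(1.2, header="Total"))      # 'high'   (hidden multiplier)
```

The 0-1 check is the only thing keeping Revenue * 0.15 at medium rather than high, which is exactly the bucket I'd like a better signal for.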
How would you approach this?
Hi HN,
I built this tool to solve a specific frustration with current motion control models (like Kling): they are extremely picky about input images. If the subject isn't perfectly framed or standing in a standard T/A-pose, the generation usually fails. This makes it particularly hard to create videos from pet photos or casual selfies.
I built a pipeline combining Kling Motion Control and NanoBanana Pro to bridge this gap.
The core logic acts as a middleware that:
1. Automatically out-paints and expands cropped images (e.g., selfies) to fit the required aspect ratio.
2. "Rigs" difficult subjects (especially cats/dogs) into a structure that the motion model can interpret, effectively mapping human dance logic onto non-human agents.
3. Wraps this in a template system so users don't need complex prompting.
The goal was to make the input robust enough that you can throw almost any "imperfect" photo at it and get a coherent dance video.
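If it helps to picture the middleware, here's a rough sketch of the orchestration (placeholder stubs only; these are not the real Kling or NanoBanana Pro APIs, just the shape of the pipeline):

```python
from dataclasses import dataclass

@dataclass
class Template:
    name: str            # preset dance routine
    aspect_ratio: float  # framing the motion model expects
    prompt: str          # canned prompt so users don't have to write one

# --- Placeholder stubs: the real steps call external models -----------------
def outpaint_to_aspect(image_path: str, aspect_ratio: float) -> str:
    """Step 1: expand cropped inputs (selfies) to the required framing."""
    return image_path  # no-op stub

def rig_subject(image_path: str) -> str:
    """Step 2: map a difficult subject (cat/dog) onto a pose the model can read."""
    return image_path  # no-op stub

def run_motion_control(image_path: str, prompt: str) -> str:
    """Step 3: hand the prepared image plus template prompt to the motion model."""
    return f"{image_path}.mp4"  # stub output

def generate_dance_video(photo: str, template: Template) -> str:
    framed = outpaint_to_aspect(photo, template.aspect_ratio)
    rigged = rig_subject(framed)
    return run_motion_control(rigged, template.prompt)

print(generate_dance_video("cat_selfie.jpg", Template("salsa", 9 / 16, "preset salsa prompt")))
```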
It's live at https://aibabydance.com – would love any feedback on how it handles your edge-case photos!
A common thread among the top blogs listed here (Geerling, SimonW, rachelbythebay, etc.) is a distinct lack of "growth hacking" or AI-generated filler.
In an era where search results are flooded with SEO-optimized slop, these blogs have become trusted nodes primarily because they verify their own reality. Whether it's Jeff physically plugging in a PCIe card or Rachel debugging a weird server issue, the value proposition is "I actually did this thing, and here is what happened."
It seems the best SEO strategy for 2025 is simply proving you are a human doing actual work.
The "verify their own reality" point resonates. I stumbled on a post recently where someone documented getting an OCR model from 90% to 98% accuracy; it turns out most of the gain came from discovering their training labels were 27% wrong, not from model tweaks. The interesting bit was their finding that running AI verification in parallel resulted in a 2% correction rate, while sequential processing caught 65%. That kind of hard-won, numbers-backed insight is what makes technical blogs worth reading vs the flood of tutorial content.
This workflow (extracting -> transcribing -> curating) is increasingly vital.
We are seeing a massive amount of domain knowledge being locked inside "un-indexable" video containers or walled gardens like Discord and TikTok. Ten years from now, a search query won't find that brilliant explanation on a niche topic unless someone like you pulled it out and put it on the open web.
It's effectively acting as a bridge between the ephemeral algorithmic feed and the permanent archival web.
This really highlights the misalignment between information density and monetization mechanisms.
Text is random-access, searchable, and respects the reader's time (I can skim a blog post in 2 minutes to find the one command I need). Video is linear and demands a fixed time commitment.
It is somewhat tragic that the format which is often technically superior for documentation and reference (text) relies on the format that is optimized for engagement/retention (video) to subsidize it. Kudos to you for maintaining the blog-first workflow despite the incentives pulling the other way.
> Video is linear and demands a fixed time commitment.
Because people like video. I'd rather watch a video where the narrator shows me exactly what's happening and where, over text that I have to read. Many on HN like the opposite but don't seem to have the charity to understand the point of view of people like me.
I think this can be effective if videos are structured properly. The other day I was trying to learn Wan Animate and found a ComfyUI workflow that came with an hour-long "how to use" guide. The workflow required some diffusion models that were not listed in the video description, so I had to scrub through the video to find them. I used the auto-generated transcript to help me, but even that's kinda shoddy sometimes.
The official ComfyUI tutorials are great: they give you the workflow, tell you what to download, include screenshots of each step of the process, and take maybe 15 minutes to follow.
So I think it depends. I don't know why HN is hostile toward people who prefer video; it seems like a strange hill to die on, but as with most things in life, there's nuance.
Do I know what I'm looking for? Do I know what I know and what I don't know about this subject? If yes, I prefer text so I can jump to whichever part I need. If not, I prefer a video walkthrough where I might learn about pitfalls, what to do and not to do. I'm open to sitting through a video if I'm learning something new.
It’s because a video can be passively watched when doing chores while reading text is an active … activity. The former requires less energy and commitment than the latter.
It also means that if YouTube displays an ad while I’m washing the dishes, I’m not stopping to press the skip button (unless it’s one of those silly ads that last an hour), which probably inflates the stats quite a bit.
I'm similar and I think it comes down to the exploration versus exploitation dilemma [1].
When I'm in exploration mode, time is plentiful. That makes linear media like video an excellent primary source of information.
When I'm in exploitation mode, time is short, making video a bad fit for the time I have; I prefer text-based primary sources that allow non-linear consumption.
I absolutely use it for this because lots of videos drone on and pad their length for the ad revenue, so I ask the AI to summarize and then I can click through to exactly what I want to see. I find myself actually watching more videos now, not fewer, because I know I don't have to listen to them for 10 minutes and waste my time when they don't get to the point.
Oh I agree, I use it for the same reason. It just seems counterintuitive to YouTube's bottom line (i.e., discouraging folks like us from watching the video to give them ad money).