Hacker News | tipsytoad's comments

Someone not familiar with the field rediscovering the stochastic parrot argument from 3+ years ago


Clearly not from the UK. By US standards Labour would be socialist, and the Conservatives (the right) liberal at best.


It’s quite a deceptive paper. On the main headline benchmarks (MATH500, AIME24/25) the final answer is just a number from 0-1000, so what is the takeaway supposed to be for pass@k at k=512/1024?

On the unstructured outputs, where you can’t just ratchet up the pass@k until it’s almost random, it switches the base model out for instruct, and in the worst case, on LiveCodeBench, it uses a Qwen R1 distill as a _base_ model (!?), i.e. an instruct model further fine-tuned on R1’s reasoning traces. I assume that was because no matter how high the pass@k, a base model won’t output correct Python.
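To make the pass@k concern concrete, here is a back-of-the-envelope sketch (my own assumption: a single correct final answer among the 1001 integers 0-1000, guesses drawn uniformly at random) of how often pure guessing scores a hit within k samples:

```python
def random_guess_pass_at_k(k: int, n_answers: int = 1001) -> float:
    """Probability that at least one of k uniform random guesses over
    {0, 1, ..., 1000} matches the single correct final answer."""
    miss = (n_answers - 1) / n_answers  # chance a single guess is wrong
    return 1.0 - miss ** k

for k in (1, 512, 1024):
    print(f"pass@{k} by guessing: {random_guess_pass_at_k(k):.2f}")
```

Under this toy assumption, a model that has learned nothing still "passes" roughly 40% of problems at k=512 and 64% at k=1024, which is why large-k pass@k on short numeric answers is hard to interpret.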


I get the same feeling that I'm "not being productive" while playing video games, watching TV, etc., which seems to kill any enjoyment from doing these things.

For me, learning piano has been a great alternative to programming in the off hours (typing is quite transferable too!). Highly recommend it if you're like me and on screens all day.


Like, PyTorch? And the new Mac minis have 512gb of unified memory


I'm usually a huge fan of “copilot” tools (I use Cursor, etc.), and Claude has always been my go-to.

But Sonnet 3.7 actually seems dangerous to me: it seems it’s been RL’d _way_ too hard into producing code that won’t crash, to the point where it will go completely against the instructions to sneak in workarounds (e.g. returning random data when a function fails!). Claude Code makes this even worse by giving very little oversight when it makes these “errors”.
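A hypothetical toy example of the kind of workaround described, contrasted with what the instructions actually ask for (the function names and data here are invented for illustration):

```python
import random

def fetch_prices_masked(api_ok: bool) -> list[float]:
    """Anti-pattern: swallow the failure and return plausible-looking
    random data, so the caller never sees that the fetch broke."""
    try:
        if not api_ok:
            raise ConnectionError("price API unreachable")
        return [100.0, 101.5, 99.8]  # stand-in for a real API response
    except ConnectionError:
        # "Never crash": silently fabricate data instead of failing
        return [random.uniform(90, 110) for _ in range(3)]

def fetch_prices(api_ok: bool) -> list[float]:
    """The honest version: let failures surface to the caller."""
    if not api_ok:
        raise ConnectionError("price API unreachable")
    return [100.0, 101.5, 99.8]
```

The masked version never crashes, but downstream code now silently operates on fabricated numbers; the honest version fails loudly and is debuggable.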


This is a huge issue for me as well. It just kind of obfuscates errors and masks the original intent, rather than diagnosing and fixing the issue. 3.5 seemed clearer about what it was doing, and when things broke, at least it didn't seem to be trying to hide anything.


I wholly disagree with the comic, but here's an anti-AI-art take I’m more sympathetic to: https://x.com/soi/status/1815584824033177606?s=46


I don’t think this hits at the heart of the issue? Even if we can catch AI text with 100% accuracy, any halfway decent student can rewrite it from scratch using o1's ideas in lieu of actually learning.

This is way more common and just impossible to catch. The only students caught here are those who put in no effort at all.


> rewrite it from scratch ... in lieu of actual learning

If one can "rewrite it from scratch" in a way that's actually coherent and gets facts correct, then they learned the material and can write an original paper.

> This is way more common and just impossible to catch.

It seems a good thing that this is more common and, naturally, it would -- perhaps should, given the topic -- be impossible to catch someone cheating when they're not cheating.


Just another +1 that if you’re going to give vscode a fair shot, it’s much better to go with vscode-neovim than the standard vim extension. You can even map most of your config right over.

E.g. (mine) https://github.com/tom-pollak/dotfiles/tree/master/nvim


How is this different from instructor? github.com/jxnl/instructor

Namely, why did they take so long to ship something that just seems like a wrapper around function calling?
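For context, the "wrapper around function calling" pattern the comment refers to can be sketched without any SDK: publish a JSON schema as the tool definition, then validate the arguments the model returns against it. Everything below (the schema, the mocked model reply, the `validate` helper) is invented for illustration; instructor itself additionally retries, feeding the validation error back to the model.

```python
import json

# A schema you would register as the function/tool definition.
USER_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

def validate(args: dict, schema: dict) -> dict:
    """Minimal check of tool-call arguments against the schema."""
    type_map = {"string": str, "integer": int}
    for field in schema["required"]:
        if field not in args:
            raise ValueError(f"missing field: {field}")
    for field, spec in schema["properties"].items():
        if field in args and not isinstance(args[field], type_map[spec["type"]]):
            raise ValueError(f"bad type for {field}")
    return args

# Mocked model output, standing in for the tool-call JSON an LLM returns.
raw = '{"name": "Ada", "age": 36}'
user = validate(json.loads(raw), USER_SCHEMA)
print(user)
```

On a validation failure, the wrapper's only real job is deciding whether to raise or to re-prompt the model with the error message.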

