What was the motivation? Honestly, I was too lazy to get a job and staying in academia for another 3+ years seemed amazing (probably not recommended, but it worked out OK for me).
What helped get me through it:
1) Doing something I genuinely enjoyed - I approached the Computer Vision professor who gave me some ideas. I super enjoy writing code, and the idea of processing gigabytes of video to produce answers seemed cool. I treated it as a super difficult programming project.
2) Breaking my leg - Just before starting, I broke my leg badly. That meant working from home, with a weekly visit from the professor bringing a stack of papers to read. The time spent understanding the state of the art was super useful.
3) Funding - At some point, DARPA gave enough money for me not to worry about funding, so I never had to work a job or get distracted.
4) Marriage - The final straight of writing a thesis was tough and I was super lucky to have a supportive wife who pushed me to get-shit-done.
As if "A" or "C" defined a person capacity. I know some straight A's that went directly for a repetitive and boring but well paid and stable job. Other stayed in academia and turned top scientists.
Academia has a very particular dynamic that is difficult to find elsewhere, and some people dig it. You can see people finding the same dynamic at Google, for example, where they are allowed and encouraged to fiddle around and keep publishing (e.g. the Attention paper; why would Google allow such a publication?). Such dynamics are explored in Terence Kealey's book "The Economic Laws of Scientific Research".
This varies widely between fields and institutions. Getting a PhD position nowadays in ML or computer vision is much harder. You need to already have publications when you apply and need to have experience specifically in the subfield, give a good talk, an interview, a good motivation letter / research statement, recommendation letters from good internships and multiple PIs you worked with, good grades, etc.
It can be different in other fields and in lower-tier colleges.
I remember attending a tech event at MSR Cambridge, and a speaker made some disparaging comment about older developers not being able to keep up in this modern world of programming.
An older gentleman stood up and politely mentioned they knew a thing or two.
The $200/month plan doesn't have limits either - there's an overage fee you can now pay in Claude Code, so once you've expended your rate-limited token allowance you can keep on working and pay for the extra tokens out of an additional cash reserve you've set up.
> The $200/month plan doesn't have limits either... once you've expended your rate limited token allowance... pay for the extra tokens out of an additional cash reserve you've set up
You're absolutely right! Limited token allowance for $200/month is actually unlimited tokens when paying for extra from a cash reserve which is also unlimited, of course.
I think you may have misunderstood something here.
When paying for Claude Max even at $200/month there are limits - you have a limit to the number of tokens you can use per five hour period, and if you run out of that you may have to wait an hour for the reset.
You COULD instead use an API key and avoid that limit and reset, but that would end up costing you significantly more since the $200/month plan represents such a big discount on API costs.
As-of a few weeks ago there's a third option: pay for the $200/month plan but allow it to charge you extra for tokens when you reach those limits. That gives you the discount but means your work isn't interrupted.
Thank you for the explanation, but I did fully understand that that's what you were saying.
What I don't fully understand is how you can characterize that as "not limited" with a straight face; then again, I can't see your face so maybe you weren't straight faced as you wrote it in the first place.
Hopefully you could see my well meaning smile with the "absolutely right" opening, but apparently that's no longer common so I can understand your confusion as https://absolutelyright.lol/ indicates Opus 4.5 has had it RLHF'd away.
When I said "not limited" I meant "no longer limits your usage with a hard stop when you run out of tokens for a five hour period any more like it did until a few weeks ago".
That's why I said "not limited" as opposed to "unlimited" - a subtle difference in word choice, I'll give you that.
It is possible to understand the mechanism once you drop the anthropomorphisms.
Each token output by an LLM involves one pass through the next-word predictor neural network. Each pass is a fixed amount of computation. Complexity theory hints to us that the problems which are "hard" for an LLM will need more compute than the ones which are "easy". Thus, the only mechanism through which an LLM can compute more and solve its "hard" problems is by outputting more tokens.
You incentivise it to this end by human-grading its outputs ("RLHF") to prefer those where it spends time calculating before "locking in" to the answer. For example, you would prefer the output
Ok let's begin... statement1 => statement2 ... Thus, the answer is 5
over
The answer is 5. This is because....
since in the first one, it has spent more compute before giving the answer. You don't in any way attempt to steer the extra computation in any particular direction. Instead, you simply reinforce preferred answers and hope that somewhere in that extra computation lies some useful computation.
It turned out that such hope was well placed. The DeepSeek R1-Zero training experiment showed that if you apply this very generic form of learning (reinforcement learning) without _any_ examples, the model automatically starts outputting more and more tokens, i.e. "computing more". DeepSeekMath was also a model trained directly with RL. Notably, the only signal given was whether the answer was right or not; nothing else was graded, and even the position of the answer in the sequence, which we cared about before, was ignored. This meant the LLM could be graded automatically, without a human in the loop (you're just checking answer == expected_answer). This is also why math problems were used.
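A minimal sketch of that verifiable reward, assuming a hypothetical convention where the model ends its output with "Answer: <value>" (real pipelines extract e.g. a boxed answer, but the idea is the same):

```python
import re

def verifiable_reward(model_output: str, expected_answer: str) -> float:
    """Binary reward: 1.0 if the final answer matches, else 0.0.
    No human in the loop -- matching the expected answer is the whole signal."""
    # Hypothetical convention: the model ends with "Answer: <value>".
    match = re.search(r"Answer:\s*(\S+)", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == expected_answer else 0.0

# The position of the answer in the sequence doesn't matter; only correctness does.
print(verifiable_reward("Let's compute... Answer: 5", "5"))  # 1.0
print(verifiable_reward("Answer: 7 because...", "5"))        # 0.0
```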
All this is to say, we get the most insight into what benefit "reasoning" adds by examining what happened when it was applied without training the model on any examples. DeepSeek R1 proper actually uses a few examples and then does the RL process on top of that, so we won't look at it here.
Reading the DeepseekMath paper[1], we see that the authors posit the following:
> As shown in Figure 7, RL enhances Maj@K’s performance but not Pass@K. These findings indicate that RL enhances the model’s overall performance by rendering the output distribution more robust, in other words, it seems that the improvement is attributed to boosting the correct response from TopK rather than the enhancement of fundamental capabilities.
For context, Maj@K means that you mark the output of the LLM as correct only if the majority of the many outputs you sample are correct. Pass@K means that you mark it as correct even if just one of them is correct.
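A toy sketch of the two metrics as described above, with made-up sampled answers for a single problem:

```python
from collections import Counter

def pass_at_k(samples: list[str], expected: str) -> bool:
    """Correct if ANY of the k sampled answers matches."""
    return any(s == expected for s in samples)

def maj_at_k(samples: list[str], expected: str) -> bool:
    """Correct if the majority-vote answer matches."""
    most_common, _count = Counter(samples).most_common(1)[0]
    return most_common == expected

samples = ["5", "5", "7", "5", "3"]            # 5 sampled answers, one problem
print(pass_at_k(samples, "5"))                 # True
print(maj_at_k(samples, "5"))                  # True: "5" wins the vote
print(maj_at_k(["7", "7", "5"], "5"))          # False, though pass@3 is True
```

RL sharpening the output distribution moves cases like the last one (a correct answer exists among the samples but loses the vote) into the first category, which lifts Maj@K without lifting Pass@K.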
So to answer your question: if you add an RL-based reasoning process to the model, it will improve simply because it does more computation, a portion of which (so far measured only empirically) helps it get more accurate answers on math problems. Outside that, it's purely subjective. If you ask me, I prefer Claude Sonnet for all coding/SWE tasks over any reasoning LLM.
If those functions represent (by some odd coincidence) half of your code-base each (half pure, half impure), then you still benefit from the pure, functional half.
You can always start small and build up something that becomes progressively more stable: no code base is too imperative to benefit from some pure code. Every block of pure code, even if surrounded by impure code, is one block you don't have to worry so much about. Is it fundamentalist programming? Of course not. But slowly building out from there pays you back each time you expand the scope of the pure code.
You won't have solved all of the world's ills, but you've made part of them better. Any pure function in an impure code-base is, by definition, more robust, easier to compose, cacheable, parallelisable, etc. These are real benefits, no matter how small you start.
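A toy illustration of that claim (all names here are hypothetical): the pure core below is trivially cacheable and testable, even though the shell around it does I/O.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # safe to cache: output depends only on the inputs
def net_price(gross: float, discount_pct: float) -> float:
    """Pure core: no I/O, no globals, no mutation."""
    return round(gross * (1 - discount_pct / 100), 2)

def checkout(order_id: str) -> None:
    """Impure shell: the I/O lives out here, around the pure core."""
    gross = float(input(f"Gross for {order_id}: "))  # impure: reads stdin
    print(f"Net: {net_price(gross, 10.0)}")          # impure: writes stdout

print(net_price(100.0, 10.0))  # 90.0
```

The surrounding impurity doesn't infect `net_price`: you can unit-test it, memoise it, or call it from parallel workers without touching `checkout` at all.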
So, the more fundamentalist position of "once one part of your code is impure, it all is" doesn't say anything useful. And I'm always surprised when Erik pulls that argument out, because he's usually extremely pragmatic.
Interestingly they used to attach a sponge to the end. You might think that was because it doesn’t break the glass, but really it was to ensure the nearby houses don’t get woken up for free!
"I’m not ready to argue against Brooks’ Law that adding people to a late project makes it later. But today, when developers are working on a clean codebase, I see lots of work happening in parallel with tool support to facilitate coordination. When things are going smoothly, it’s because the architecture is largely set, the design patterns provide guidance for most issues that arise, and the code itself (with README files alongside) allow developers to answer their own questions."