What was the motivation? Honestly, I was too lazy to get a job and staying in academia for another 3+ years seemed amazing (probably not recommended, but it worked out OK for me).
What helped get me through it:
1) Doing something I genuinely enjoyed - I approached the Computer Vision professor who gave me some ideas. I super enjoy writing code, and the idea of processing gigabytes of video to produce answers seemed cool. I treated it as a super difficult programming project.
2) Breaking my leg - Just before starting, I broke my leg badly. That meant working from home, with a weekly visit from the professor bringing a stack of papers to read. The time spent understanding the state of the art was super useful.
3) Funding - At some point, DARPA gave enough money for me not to worry about funding, so I never had to work a job or get distracted.
4) Marriage - The final straight of writing a thesis was tough and I was super lucky to have a supportive wife who pushed me to get-shit-done.
As if "A" or "C" defined a person capacity. I know some straight A's that went directly for a repetitive and boring but well paid and stable job. Other stayed in academia and turned top scientists.
Academia has a very particular dynamic that is difficult to find elsewhere, and some people dig it. You can see people finding the same dynamic at Google, for example, where they are allowed and encouraged to fiddle around and keep publishing (e.g. the Attention paper; why would Google allow such a publication?). Such dynamics are explored in Terence Kealey's book "The Economic Laws of Scientific Research".
This varies widely between fields and institutions. Getting a PhD position nowadays in ML or computer vision is much harder. You need to already have publications when you apply and need to have experience specifically in the subfield, give a good talk, an interview, a good motivation letter / research statement, recommendation letters from good internships and multiple PIs you worked with, good grades, etc.
It can be different in other fields and in lower-tier colleges.
I remember attending a tech event at MSR Cambridge, and a speaker made some disparaging comment about older developers not being able to keep up in this modern world of programming.
An older gentleman stood up and politely mentioned they knew a thing or two.
The $200/month plan doesn't have limits either - there's an overage fee you can now pay in Claude Code, so once you've expended your rate-limited token allowance you can keep on working and pay for the extra tokens out of an additional cash reserve you've set up.
> The $200/month plan doesn't have limits either... once you've expended your rate limited token allowance... pay for the extra tokens out of an additional cash reserve you've set up
You're absolutely right! Limited token allowance for $200/month is actually unlimited tokens when paying for extra from a cash reserve which is also unlimited, of course.
I think you may have misunderstood something here.
When paying for Claude Max even at $200/month there are limits - you have a limit to the number of tokens you can use per five hour period, and if you run out of that you may have to wait an hour for the reset.
You COULD instead use an API key and avoid that limit and reset, but that would end up costing you significantly more since the $200/month plan represents such a big discount on API costs.
As-of a few weeks ago there's a third option: pay for the $200/month plan but allow it to charge you extra for tokens when you reach those limits. That gives you the discount but means your work isn't interrupted.
Thank you for the explanation, but I did fully understand that that's what you were saying.
What I don't fully understand is how you can characterize that as "not limited" with a straight face; then again, I can't see your face so maybe you weren't straight faced as you wrote it in the first place.
Hopefully you could see my well meaning smile with the "absolutely right" opening, but apparently that's no longer common so I can understand your confusion as https://absolutelyright.lol/ indicates Opus 4.5 has had it RLHF'd away.
When I said "not limited" I meant "no longer limits your usage with a hard stop when you run out of tokens for a five hour period any more like it did until a few weeks ago".
That's why I said "not limited" as opposed to "unlimited" - a subtle difference in word choice, I'll give you that.
It is possible to understand the mechanism once you drop the anthropomorphisms.
Each token output by an LLM involves one pass through the next-word predictor neural network. Each pass is a fixed amount of computation. Complexity theory hints to us that the problems which are "hard" for an LLM will need more compute than the ones which are "easy". Thus, the only mechanism through which an LLM can compute more and solve its "hard" problems is by outputting more tokens.
You incentivise it to this end by human-grading its outputs ("RLHF") to prefer those where it spends time calculating before "locking in" to the answer. For example, you would prefer the output
Ok let's begin... statement1 => statement2 ... Thus, the answer is 5
over
The answer is 5. This is because....
since in the first one, it has spent more compute before giving the answer. You don't in any way attempt to steer the extra computation in any particular direction. Instead, you simply reinforce preferred answers and hope that somewhere in that extra computation lies some useful computation.
It turned out that such hope was well placed. The DeepSeek R1-Zero training experiment showed that if you apply this very generic form of learning (reinforcement learning) without _any_ examples, the model automatically starts outputting more and more tokens, i.e. "computing more". DeepSeekMath was also a model trained directly with RL. Notably, the only signal given was whether the answer was right or not; nothing else was graded, and even the position of the answer in the sequence, which we cared about before, was ignored. This meant the LLM could be graded automatically, without a human in the loop (you're just checking answer == expected_answer). This is also why math problems were used.
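A minimal sketch of that verifiable reward, assuming a hypothetical convention where the model ends its output with "Answer: <value>" (real pipelines extract e.g. a boxed answer, but the idea is the same):

```python
import re

def verifiable_reward(model_output: str, expected_answer: str) -> float:
    """Binary reward: 1.0 if the final answer matches, else 0.0.
    No human in the loop -- matching the expected answer is the whole signal."""
    # Hypothetical convention: the model ends with "Answer: <value>".
    match = re.search(r"Answer:\s*(\S+)", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == expected_answer else 0.0

# The position of the answer in the sequence doesn't matter; only correctness does.
print(verifiable_reward("Let's compute... Answer: 5", "5"))  # 1.0
print(verifiable_reward("Answer: 7 because...", "5"))        # 0.0
```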
All this is to say, we get the most insight into what benefit "reasoning" adds by examining what happened when it was applied without training the model on any examples. DeepSeek R1 proper actually uses a few examples and then does the RL process on top of that, so we won't look at it here.
Reading the DeepseekMath paper[1], we see that the authors posit the following:
> As shown in Figure 7, RL enhances Maj@K’s performance but not Pass@K. These findings indicate that RL enhances the model’s overall performance by rendering the output distribution more robust, in other words, it seems that the improvement is attributed to boosting the correct response from TopK rather than the enhancement of fundamental capabilities.
For context, Maj@K means that you mark the output of the LLM as correct only if the majority of the many outputs you sample are correct. Pass@K means that you mark it as correct even if just one of them is correct.
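A toy sketch of the two metrics as described above, with made-up sampled answers for a single problem:

```python
from collections import Counter

def pass_at_k(samples: list[str], expected: str) -> bool:
    """Correct if ANY of the k sampled answers matches."""
    return any(s == expected for s in samples)

def maj_at_k(samples: list[str], expected: str) -> bool:
    """Correct if the majority-vote answer matches."""
    most_common, _count = Counter(samples).most_common(1)[0]
    return most_common == expected

samples = ["5", "5", "7", "5", "3"]            # 5 sampled answers, one problem
print(pass_at_k(samples, "5"))                 # True
print(maj_at_k(samples, "5"))                  # True: "5" wins the vote
print(maj_at_k(["7", "7", "5"], "5"))          # False, though pass@3 is True
```

RL sharpening the output distribution moves cases like the last one (a correct answer exists among the samples but loses the vote) into the first category, which lifts Maj@K without lifting Pass@K.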
So to answer your question: if you add an RL-based reasoning process to the model, it will improve simply because it does more computation, a portion of which (so far measured only empirically) helps it get more accurate answers on math problems. Outside that, it's purely subjective. If you ask me, I prefer Claude Sonnet for all coding/SWE tasks over any reasoning LLM.
If those functions represent (by some odd coincidence) half of your code-base each (half pure, half impure), then you still benefit from the pure, functional half.
You can always start small and build up something that becomes progressively more stable: no code base is too imperative to benefit from some pure code. Every block of pure code, even if surrounded by impure code, is one block you don't have to worry so much about. Is it fundamentalist programming? Of course not. But slowly building out from there pays you back each time you expand the scope of the pure code.
You won't have solved all of the world's ills, but you've made part of them better. Any pure function in an impure code-base is, by definition, more robust, easier to compose, cacheable, parallelisable, etc. These are real benefits, no matter how small you start.
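A toy illustration of that claim (all names here are hypothetical): the pure core below is trivially cacheable and testable, even though the shell around it does I/O.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # safe to cache: output depends only on the inputs
def net_price(gross: float, discount_pct: float) -> float:
    """Pure core: no I/O, no globals, no mutation."""
    return round(gross * (1 - discount_pct / 100), 2)

def checkout(order_id: str) -> None:
    """Impure shell: the I/O lives out here, around the pure core."""
    gross = float(input(f"Gross for {order_id}: "))  # impure: reads stdin
    print(f"Net: {net_price(gross, 10.0)}")          # impure: writes stdout

print(net_price(100.0, 10.0))  # 90.0
```

The surrounding impurity doesn't infect `net_price`: you can unit-test it, memoise it, or call it from parallel workers without touching `checkout` at all.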
So, the more fundamentalist position of "once one part of your code is impure, it all is" doesn't say anything useful. And I'm always surprised when Erik pulls that argument out, because he's usually extremely pragmatic.
Interestingly they used to attach a sponge to the end. You might think that was because it doesn’t break the glass, but really it was to ensure the nearby houses don’t get woken up for free!
"I’m not ready to argue against Brooks’ Law that adding people to a late project makes it later. But today, when developers are working on a clean codebase, I see lots of work happening in parallel with tool support to facilitate coordination. When things are going smoothly, it’s because the architecture is largely set, the design patterns provide guidance for most issues that arise, and the code itself (with README files alongside) allow developers to answer their own questions."