> Barlow found Mitchell did have an existing reputation as a cheat and for suing people who alleged he was a cheat, and found that Mitchell had expressed joy when he believed – incorrectly – on an earlier occasion that Apollo Legend may have died.
I think attributing the problem to “stalkers” understates the issues that publicly searchable surveillance data like this creates. Imagine a website where you can type in anyone’s name and see their last known location and their location history. You would have a system that supports universal spying, for mundane and nefarious reasons alike. It’s not just criminal “stalkers” who will take advantage of it.
This sort of arrangement could potentially work if there were limits on the granularity, frequency, and history of the tracking data.
> Web search is available now in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon.
Are people fine-tuning LLMs on their local machines with a single GPU? What are people using to scale their training to multiple nodes / GPUs? I've been playing around with Hugging Face Estimators in sagemaker.huggingface, but I'm not sure if there are better options for this?
It takes a significant amount of time (a few hours) on a single consumer GPU, even a 4090 / 5090, on a personal machine. I think most people use online services like RunPod, Vast.ai, etc. to rent high-powered H100s and similar GPUs for a few dollars per hour, run the fine-tuning / training there, and use their local GPUs only for inference on the fine-tuned models produced on those cloud-rented instances.
It used to be that way! Interestingly, I find that people in large orgs and general enthusiasts don't mind waiting - memory usage and quality are more important factors!
Google Colab is quite easy to use and has the benefit of not making your local computer feel sluggish while you run the training. The linked Unsloth post provides a notebook that can be launched there, and I've had pretty good luck adapting their other notebooks to different foundation models. As a sibling comment noted, if you're using LoRA instead of a full fine-tune, you can create adapters for fairly large models with the VRAM available in Colab, especially on the paid plans.
If you have a Mac, you can also do pretty well training LoRA adapters using something like LLaMA-Factory and letting it run overnight. It's slower than an NVIDIA GPU, but the larger effective memory (if you have, say, 128GB) gives you more flexibility.
A 'LoRA' is a memory-efficient type of fine-tuning that only tunes a small fraction of the LLM's parameters. And 'quantisation' reduces an LLM to, say, 4 bits per parameter. Together these make it feasible to fine-tune a 7B-parameter model at home.
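For a concrete picture, here's a minimal sketch of that combination (4-bit base model + LoRA adapters) using the Hugging Face transformers / peft / bitsandbytes stack. The model name and hyperparameters below are placeholders I picked for illustration, not a recommendation from this thread:

```python
# Minimal QLoRA-style setup: 4-bit quantised base model, small trainable adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-v0.1"  # any ~7B causal LM works here

# Load the base model quantised to 4 bits so it fits in consumer VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach small trainable LoRA adapters; the 4-bit base weights stay frozen.
lora_config = LoraConfig(
    r=16,                                    # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],     # which layers get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the model
```

From there you can hand the model to a normal training loop or Trainer; only the adapter weights accumulate gradients, which is where the memory savings come from.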
Anything bigger than 7B parameters and you'll want to look at renting GPUs on a platform like RunPod. In the current market, used 4090s are selling on eBay for around $2100, while RunPod will rent you a 4090 for $0.34/hr - you do the math (at that rate, buying only breaks even after roughly 6,000 hours of rental).
It's certainly possible to scale model training to span multiple nodes, but generally scaling through bigger GPUs and more GPUs per machine is easier.
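If you do want multiple GPUs on one box, a minimal sketch with plain PyTorch DDP looks roughly like the following, launched with `torchrun --nproc_per_node=<num_gpus> train.py`. The model and training loop are stand-ins, not anything from the thread:

```python
# Single-node multi-GPU data parallelism with PyTorch DDP.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])          # sync grads across GPUs

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):                                  # stand-in training loop
        x = torch.randn(32, 512, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()   # DDP all-reduces gradients here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same script scales to multiple nodes by pointing torchrun at a rendezvous address, but as noted above, one bigger machine is usually the easier path.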
For experimentation and smaller models, a single GPU is the way to go! Tbh, I find most people spend the majority of their time on datasets, training-loss convergence issues, etc.!
But if it's helpful, I was thinking about spinning up a platform for something like that!
I had the same reaction. If you make it to the end, he concludes with:
> The wave function’s pattern can travel across regions of possibility space that are associated with the slits.
Which to me conflicts with his emphatic “no” at the beginning of the article, because this implies you can define some mapping between physical space and probability space. And of course you can, because if you couldn’t, the theory would not be physically predictive.
His point from the beginning is this: the particle described by the wavefunction can't be said to move through both slits at once, because ψ(t, x, y) has a single value for a particular x and y at a particular time. The particle has nonzero probability at (x, y1, t) and at (x, y2, t), of course - but that just means the particle has nonzero probability to pass through either slit.
And as for saying that the wave moves through both slits, that also doesn't make sense, by the very definition of the wave function - it's a wave in probability space, not in space, so it just doesn't move through space.
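For concreteness, the standard textbook formalism makes the "both alternatives contribute" point precise (my notation, not the article's):

```latex
% \psi_1, \psi_2: amplitudes for the "through slit 1 / slit 2" alternatives.
\[
  \psi(x, t) = \frac{1}{\sqrt{2}}\bigl(\psi_1(x, t) + \psi_2(x, t)\bigr)
\]
% The Born rule turns amplitudes into detection probabilities; the cross
% term is the interference pattern, which neither |\psi_1|^2 nor |\psi_2|^2
% contains on its own.
\[
  |\psi|^2 = \tfrac{1}{2}\,|\psi_1|^2 + \tfrac{1}{2}\,|\psi_2|^2
           + \operatorname{Re}\bigl(\psi_1^{*}\,\psi_2\bigr)
\]
```

The cross term is a statement about amplitudes adding, not about a particle occupying two places at once.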
> And as for saying that the wave moves through both slits, that also doesn't make sense, by the very definition of the wave function - it's a wave in probability space, not in space, so it just doesn't move through space.
I don't think that's a valid argument. Imagine a regular water wave, i.e. a function h = h(x, y, t) describing the height of the water at position (x, y) at time t. You could say "this is a wave in height space, not in space, so it just doesn't move through space," and in a certain sense that's true. But obviously there is something that does "move" through "space," to the extent that anything can ever be said to do so.
I’m with you on point 1 (I think this is also obvious from experiment, because you will never measure a particle at both slits).
For point 2, it seems you can define a mapping from physical space to probability space. Saying that the wave doesn’t “move through” space might be technically correct, but it also seems like semantics over the definition of the phrase “move through”?
Of course it is to some extent semantics. But the important point is that the wavefunction is not something like a sound wave, or even something like a classical EM wave. Those are all waves defined over 3-dimensional physical space.
In the original QM model, light is not a wave in the classical electromagnetic theory sense. Light is made up entirely of photons, which are particles just like electrons or billiard balls, and they are described by a wavefunction. That wavefunction gives them various probabilities of being in various states at a certain time, and those probabilities can increase or decrease when more particles come into the mix. The states can represent position, momentum, charge, spin, energy levels, etc.
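A toy numeric check of the interference cross term makes the "probabilities increase or decrease" point concrete. Everything below (geometry, wavenumber, slit separation) is made up for illustration:

```python
# Two spherical-wave amplitudes reaching a screen from two slits.
import numpy as np

x = np.linspace(-5, 5, 1001)   # screen coordinate
k, d = 10.0, 1.0               # wavenumber and slit half-separation (made up)

# Amplitudes for the "through slit 1" and "through slit 2" alternatives,
# with r = distance from each slit to the screen point (screen 10 units away).
psi1 = np.exp(1j * k * np.hypot(x - d, 10.0)) / np.hypot(x - d, 10.0)
psi2 = np.exp(1j * k * np.hypot(x + d, 10.0)) / np.hypot(x + d, 10.0)

both_open = np.abs(psi1 + psi2) ** 2                    # fringes appear
one_at_a_time = np.abs(psi1) ** 2 + np.abs(psi2) ** 2   # no fringes

# The difference is exactly the 2*Re(psi1* psi2) interference term,
# which can be positive or negative at different screen positions.
cross = both_open - one_at_a_time
assert np.allclose(cross, 2 * np.real(np.conj(psi1) * psi2))
```

The probabilities genuinely go up at some screen positions and down at others relative to the "one slit at a time" case, which is the whole puzzle the article is wrestling with.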