More

fabmilo · 2025-07-03T19:49:40 1751572180

I like zotero, I started vibe coding some integration for my workflow, the project is a bit clunky to build and iterate the development specially with gemini & claude. But I think that is the direction to take instead of reinvent from scratch something

BrtByte · 2025-07-05T08:38:22 1751704702

I've been thinking about a plugin that auto-suggests related papers as I write

fabmilo · 2025-05-23T01:16:06 1747962966

reference to the library: https://trafilatura.readthedocs.io/en/latest/

for the curious: Trafilatura means "extrusion" in Italian.

| This method creates a porous surface that distinguishes pasta trafilata for its extraordinary way of holding the sauce. search maccheroni trafilati vs maccheroni lisci :)

(btw I think you meant trafilatura not trifatura)

thm · 2025-05-23T03:47:24 1747972044

Been using it since day one but development has stalled quite a bit since 2.0.0.

fabmilo · 2025-05-16T16:38:49 1747413529

more excited about the rust impl than the typescript one.

tptacek · 2025-05-16T17:00:53 1747414853

Besides packaging of their releases, what possible difference could that make in this problem domain?

tough · 2025-05-16T17:56:40 1747418200

I just think it's nice to have open source code to reference so maybe he meant just in that -educational- way, certainly more to learn from the rust one than the TS one for most folks? even if the problem-space doesn't require system-level safety code indeed

fabmilo · 2025-05-12T03:33:12 1747020792

The interesting delta here is that this proves that we can distribute the training and get a functioning model. The scaling factor is way bigger than datacenters

comex · 2025-05-12T04:40:40 1747024840

But does that mean much when the training that produced the original model was not distributed?

refulgentis · 2025-05-12T04:40:03 1747024803

The RL, not the training. No?

itchyjunk · 2025-05-12T11:08:30 1747048110

RL is still training. Just like pretraining is still training. SFT is also training. This is how I look at it. Models weights are being updated in all cases.

refulgentis · 2025-05-12T15:35:48 1747064148

Simplifying it down to "adjusting any weights is training, ipso facto this is meaningful" obscures more light than it sheds (as they noted, RL doesn't get you very far, at all)

fabmilo · 2025-04-03T02:27:42 1743647262

I read the paper and the results don't really convince me that is the case. But the problem still remains of being able to use information from different part of the model without squishing it to a single value with the softmax.

fabmilo · 2025-04-03T01:06:03 1743642363

We have to move past tokenization for the next leap in capabilities. All this work done on tokens, specially in the RL optimization contest, is just local optimization alchemy.

devmor · 2025-04-03T02:17:05 1743646625

LLMs in their entirety are unlikely to move past tokenization - it is the inescapable core from the roots of NLP and Markov Chains.

The future of AI and all of ML in general likely does exist beyond tokenization, but I find it unlikely we will get there without moving past LLMs as a whole.

We need to focus on the strengths of LLMs and abandon the incredibly wasteful amount of effort being put into trying to make them put on convincing facsimiles of things they can't do just because the output is in natural language and easily fools humans at first glance.

naasking · 2025-04-03T12:21:53 1743682913

They won't move past tokenization, but you can take it down to the byte level and make it arbitrarily flexible and adaptive:

https://ai.meta.com/research/publications/byte-latent-transf...

byyoung3 · 2025-04-03T11:17:28 1743679048

This is valid but also hard to back up with any alternatives. At the end of the day it’s just a neural network with backprop. New architectures will likely only be marginally better. So either we add new algorithms on top of it like RL, create a new learning algorithm (for example forward-forward), or we figure out how to use more energy efficient compute (analog etc) to scale several more magnitudes. It’s gonna take some time

devmor · 2025-04-03T14:58:24 1743692304

Yeah, that's fair - it's very easy to tell that LLMs are not the end state, but it's near impossible to know what comes next.

Personally I think LLMs will be relegated to transforming output and input from whatever new logic system is brought forth, rather than pretending they're doing logic by aggregating static corpora like we are now.

MoonGhost · 2025-04-04T02:25:33 1743733533

They already can do calculations by using tools and without pretending. Why not to make them write code for logic too. This will extend their 'range'. End user can be provided only summary to keep it look simple.

fabmilo · 2025-02-10T01:38:54 1739151534

I will believe reasoning architectures when the model knows how to store parametric information in an external memory out of the training loop.

pillefitz · 2025-02-10T05:29:44 1739165384

Why is that?

fabmilo · 2025-01-20T20:26:44 1737404804

was genuinely excited when I read this but the github repo does not have any code.

fabmilo · 2025-01-11T19:44:54 1736624694

I was just about to submit this link and redirected me to this page. I am shocked that it received only four comments. If you are working in the LLMs/Agent space ( you are, right?) and you don't understand the significance of this paper, you are set for failure.

fabmilo · 2025-01-01T07:34:02 1735716842

Happy new year to everyone, hacker news is more than my home page. This community is awesome!