Hacker News | two_in_one's comments


I think OP was asking about the expression "a thousand hands on a Ouija board", not what a Ouija board is.


From the post:

> I implemented imperative code that does what I’m proposing the transformer is doing. It produces outputs very similar to the transformer.

This means there is probably a way to bypass transformers and get the same results. It would be interesting to see whether it's more efficient. For example, given a foundation model, train something smaller on it and run that on a much smaller device.
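One common way to do the "train something smaller from a foundation model" step is knowledge distillation: the small model is trained to match the large model's softened output distribution. A minimal sketch in plain Python (the toy logits here are made up for illustration, and real distillation would of course backpropagate through a student network):

```python
import math

def softmax(logits, temperature=1.0):
    # Soften the distribution with temperature > 1 so the student sees
    # the teacher's relative preferences, not just its argmax.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the softened teacher and student
    # distributions; minimizing it makes the student mimic the teacher.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A student that matches the teacher has a lower loss than one that
# inverts the teacher's preferences.
teacher = [3.0, 1.0, 0.2]
assert distillation_loss(teacher, [3.0, 1.0, 0.2]) < \
       distillation_loss(teacher, [0.2, 1.0, 3.0])
```

Whether the distilled model is actually cheaper end-to-end is exactly the open question in the comment above.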


I explained in another comment that it's not bypassing transformers and not more efficient: https://news.ycombinator.com/item?id=39254966


> incredibly diverse, and results are going to be highly dependent on which dataset was cherry-picked for benchmarking

This naturally leads to a multi-model solution under one umbrella: a sort of MoE, with a selector (router, classifier) and specialized experts. If there is something the existing experts can't handle, train another one.
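A toy sketch of that selector-plus-experts shape (the experts and the routing rule here are made-up placeholders for illustration, not a real MoE layer):

```python
# Each "expert" is just a forecaster specialized for one kind of series;
# the router is a crude classifier that dispatches to one of them.

def trend_expert(series):
    # Handles monotonic data: extend the last step.
    return series[-1] + (series[-1] - series[-2])

def flat_expert(series):
    # Handles noisy-but-stationary data: predict the mean.
    return sum(series) / len(series)

EXPERTS = {"trend": trend_expert, "flat": flat_expert}

def router(series):
    # Strong net movement relative to the overall spread -> "trend",
    # otherwise "flat".
    net = abs(series[-1] - series[0])
    spread = max(series) - min(series)
    return "trend" if spread and net / spread > 0.8 else "flat"

def predict(series):
    name = router(series)
    return name, EXPERTS[name](series)

# predict([1, 2, 3, 4]) routes to the trend expert and extrapolates to 5;
# predict([5, 1, 5, 1, 5]) routes to the flat expert and returns the mean.
```

"Train another expert" then just means adding a new entry to `EXPERTS` and teaching the router to recognize its inputs.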


the point is, it's a fundamentally flawed assumption that you can figure out which statistical model suits an arbitrary strip of time-series data just because you've imbibed a bunch of relatively different ones.


as long as you can evaluate the models' outputs you can select the best one. you probably have some idea of what you are looking for, so it's possible to check how closely the output matches it.

the data is not a spherical horse in a vacuum. usually there is a known source that produces the data, and it's likely that the same model, or maybe a small set of models, works well on all data from that source. which means that, knowing the source, you can select the model that worked well before. even if the data comes from alien ships, they are likely to be from the same civilization.

I'm not saying that it's a 100% solution, just a practical approach.
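The "evaluate the outputs and remember what worked for each source" approach above can be sketched like this (the candidate models and names are toy placeholders):

```python
# Two toy forecasters; real candidates would be ARIMA, ETS, etc.
def last_value_model(history):
    return history[-1]

def mean_model(history):
    return sum(history) / len(history)

MODELS = [last_value_model, mean_model]
best_for_source = {}  # source name -> model that scored best so far

def select_model(source, history, actual_next):
    # Score each candidate by its absolute error on a held-out point,
    # then cache the winner for that source so future data from the
    # same source can reuse it without re-evaluating everything.
    scored = [(abs(m(history) - actual_next), m) for m in MODELS]
    best = min(scored, key=lambda t: t[0])[1]
    best_for_source[source] = best
    return best
```

For a monotonic series like `[1, 2, 3]` with next value `3`, the last-value model wins; for an oscillating series like `[4, 0, 4, 0]` with next value `2.0`, the mean model wins, and each choice is remembered per source.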


it's a practical approach for serving normalized data, but monitoring systems are most valuable when they make abnormal conditions inspectable. proper modeling of a system has this power

so while this seems persuasive, it's fundamentally about normal data, which yields little value in extrapolation


If only it stayed on the user's system. Likely MS makes a 'backup' on its servers. Verizon used to do this: with each update they turned the backup option back on and siphoned off contacts before the user could react.


> 1. For "the singularity" to happen, we probably need something more to happen than just chatGPT to ingest more data or use more processing power.

It's not actually clear what "the singularity" is. Is it something running out of control, or is it still controllable? There is a blurry line. People are afraid because they think it's a sort of uncontrollable explosion.

The second question is about AGI. What is it? Is it something 'alive', or just a generic AI calculator with no 'creature' features, like self-preservation at least?

I think our view of these two things will change soon as we get a close-up picture. Pretty much like the Turing test, which doesn't look great anymore, as even dumb chatbots can pass it.


I personally define AGI as a technology capable of improving itself exponentially.

But I realize my definition is in the minority. :/

Of course if we ever manage to make a 1:1 cybernetic brain that works exactly like a human's brain, and is also a complete black box, we'll have achieved AGI. I'm not sure how useful that will be, but I'll have to admit it is AGI.

So maybe I should say, "interesting AGI" is technology that can improve itself exponentially. :-D


Yeah, if you could input data set X with quality Q(X) and output data set Y of the same size with Q(Y) > Q(X), you'd really be on to something. But I don't think such a system exists yet, or is even close. Inputting the internet and outputting a sea of garbage with a handful of diamonds that people have to spelunk through the garbage for seems to be the best so far. Mad Libs is a pretty equivalent activity, and while fun, certainly not anything one would consider AGI. We need a revolutionary improvement in automated spelunking to get anywhere. Maybe we'll get a good spam filter as a side effect!

But even if you had such a system, there's still the resource cost to run the algorithm (it needs to be bounded, or else you've just made a finite jump), and the gains you make need to not decay (or, again, you've just made a finite jump).
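The "decaying gains = finite jump" point can be made concrete with a toy calculation: if each self-improvement round multiplies capability by a factor that shrinks toward 1, total capability converges to a finite level instead of exploding (all numbers below are arbitrary illustrations):

```python
def total_capability(initial, gain, decay, rounds):
    # Apply one multiplicative improvement per round; each subsequent
    # round's gain is scaled down by `decay`.
    cap = initial
    g = gain
    for _ in range(rounds):
        cap *= 1 + g
        g *= decay
    return cap

# With decay < 1 the product converges: 1000 rounds are barely better
# than 50, i.e. a finite jump, not a singularity.
a = total_capability(1.0, 0.5, 0.5, 50)
b = total_capability(1.0, 0.5, 0.5, 1000)

# With no decay (decay = 1.0) the same gain compounds without bound.
c = total_capability(1.0, 0.5, 1.0, 50)
```

Here `a` and `b` agree to many decimal places while `c` is astronomically larger, which is exactly the difference between decaying and non-decaying gains.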

And all this needs to be compared against investing in humans, which seem pretty clearly to have AGI properties (but with some really bad constant factors. A 20-year training time? Ridiculous!)

To me it seems things are a long way off, and at least a couple of major innovations away. But at least there are some ideas of the problems to tackle, which is a big step up!


I personally define AGI as a technology capable of improving itself exponentially.

Have you thought this through, though? I mean, you'd first have to know in what way to improve, which would become more challenging as you became more perfect; you'd have to want to become perfect (which, let's be honest, might be boring); and then there's the fact that if you kept evolving / improving yourself exponentially, you'd no longer exist, because you'd likely be morphing into other forms all the time.

In a way, maybe the only thing I can think of that is observably doing something like this is the universe itself.


> can improve itself exponentially

This is close to the singularity. Except 'does' instead of 'can'. A big difference ;)

Probably we need several AGI terms, because a sub-human robot capable of doing many non-pre-programmed things is sort of it. Still not smart enough to improve itself.

Actually most humans, the smartest creatures we know of, cannot improve even current AI. Demanding self-improvement would put its IQ in the top 0.01% of all known intelligent creatures, which is probably too much for mere AGI; we may not recognize it when it's already here. And there is another question: with such an IQ, do we really want to keep it a slave forever?


> now supports FlashAttention-2, yielding around 2x speedups

> torch.compile improvements

so far 2.1 didn't work well with MoE GPT, at least in my implementation, due to the dynamism in the data flow. will check how 2.2 does


Not clear: are they scaling down or optimizing?

> Last week, PayPal announced a push into artificial intelligence features.

> Chriss called it the beginning of PayPal's "next chapter."

Looks like they are replacing some positions with AI.


Whatever Meta's motivation is, they help diversify the supply of models, and not being locked in is a good thing. As usual, reality is more complicated, with many moving parts: free models may undercut small startups, but at the same time they stimulate a secondary market of providers and tuners.


at least it wasn't

   from transformers import


As it's still a work in progress, may I make a suggestion? It would be nice if you went beyond what others have already published and added more details, like different position encodings, MoE, decoding methods, and tokenization. As it's educational, ease of use should be a priority, of course.


Thanks, comparing positional encodings, MoEs, kv-caches, etc. are all good topics that I have in mind for either supplementary material and/or a follow-up book. The reason they probably won't land in the current book is length and timeline: it's already going to be a big book as it is (400-500 pages), and I also want to be a bit mindful of the planned release date. However, these are indeed good suggestions.

