
I am glad to see more stuff with graph based AI here.

I have a running bet with a friend about whether the future is going to be OBM (One Big Model) or LoLM (Lots of Little Models). I'm strongly in the LoLM/graph camp and have been working in that direction as well: https://github.com/Miserlou/Helix



Very interesting! "The general hypothesis of the project is: Consciousness, or something resembling consciousness, emerges not from the capability of a single task model like GPT or Stable Diffusion, but from the oscillations between the inputs and outputs of different instances of different models performing different tasks."

Your metaphors of self-oscillation and multiple oscillations are very much in line with the consciousness model built on top of Adaptive Resonance Theory. I believe this is the most computationally robust model for consciousness. You might want to read/skim this https://www.sciencedirect.com/science/article/pii/S089360801...

That can be a forbidding read because it packs in so much (65 years of work!).

You can also read Journey of the Mind (https://www.goodreads.com/book/show/58085266-journey-of-the-... I'm the co-author) which, among other things, covers Grossberg's work and his model of consciousness built on the idea of resonance. Here resonance goes beyond the metaphorical idea and has a specific meaning.

edit: https://saigaddam.medium.com/understanding-consciousness-is-... (here's a super brief description of Adaptive Resonance Theory)


> "The general hypothesis of the project is: Consciousness, or something resembling consciousness, emerges not from the capability of a single task model like GPT or Stable Diffusion, but from the oscillations between the inputs and outputs of different instances of different models performing different tasks."

This is the underlying theory of classical liberal education, stemming back thousands of years.

We learn different ways of thinking, different lenses through which we view the world, and we can apply those lenses as needed to solve different problems.

Indeed, when conversing with someone who has over-indexed on just one type of learning, we take notice; we say that person's worldview is limited. (For example, an engineer trying to sell a new product who doesn't understand that people aren't willing to toss away all their old skills for an incremental improvement in workflow; they should take a few courses in psychology! :) )

Take any famous work of architecture. An engineer can appreciate it for the elegance of its construction; an artist can appreciate its beauty, the shapes, the shading, colors, textures. A historian can appreciate how it incorporates elements of the region's history and cultures.

Someone trained in all three (as anyone who graduated from a good university should have been, to at least some extent) will be able to switch between modalities of thought at will, and also integrate those modalities together, and thus, hopefully, derive more pleasure from their experiences of the world.

Of course AIs will need to have multiple models!


This becomes a semantic debate if we do not define the boundaries between models. If models are "integrated" to an extreme, then they are really just the same model. ...the tradeoff of having one model vs. two models is often driven by the resources used to hold and serve a model, but there are also mathematical constraints: for example, attention compute grows quadratically with the length of the input, which means that separate models which can communicate with one another can be more efficient.
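A toy back-of-the-envelope, assuming the quadratic cost comes from self-attention over the input (token counts are arbitrary):

    # Toy cost comparison, assuming the quadratic term is attention compute
    # over the input sequence.
    n = 1024
    one_big_model    = (2 * n) ** 2     # one model attends over the full 2n tokens
    two_small_models = 2 * (n ** 2)     # two models attend over n tokens each
    print(one_big_model / two_small_models)  # -> 2.0: half the attention cost,
                                             #    before inter-model communication overhead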

...but the trick is defining that inter-model communication and establishing a "controller" model with appropriate training data.
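A minimal sketch of that controller pattern (the routing rule, specialist names, and outputs are all made up purely for illustration, not from any real system):

    # Hypothetical controller routing requests to specialist models.
    def controller(text: str) -> str:
        """Tiny stand-in for a trained 'controller' model that picks a specialist."""
        return "code" if ("def " in text or "import " in text) else "prose"

    SPECIALISTS = {
        "code":  lambda text: f"[code-model output for {text!r}]",
        "prose": lambda text: f"[prose-model output for {text!r}]",
    }

    def handle(text: str) -> str:
        return SPECIALISTS[controller(text)](text)

    print(handle("import numpy as np"))   # routed to the "code" specialist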


Thanks for sharing these links. I actually was part of a computational neuroscience program in university, but I never liked the "wet" side of things and all of the "AI" at the time was focused on kNN and SVM, so I'm well behind on what's cutting-edge in CNS. This seems like a good starting point to catch up again.

EDIT: I'm so dumb, the people behind ART were professors in my department! I knew it seemed familiar. The whole thing left me jaded.


Stephen Grossberg or Gail Carpenter, or one of their students? You weren't at BU CNS were you?


I was, doing a joint BA/MA program during undergrad. This was a decade ago though.


so we overlapped :) wrapped up my PhD there a decade and some years ago.


Small world! I wonder how departments like that have adapted to the post-"Deep" world.


Was always more of a neuro department with application work being secondary...


That last link is great! Very compelling. I’ve bought the book…


Thanks!


Agreed. From what I can see, pushing the size of models higher and higher gets you better results, but it also scales up the problems at the same rate. Smaller models are more controllable and more predictable, and just like anything else, specialization tends to produce better results than having one jack-of-all-trades tool that handles everything.

There are fundamental weaknesses with LLMs that aren't present in other approaches. There are strengths to LLMs too, but that's the whole point. I am much more optimistic about the potential to get multiple models focusing on different problems to coordinate with each other than I am about the possibility of getting a single LLM to just be good at everything.

There are a lot of really, unbelievably hard problems showing up just with GPT-3, and as the model gets bigger, those problems are going to get worse, not better, because in some ways they are a consequence of the model being so large. But like... there are domains where you don't care about those downsides, or where those downsides only matter for one specific part of whatever application you're building. So if you can get away with just not having GPT-3 involved in that part of your process and doing something else instead... Don't pound in a nail with a screwdriver.


I've done some work with graph neural nets as well as text NNs.

I think we've repeatedly seen that replacing a multi-component pipeline with a single end-to-end model works amazingly well when there is sufficient data to train the whole system.

But there are often practical reasons why a non-end-to-end system is easier to build as an intermediate step.


And, in theory, there is nothing stopping you from setting up a graph-based system consisting of several small models and training it end-to-end.
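For instance, a minimal PyTorch sketch (toy modules and data, not taken from any project mentioned here): two small models composed into one graph can be trained jointly with a single optimizer, since gradients flow through both.

    # Two small models wired into one graph and trained end-to-end (toy example).
    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())   # small model #1
    head    = nn.Linear(32, 1)                               # small model #2

    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(64, 16), torch.randn(64, 1)           # toy data
    for _ in range(200):
        optimizer.zero_grad()
        loss = loss_fn(head(encoder(x)), y)                  # gradients flow through both models
        loss.backward()
        optimizer.step()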


Yeah I feel like for development, OBM is great and super flexible.

But when you actually want to deploy, a lot of tiny, more efficient models would probably be the best bet.

I read somewhere that a company ended up fine-tuning FLAN-T5 instead of going with GPT-3, which I can imagine saved them lots of $$.


FLAN-T5 is a very capable model for anything that is non-generative.
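For example, a classification-style prompt with the Hugging Face transformers library (the model size and prompt wording are just placeholder choices):

    # Sketch: FLAN-T5 on a non-generative, classification-style task.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    prompt = "Classify the sentiment as positive or negative: I loved this movie."
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=5)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # e.g. "positive"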


Seeing how langchain is gaining popularity and developing rapidly, I would agree. Chaining lots of specific models and tools seems to be the way forward.
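For example, a sketch of chaining two prompts (using the langchain API as of the early 0.0.x releases; the prompts are invented and the interface may have changed since):

    from langchain.llms import OpenAI
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain, SimpleSequentialChain

    llm = OpenAI(temperature=0)
    summarize = LLMChain(llm=llm, prompt=PromptTemplate(
        input_variables=["text"], template="Summarize this:\n{text}"))
    extract = LLMChain(llm=llm, prompt=PromptTemplate(
        input_variables=["text"], template="List the key entities in:\n{text}"))

    # Output of the first chain feeds the second.
    pipeline = SimpleSequentialChain(chains=[summarize, extract])
    print(pipeline.run("...some long document..."))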


Hadn't heard of langchain, here's a link: https://github.com/hwchase17/langchain


Woah, seeing that github handle takes me back to 2015 when I was working in python and you had a tool to quickly bootstrap aws lambda services (zappa?).


helix looks amazing! that's exactly the kind of thing i'm looking to burn through openai credits with.


Cheers! I've got loads and loads of ideas for it, but can't seem to find the time to hack on them at the moment while building a SaaS at the same time. When we get a proper ChatGPT API endpoint it'll really start to get interesting.


It looks amazing. (Choice of Elixir is inspired. Great match to problem space.)


Hard endorse



