
Let me add a few:

- organic data exhaustion - we need to step up synthetic data generation and, just as importantly, its validation (sketch below)

- imbalanced datasets - catalog what the training data covers, assess where it's thin, and fill in the gaps

- backtracking - make LLMs better at combinatorial or search problems (search-loop sketch below)

- deduction - we need to augment the training set to surface implicit knowledge, in other words to study the text before learning it (sketch below)

- defragmentation - information arrives in small chunks, sits in separate silos, and context windows are short; we need retrieval to bring related pieces together for analysis (sketch below)

tl;dr We need quantity, diversity and depth in our training sets
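
Rough sketches of a few of these. For the synthetic data point, a minimal generate-then-validate loop; generate_fn and validate_fn are hypothetical stand-ins for an LLM sampler and an independent quality check:

    # Generate synthetic training examples and keep only the ones that
    # pass an independent validation step. generate_fn and validate_fn
    # are hypothetical: e.g. an LLM sampling Q/A pairs, and a checker
    # (unit tests, a verifier model, exact match against a solver, ...).
    def build_synthetic_set(seed_prompts, generate_fn, validate_fn, per_seed=5):
        accepted, rejected = [], []
        for prompt in seed_prompts:
            for _ in range(per_seed):
                example = generate_fn(prompt)      # draw one candidate
                if validate_fn(example):           # independent check
                    accepted.append(example)
                else:
                    rejected.append(example)       # keep for error analysis
        return accepted, rejected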
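
For backtracking, a sketch of an external controller that gives an LLM depth-first search with pruning (tree-of-thoughts style); expand_fn, score_fn and is_goal are hypothetical LLM-backed callbacks:

    # Depth-first search over partial solutions with backtracking.
    # expand_fn(state) -> candidate next states (e.g. LLM proposals),
    # score_fn(state) -> float, is_goal(state) -> bool. All hypothetical.
    def backtracking_search(state, expand_fn, score_fn, is_goal,
                            threshold=0.5, depth=0, max_depth=8):
        if is_goal(state):
            return state
        if depth >= max_depth:
            return None
        # Try the most promising candidates first; prune weak ones.
        for nxt in sorted(expand_fn(state), key=score_fn, reverse=True):
            if score_fn(nxt) < threshold:
                continue                           # prune: not worth descending
            result = backtracking_search(nxt, expand_fn, score_fn, is_goal,
                                         threshold, depth + 1, max_depth)
            if result is not None:
                return result
        return None  # dead end: caller backtracks to its next candidate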
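
For the study-before-learning idea, a sketch where a teacher model spells out a passage's implicit facts and the expansions join the training set; ask_model is a hypothetical LLM call:

    # Expand each passage with the implicit knowledge it entails,
    # so the student model sees the deductions stated explicitly.
    # ask_model(prompt) -> str is a hypothetical LLM call.
    STUDY_PROMPT = (
        "List facts that follow from this passage but are not "
        "stated explicitly, one per line:\n\n{passage}"
    )

    def study_corpus(passages, ask_model):
        augmented = []
        for passage in passages:
            implications = ask_model(STUDY_PROMPT.format(passage=passage))
            augmented.append(passage)              # keep the original text
            augmented.append(implications)         # plus spelled-out deductions
        return augmented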
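
And a bare-bones take on defragmentation: embed the scattered chunks, then pull the most related ones into a single context for analysis. embed is a hypothetical embedding-model call:

    import numpy as np

    # Pull scattered chunks into one context via embedding retrieval.
    # embed(text) -> 1-D vector is a hypothetical embedding-model call.
    def build_context(question, chunks, embed, top_k=5):
        q = embed(question)
        scored = []
        for chunk in chunks:
            v = embed(chunk)
            sim = float(np.dot(q, v) /
                        (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
            scored.append((sim, chunk))
        scored.sort(reverse=True)                  # most similar first
        # Defragment: related pieces from different silos, one context.
        return "\n---\n".join(chunk for _, chunk in scored[:top_k])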



And I’ll add some more:

- LLMs aren’t very good at large scale narrative construction. They get so distracted by low level details that they miss the high level structure in long text. It feels like the same problem as Stable Diffusion giving people too many fingers.

- LLMs have two kinds of memory: current activations (the context) and trained weights - roughly working memory and long term memory. How do we add short term memory? When I read a function, I summarize it in my head and remember the summary for as long as it’s relevant (maybe 20 minutes or so). How do we build a mechanism that can do this? (Sketch after this list.)

- How do we do gradient descent on the model architecture itself during training? (A DARTS-style sketch follows the list.)

- Humans have lots more tricks to use when reading large, complex text - re-reading relevant sections, taking notes, thinking quietly, and so on. Can we introduce these thinking modalities into our systems? I bet they’d behave smarter if they could do this stuff. (Sketch below.)

- How do we combine multiple LLMs into a smarter overall system? E.g., does it make sense to build committees of “experts” (LLMs taking on different expert roles) to aid decision making? Can we get more intelligence out of ChatGPT by using it differently inside a larger system? (Sketch below.)
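
A sketch of the short-term memory idea: summaries live in a side store with a time-to-live, get injected into prompts while fresh, and expire instead of being baked into weights. summarize is a hypothetical LLM call; the 20-minute TTL is the arbitrary number from above:

    import time

    # Short-term memory as a TTL cache of summaries: remembered while
    # relevant, then forgotten. summarize(text) -> str is a hypothetical
    # LLM call; ttl=1200s is the "maybe 20 minutes" from above.
    class ShortTermMemory:
        def __init__(self, summarize, ttl=1200):
            self.summarize = summarize
            self.ttl = ttl
            self.entries = []                      # (expires_at, summary)

        def read(self, text):
            summary = self.summarize(text)         # "summarize it in my head"
            self.entries.append((time.time() + self.ttl, summary))

        def recall(self):                          # prepend this to the prompt
            now = time.time()
            self.entries = [e for e in self.entries if e[0] > now]  # forget stale
            return "\n".join(summary for _, summary in self.entries)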
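
On gradient descent over the architecture itself, one existing answer is differentiable architecture search (DARTS), where a softmax over candidate ops makes the layer choice trainable. A toy PyTorch sketch with an illustrative op list:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # DARTS-style mixed layer: instead of picking one op, take a softmax-
    # weighted sum of candidates and learn the weights (alpha) by gradient
    # descent alongside the regular parameters. The op list is illustrative.
    class MixedOp(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Linear(dim, dim),                            # candidate 1
                nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # candidate 2
                nn.Identity(),                                  # candidate 3: skip
            ])
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # arch weights

        def forward(self, x):
            w = F.softmax(self.alpha, dim=0)       # relax the discrete choice
            return sum(wi * op(x) for wi, op in zip(w, self.ops))

    # After training, argmax(alpha) is the "chosen" architecture.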
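
The re-reading/note-taking point could look like an agent loop where the model picks an action each step instead of answering in one shot; llm is a hypothetical completion call and the REREAD/NOTE/ANSWER grammar is invented for illustration:

    # Let the model choose a reading action each turn: re-read a section,
    # jot a note, or answer. llm(prompt) -> str is a hypothetical call.
    def study_and_answer(question, sections, llm, max_steps=10):
        notes = []
        for _ in range(max_steps):
            prompt = (
                f"Question: {question}\nNotes so far: {notes}\n"
                f"Sections available: {list(sections)}\n"
                "Reply with one of: REREAD <name> | NOTE <text> | ANSWER <text>"
            )
            action = llm(prompt).strip()
            if action.startswith("REREAD "):
                name = action[len("REREAD "):]
                notes.append(f"reread {name}: {sections.get(name, '')[:500]}")
            elif action.startswith("NOTE "):
                notes.append(action[len("NOTE "):])  # thinking quietly
            elif action.startswith("ANSWER "):
                return action[len("ANSWER "):]
        return llm(f"Question: {question}\nNotes: {notes}\nFinal answer:")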
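
And one wiring for the committee idea: ask the same model under several expert personas, take a majority vote, and fall back to a judge pass on disagreement; chat is a hypothetical API call and the roles are just examples:

    from collections import Counter

    # Committee of "experts": same model, different roles, then aggregate.
    # chat(system, user) -> str is a hypothetical API call.
    ROLES = ["skeptical statistician", "domain expert", "devil's advocate"]

    def committee_answer(question, chat):
        answers = [
            chat(system=f"You are a {role}. Answer concisely.", user=question)
            for role in ROLES
        ]
        votes = Counter(a.strip().lower() for a in answers)
        answer, count = votes.most_common(1)[0]
        if count > 1:                              # simple majority vote
            return answer
        # No agreement: fall back to a judge pass over the candidates.
        return chat(system="You are a judge. Pick the best answer.",
                    user=f"Question: {question}\nCandidates: {answers}")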



