Hacker News

I just want to feed an LLM Hunter x Hunter episodes and get out new ones.

But on a more serious note, I vividly remember when GANs were the next big thing back when I was in university; their output quality and variability were laughable compared to what Midjourney and the like can produce today (my mind was still blown back then). So I would be in no way surprised if, within the next decade, we reach a "Midjourney" for video generation. I wholeheartedly agree.

I also think the computational problem is being tackled from many angles in the ML field: Nvidia releasing absolute beasts of GPUs, promising startups pushing for specialized hardware, a new paper on more efficient training methods every week, Mamba bursting onto the scene, higher-quality datasets, model merging, and framework optimizations here and there. Just the other day I saw a post here about running larger LLMs locally, and Stable Diffusion already runs on iPhones at acceptable quality and speed (given the device's power).

What I wonder about most, though, is whether we will get more robust orchestration of different models, or multimodal models. It's one thing to have a model that generates a short video snippet from a text prompt. But what if I could instruct my model(s) to come up with a new ad for a sports drink, and they/it does the research, consolidates relevant data about the target group, writes a proper script, creates the ad, devises an evaluation strategy, applies it, and eventually hands me back a "well thought out" video? All I'd have to do is provide a bit of an intro and then let the thing do its magic for an hour. I know we have LangChain and BabyAGI, but they aren't yet robust enough to displace a bunch of jobs (though I assume they will be soon enough).
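The orchestration loop described above can be sketched as a simple sequential pipeline. This is a hedged toy illustration, not LangChain's or BabyAGI's actual API: the stage functions (`research`, `write_script`, etc.) and the `Context` type are hypothetical stand-ins for what would be model or tool calls in practice.

```python
# Minimal sketch of multi-stage orchestration: each stage reads and
# extends a shared context, standing in for a model/tool invocation.
from dataclasses import dataclass, field


@dataclass
class Context:
    brief: str                              # the user's short intro/instruction
    notes: list = field(default_factory=list)  # accumulated stage outputs


def research(ctx: Context) -> Context:
    ctx.notes.append(f"research: target group data for '{ctx.brief}'")
    return ctx


def write_script(ctx: Context) -> Context:
    ctx.notes.append("script: 30-second spot draft")
    return ctx


def render_video(ctx: Context) -> Context:
    ctx.notes.append("video: rendered draft")
    return ctx


def evaluate(ctx: Context) -> Context:
    ctx.notes.append("eval: scored against target-group criteria")
    return ctx


# The pipeline itself is just an ordered list of stages.
PIPELINE = [research, write_script, render_video, evaluate]


def run(brief: str) -> Context:
    ctx = Context(brief=brief)
    for stage in PIPELINE:
        ctx = stage(ctx)  # a robust version would retry/validate here
    return ctx


result = run("sports drink ad")
```

The hard part, and the reason current frameworks feel brittle, is everything this sketch omits: validating each stage's output, retrying or re-planning on failure, and deciding dynamically which stage to run next.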


