
My guess is that it's because the models were all trained on text. You could do as you say, but I think the pipeline would go: Blender video -> {an AI describes it as text} -> text prompt -> video.

