I predict that the future of LLMs, when it comes to coding and software creation, is in "custom, individually tailored apps".
Imagine telling an AI agent what app you want, the requirements and so on, and it builds everything needed, from backend to frontend, asking for your input on how things should work, posing clarifying questions, etc.
It tests the software by compiling and running it, reading errors and failed tests, and fixing the code.
Then it deploys the software to production for you. It compiles your app to an APK file and publishes it on the Google Play Store, for example.
Sure, an LLM today may still not get everything perfect as far as its outputs go. But surely there are already systems and workflows in place that will auto-run your code, compile it, feed errors back to the LLM, plus some API to interact with cloud providers for hosting, etc.?
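(For what it's worth, the "auto-run the code and feed errors back" part needs surprisingly little machinery. Below is a rough Python sketch of such a loop, not any particular product's workflow; ask_llm is a stand-in for whichever model API you'd actually call, and the whole-file-rewrite approach is deliberately naive.)

    # Rough sketch of a compile/test -> feed errors to the model -> apply fix loop.
    # ask_llm is a placeholder for whatever LLM API you use; nothing here is tied
    # to a specific provider or tool.
    import subprocess
    from pathlib import Path

    def ask_llm(prompt: str) -> str:
        """Placeholder: send the prompt to your model and return new file contents."""
        raise NotImplementedError

    def build_and_test(project_dir: str) -> tuple[bool, str]:
        """Run the project's test suite and return (passed, combined output)."""
        result = subprocess.run(
            ["python", "-m", "pytest", "-x", "--tb=short"],
            cwd=project_dir, capture_output=True, text=True,
        )
        return result.returncode == 0, result.stdout + result.stderr

    def fix_loop(project_dir: str, target_file: str, max_rounds: int = 5) -> bool:
        """Test, feed failures back to the model, overwrite the file with its fix, repeat."""
        path = Path(project_dir) / target_file
        for _ in range(max_rounds):
            ok, output = build_and_test(project_dir)
            if ok:
                return True
            prompt = (
                f"This file has failing tests.\n--- {target_file} ---\n{path.read_text()}\n"
                f"--- test output ---\n{output}\nReturn the corrected file contents only."
            )
            path.write_text(ask_llm(prompt))  # naive: trust the model's whole-file rewrite
        return False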
I have been trying to imagine something similar, but without all the middleware/distribution layer. You need to do a thing? The LLM just does it and presents the user with the desired experience. Kind of upending the notion that we need "apps" in the first place. It's all materialized, just-in-time style.
What's it called when you describe an app with sufficient detail that a computer can carry out the processes you want? Where will the record of those clarifying questions and updates be kept? What if one developer asks the AI to surreptitiously round off pennies and put those pennies into their bank account? Where will that change be recorded, will humans be able to recognize it? What if two developers give it conflicting instructions? Who's reviewing this stream of instructions to the LLM?
"AI" driven programming has a long way to go before it is just a better code completion.
Plus coding (producing a working program that fits some requirement) is the least interesting part of software development. It adds complexity, bugs and maintenance.
> What's it called when you describe an app with sufficient detail that a computer can carry out the processes you want?
You're wrong here. The entire point is that these are not computers as we used to think of them. These things have common sense; they can analyse a problem, including all its implicit aspects, and suggest and evaluate different implementation methods, architectures, and interfaces.
So the right question is: "What's it called when you describe an app to a development team, they ask questions back, come back with designs and discuss them with you, finally present you with an MVP, and then you iterate on that?"
Bold of you to imply that GPT asks questions instead of making baseless assumptions every five words, even when you explicitly instruct it to ask questions if it doesn't know, and when it constantly hallucinates command-line arguments and library methods instead of reading the fucking manual.
It's like outsourcing your project to [country where programmers are cheap]. You can't expect quality. Deep down you're actually amazed that the project builds at all. But it doesn't take much to reveal that it's just a facade for a generous serving of spaghetti and bugs.
And refactoring the project into something that won't crumble in 6 months requires more time than just redoing the project from scratch, because the technical debt is obscenely high, because those programmers were awful, and because no one, not even them, understands the code or wants to be the one who has to reverse engineer it.
Of course, but who's talking about today's tools? They're definitely not able to act like an independent, competent development team. Yet. But if we limit ourselves to the here and now, we might be like people talking about GPT-3 five years ago: "Yes, it does spit out a few lines of code, which sometimes even compile. When it doesn't forget halfway through and start talking about unicorns."
We're talking about the tools of tomorrow, which, judging by the extremely rapid progress, I think are only a few (3-5) years away.
Anyway, I had great experiences with Claude and DeepSeek.
Most software is useful because a large number of people can interact with it or with each other over it. I'm not so certain that one-off software would be very useful for anyone beyond very simple functionality.
Marvin Minsky promised that an AI would have a PhD, back in the 1950s and 1960s... and we are no closer. Sorry. We are faster, much faster, 100,000,000 times faster, but we are no closer.
aider jams the backend on my PC; from time to time I have to kill the TCP connection, or Python itself, to stop it from running the GPU on the backend. I can't imagine paying for tokens and not knowing whether it's working or just wasting money.
aider successfully made, one-shot, a 2048 clone in architect mode: serverless, local HTML+JS+CSS. I pushed the Git repo it made to my GitHub as aider2048clone. I used a deepseek-r1-llama-70b distill; it took ~3 hours. After the first 10 minutes I didn't want to interrupt it, because who cares how long it takes if it works?
I haven't been able to get it to do anything but waste my tokens with DeepSeek itself as the backend (aider --model deepseek[/deepseek-reasoner|/deepseek-chat], I think, but I'm not certain).
I think architect mode might be worth looking at, but I'm going to try running aider.exe $(*.txt), then switch to /ask mode and see if it can be used as a zero-shot document query tool.
Because even a rudimentary, garbage implementation would be fun to have, I think.
That's going to be much slower and more expensive than writing tests, because image/video processing is slower and more expensive than running tests, and because of the lag in driving the UI (and rebuilding the whole application from scratch after every change in order to test again).
Hm, what if instead of using video of the application…
Ok, so if one can have one program snoop on all the rendering calls made by another program, maybe there could be a way of training a common representation of “an image of an application” and “the rendering calls that are made when producing a frame of the display for the application”? Hopefully in a way that would be significantly smaller than the full image data.
If so, maybe rather than feeding in the video of the application, said representation could be applied to the rendering calls the application makes each frame, and this representation would be given as input as the model interacts with the application, rather than giving it the actual graphics?
But maybe this idea wouldn’t work at all, idk.
Like, I guess the rendering calls often involve image data in their arguments, and you wouldn't want to include the same images many times as input to the encoding thing, as that would probably (or, I imagine) make it slower than just using the overall image of the application. I guess the calls probably just point to images in memory though, rather than putting an entire image on the stack.
I don’t know enough about low-level graphics programming to know if this idea of mine makes any sense.
Yes, it would be significantly smaller, but it would look very different depending on your platform, GPU, driver version, etc. -- the model would essentially need to learn how to map "graphics APIs" (e.g. OpenGL, Vulkan, Metal, ...) to "render result" for every combination of API, driver version, and GPU, which I imagine would constitute a significant amount of overhead.
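To make the draw-calls-instead-of-pixels idea a bit more concrete, here is a toy Python sketch; the DrawCall records are invented for illustration, and the hard parts (actually hooking the graphics API, plus the per-platform/driver variation mentioned above) are left out entirely.

    # Toy illustration: represent one frame as a short sequence of draw-call records
    # rather than raw pixels. The DrawCall fields are made up; real captures would
    # depend on the graphics API (OpenGL, Vulkan, Metal, ...) and driver.
    from dataclasses import dataclass

    @dataclass
    class DrawCall:
        op: str          # e.g. "bind_texture", "draw_quad", "draw_text"
        texture_id: int  # a handle to an image already in GPU memory, not the pixels
        x: float
        y: float
        w: float
        h: float

    def encode_frame(calls: list[DrawCall]) -> list[float]:
        """Flatten a frame's draw calls into a small numeric sequence a model could consume."""
        ops = {"bind_texture": 0.0, "draw_quad": 1.0, "draw_text": 2.0}
        encoded: list[float] = []
        for c in calls:
            encoded += [ops.get(c.op, -1.0), float(c.texture_id), c.x, c.y, c.w, c.h]
        return encoded

    # A 1920x1080 RGB frame is ~6 MB of pixel data; a typical UI frame with a few
    # hundred draw calls encodes to a few thousand numbers here.
    frame = [
        DrawCall("bind_texture", 7, 0, 0, 0, 0),
        DrawCall("draw_quad", 7, 100, 80, 256, 64),
        DrawCall("draw_text", 0, 120, 96, 200, 24),
    ]
    print(len(encode_frame(frame)), "numbers for this frame")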