Is it possible for Copilot, or say Llama or GPT-4o, to suggest a piece of code, actually go and run a test that it designs in an IDE, look at the results, and try to fix any issues?
Right now you ask an LLM to write code to do basic web scraping of the HN website for the latest URL and the username of the submitter. Sure, it will give you code and a test script, but you as the user have to run the script and give manual feedback to the LLM.
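For reference, the task itself is only a few lines of Python. This sketch uses the public HN Firebase API (the endpoints and field names are from its documentation), which is what a model would plausibly reach for:

```python
import requests

BASE = "https://hacker-news.firebaseio.com/v0"

# Fetch the ID of the newest submission
newest_id = requests.get(f"{BASE}/newstories.json", timeout=10).json()[0]

# Fetch the item itself to get its URL and the submitter's username
item = requests.get(f"{BASE}/item/{newest_id}.json", timeout=10).json()
print(item.get("url", "(text post, no url)"), item["by"])
```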
If the testing step could be automated, the user would just give an input and a desired output (or a prompt) and choose between the results. That would be good.
Kinda like inpainting, outpainting, and the other image-editing tricks, but for code.
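A minimal sketch of that automated loop might look like the following. The `ask_llm` function is a hypothetical stand-in for whatever model API you use; the prompt format is made up for illustration:

```python
import subprocess
import sys
import tempfile

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; returns a Python script as text."""
    raise NotImplementedError  # plug in your model call here

def run_candidate(code: str, test_input: str) -> str:
    """Run the candidate script in a subprocess and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], input=test_input,
                            capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def solve(task: str, test_input: str, desired_output: str, max_rounds: int = 5) -> str:
    code = ask_llm(task)
    for _ in range(max_rounds):
        output = run_candidate(code, test_input)
        if output.strip() == desired_output.strip():
            return code  # test passed, hand the code to the user
        # Feed the failure back automatically instead of making the user do it
        code = ask_llm(f"{task}\n\nYour last attempt printed:\n{output}\n"
                       f"Expected:\n{desired_output}\nFix the code.")
    raise RuntimeError("no passing candidate found")
```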
Genetic Programming was a thing in the 90s, but it was hampered by largely random mutations (plus some crossover, which was also mostly undirected) that had low odds of doing anything helpful, combined with a lack of computational speed for testing candidates. A GP framework that used LLMs to apply more or less "reasoned" changes, within the same structure of generations of mutations tested against each other and the previous generations' best, would be interesting.
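Something like that hybrid could look like the sketch below: a standard generational loop with elitist selection, except the mutation operator is an LLM call rather than a random AST edit. Both `llm_mutate` and `run_tests` are hypothetical stand-ins here:

```python
import random

def llm_mutate(code: str, feedback: str) -> str:
    """Hypothetical: ask an LLM for a 'reasoned' mutation, given test feedback."""
    raise NotImplementedError

def run_tests(code: str) -> float:
    """Hypothetical fitness function: fraction of test cases the code passes."""
    raise NotImplementedError

def evolve(seed: str, pop_size: int = 20, generations: int = 50) -> str:
    population = [seed] * pop_size
    for _ in range(generations):
        scored = sorted(population, key=run_tests, reverse=True)
        if run_tests(scored[0]) == 1.0:
            return scored[0]  # all tests pass
        elite = scored[: pop_size // 4]  # keep the best quarter unchanged
        # Refill the population with LLM-proposed mutations of elite members,
        # in place of GP's traditional random subtree mutation and crossover
        children = []
        while len(children) < pop_size - len(elite):
            parent = random.choice(elite)
            children.append(llm_mutate(parent, feedback=f"fitness={run_tests(parent):.2f}"))
        population = elite + children
    return max(population, key=run_tests)
```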
The key bit here is that there is no known way (as yet) to encode "reasoning".
I was a big fan of genetic programming; I wrote a lot of code and did lots of research. Unlike LLMs, it could end up with code that had never been written before and that accomplished some task, but the random walk through a galaxy-sized search space with atom- (or maybe molecule-) sized solution regions made it computationally infeasible.
If you could somehow encode 'reasoning', you could do the equivalent of gradient descent to converge on a working solution; without it, you are unlikely[1] to find anything in a reasonable amount of time.
[1] The chance is non-zero, but it is very, very near zero.
LLMs can definitely end up with code that has never been written before, even before considering that you can both ask for modifications to very constrained parts of the code and sample more broadly than always picking the most probable tokens.
But they also appear to have a far higher probability of producing changes that move toward something that will run.
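For example, with the Hugging Face transformers library the difference between greedy decoding and broader sampling is a couple of generate() flags. A minimal sketch, using gpt2 only because it is small enough to run anywhere:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("def fizzbuzz(n):", return_tensors="pt").input_ids

# Greedy decoding: always take the single most probable next token
greedy = model.generate(ids, max_new_tokens=40, do_sample=False,
                        pad_token_id=tok.eos_token_id)

# Sampling: draw from the distribution, so lower-probability (novel)
# continuations are reachable
sampled = model.generate(ids, max_new_tokens=40, do_sample=True,
                         temperature=0.9, top_p=0.95,
                         pad_token_id=tok.eos_token_id)
print(tok.decode(sampled[0]))
```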
Pretty much no invention was invented just by thinking, yet that is the environment most LLMs have.