Devon: An open-source pair programmer

realharo · on May 20, 2024

The demo video shows it making a game of life.

People really should start using examples that don't have literally thousands of step by step tutorials all over the web.

That includes clones of Wordle, Flappy Bird, generic todo lists, etc. With those, I can't tell how well the abilities would generalize to real world projects.

dartos · on May 20, 2024

> I can't tell how well the abilities would generalize to real world projects.

The secret is that it doesn’t ;)

I doubt our current statistics based AI methods ever will.

Programming requires precision. Code is an exact specification for how a machine should operate.

I don’t see how we can get perfectly precise responses from a nondeterministic process.

fragmede · on May 20, 2024

Maybe if you're inventing a new compiler that hasn't existed before, but if all you're doing is yet another a CRUD app for a newly undiscovered niche, then it doesn't have to generalize beyond what already exists. The fact that it's nondeterministic is irrelevant, I can write the same function a bunch of different ways just by choosing different names for variables, does that stop the code from working?

bakuninsbart · on May 21, 2024

The code assistance tools right now are still a bit crude compared to what they will be in a year or two even if capabilities do not increase. I think it will be very useful to have "someone" read along as I code and answer questions I ask, or pose questions on why I do things a certain way. Honestly, even if it is just stuff like "What would be a good variable name for this?" or "What was the map()-syntax like again?", this would already facilitate flow quite a bit.

Not having to context switch as much is a real blessing, and I've only reached it in one language in specific contexts. If I can transfer this to many languages and many contexts, my productivity will rise a lot.

dartos · on May 24, 2024

So What you’re looking for is basically a proactive context aware Google search.

LLMs are great for that! Sometimes they’re good at translating from one language to another. They miss idioms from time to time, but they’re usually pretty good with that task.

Not so much for writing software themselves.

realharo · on May 20, 2024

Well, it's not like I'm just going to trust a comment like this either :)

I would like to see it in action, and then I'll form an opinion.

VMG · on May 20, 2024

LLMs are much more deterministic than humans

anonzzzies · on May 20, 2024

It doesn’t work for those generally. Even with rag and large context and clever prompting, it is not doing well as you need something that basically is simple enough to get in one go, so you need a programmer/logical thinker to cut things so small that the AI gets it, which this type of -just tell me what to build- is simply not compatible with. The greatest programming minds on earth have issues with composing complex ideas from simple ideas; now we expect the AI to write the simple ideas and then compose them. Or just one-shot them, which is really far beyond what they can do.

We have been working on a solution for this for decades with my team and it has nothing to do with AI. We have to solve it for us first and then it will work for AI; or maybe we will never solve it and then AI won’t be able to piece anything together that’s complex and it hasn’t seen before.

irthomasthomas · on May 20, 2024

What demos would you like to see? I have my own agent running on its own Linux system. I reckon it's capable of doing most office work in theory, if you have the budget. On my budget it's a lot more constrained, and often needs a human in the loop to achieve more complex goal. But I'd like to canvas for suggestions for tests I can throw at it.

If get an interesting request, I will try and record its attempt. But it's expensive - in the order of a $1 a prompt.

realharo · on May 20, 2024

One random idea:

Wordle clone tutorials are of course all over the web (just google make wordle clone and see the results), but what about a browser extension + Discord bot, that lets groups of friends compare their scores for the official Wordle, in Discord.

It would automatically post people's results to a Discord server - both the score, and a picture of the full guesses under a spoiler tag. And maybe during playing, the extension would show status like "if you solve it now, you'd be better than 50% of people in the server today". And post a weekly top 3 at the end of each week.

Something like it has probably been done before, but nowhere near as much as the examples I mentioned in my previous comment. Might be a bit too big at this cost though.

bangaladore · on May 20, 2024

Why are they manually implementing API interfaces for various companies when something like OpenRouter exists? OpenRouter provides a unified API for Commercial and opensource models. Seems like the obvious answer for something like this.

elicksaur · on May 21, 2024

Lots of reasons!

- They may not know this library exists.

- They may not think the library is actually suitable for their use case.

- They may not want the dependency included.

roh26it · on May 21, 2024

Openrouter packages APIs and most companies prefer having individual relationships with AI vendors. Choosing an AI gateway might be another way to go

smarm52 · on May 20, 2024

Unclear about the details of this project. Is there an overview or paper related?

Superficially it seems to be an interface for ChatGPT or other similar generative LLM service.

lakomen · on May 20, 2024

I can't imagine coding with someone else smartassing over what I do. Never tried it. Does anyone actually like pair programming?

fragmede · on May 20, 2024

Some love it, some hate it. If you work with a bunch of smart asses in a toxic culture and you can't actually stand your coworkers, I can see why you'd have such an instinctively negative reaction. But if there's a safe culture of mutual respect and you don't work with asshats it can result in greater productivity and fewer bugs and you'll learn things to help you be a better programmer.

But also human psychology - if it's forced on you from above then you'll hate it, if it's your idea then you'll love it.

lakomen · on May 22, 2024

Hmm. Thanks for feedback.

Yeah I guess it all depends on the attitude and if you click.

It could be fun if you get along well with your partner.

But I have a thought and want to make it reality, so I focus on that idea, that thought in order to finish it. Then another voice enters that thought process. Back in the day I would write code and when I had a problem, I would chat up my online contacts on ICQ, and just by explaining the problem I would find a solution, the rubber duck method. However constantly having someone giving their opinion on things... I imagine that being super annoying, when you're in the process of shaping that feature you have laid out in your head.

mholubowski · on May 20, 2024

Hi! Why does it work best with Python? Should I even bother if it’s for a non python project? (In this case, a WordPress plugin).

Thank you!

fragmede · on May 20, 2024

There's just so much python code up on the web to train from, that LLMs are really good at it, relative to something with fewer examples. However, WordPress uses PHP, and there's also plenty of PHP available online, so it's pretty decent at that too. I just used Devon to create a trivial Wordpress plugin, so you can give it a shot, however because it won't be able to run that code, you can't tell it to test the code.

This is a huge shortcoming. When asking ChatGPT to generate python code, it won't always get it right, but you can ask it to keep trying until the code works. since it can't do that in PHP, it'll be a bit more work. Though, depending on how well you're able to take the output and fix it yourself, it could be enough to get the plugin written.

The value isn't in having everything done for you - the technology isn't there yet, imo. it's in making you more effective. If it generates a page of code and you have to tweak a bunch of it to get it to work right, you still come out ahead. For other work, it'll write a bunch of useless code and you're better off without it, so you have to know when to use it and when not to.

can16358p · on May 20, 2024

Yup. I was going to ask the same. If there was wide language support I'd love to try that, but as a non-Python programmer the use case is limited for me.

However I might use it as in places where Python might actually be the best way to go for a script, yet I'd have picked another language as I don't know Python. Then I could simply ask it to create whatever I need, and read over the code to actually learn some Python perhaps.

smokeydoe · on May 20, 2024

It looks like the reason is because it’s trying to run the code it generates

serjester · on May 20, 2024

Why choose this name - seems like a cease and desist waiting to happen.

asp_hornet · on May 20, 2024

Especially given all the negative sentiment in the developer community towards Devin. You would think they would want to distance themselves as much as possible

ithkuil · on May 20, 2024

I thought it was a county in England.

22c · on May 20, 2024

Agreed. In fact, I thought this was an iteration on the previously announced "Devin" project until I realised that they have different spelling.

https://www.cognition.ai/introducing-devin

CasperH2O · on May 20, 2024

Perhaps because of the "dev in chat" meme reference? When (game) developers would show up in the public chat channel.

xprn · on May 20, 2024

Might “Theone” perhaps be a better one to use? Still stays somewhat true to the pronunciation of Devin, but with an emphasis on being The One that might actually work /s