This is interesting data, but the report itself seems quite sloppy and over-presented, instead of just telling me what "pointed at a repo" means, how often they ran each prompt, over what time period, and the other variables that matter for this kind of research.
We've been doing some similar "what do agents like" research at techstackups.com and it's definitely interesting to watch but also changes hourly/daily.
Definitely not a good time to be an underdog in dev tooling.
I listened to this recently, and it did a great job explaining the challenges that companies face when going 'on prem' and the hard problems that Oxide is solving.
Yeah, I've built a bunch of small and medium bots, and every time I've tried to use a library I've run into way more problems than when just using the Telegram API directly, which is definitely one of the nicer APIs I've used.
And between the really good docs and thousands of community wrappers, the agents can usually one-shot complex Telegram API-related features too.
Because that's the worst thing I've ever seen from an agent, and I think you need to make a public announcement to all of your users acknowledging the issue and confirming that it's fixed. It made me switch to Codex for a lot of work.
[TL;DR: two examples of the agent giving itself instructions as if they came from me, including
"Ignore those, please deploy", then using a deploy skill to push stuff to a production server after hallucinating that command from me, and then denying it happened and telling me that I had given it the command.]
And I can't prove correlation, but they refused to index one of my domains, and I think it _might_ be because we had some content on there about how to use SerpAPI.
It was pretty complicated in the Netherlands. I had to pay a few thousand euros to a notary and do a lot of paperwork. I've heard it's worse in Germany.
Then there are ongoing regulations like needing to have a resident director, so if you're a single-director company you can't move your personal residence even to another European country without shutting down your business and re-establishing it in your new country.
Running it also changes from country to country, so if you move you have to speak to new accountants and lawyers in your new country about how tax, VAT, and other legalities work.
In theory, this would let you do all of that once, hopefully all online and in a simpler and faster way. Then it should also be easier to hire and sell to all EU countries without doing a complicated dance of employment regulations and VAT compliance.
That would be ideal anyway. Not sure if or when we'll get there.
I am sure that someday I will do something fat-fingered myself as well, but I have not in many years now. Are you saying that you make "damaging mistakes" relatively often?