Hacker News | adityavinodh's comments

I’m doing the same but in Zig


Nice! How do you like it?

I thought about Zig too but went with Rust because I had a couple of peers doing the same. And learning together is a little more fun.


What are the biggest issues that the agent faces at the moment? I still find these general-purpose agents frustrating to use at times, because people position them as if they could do anything, and then when you give them a reasonably complex task they break down.

I guess if someone figured out a way to minimize the impact of an error, like a way for the agent to handle it gracefully without it feeling like too much work, that would fix most of the problems.


Lots of interesting issues:

- The agent has a tool to set its task to 'completed', 'failed', or 'needs_help', with the last one being an option for human-in-the-loop scenarios. Sometimes the agent gets lazy and says it needs help prematurely.

- Additionally, the agent can create subtasks for itself, either to run immediately or to schedule in the future. Here again it can call that tool a bit too eagerly, filing duplicate subtasks for a task that involves repetitive work.

- Properly handling very long-running tasks that run for 1+ hours. The context window eventually hits its limit (this will be addressed this week).
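To make the first two issues concrete, here is a minimal sketch of how a task-status tool and a subtask tool could guard against premature 'needs_help' calls and duplicate subtasks. This is not Bytebot's implementation; the names `Task`, `set_status`, `add_subtask`, and the `MIN_STEPS_BEFORE_HELP` threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TaskStatus(Enum):
    # The three statuses named in the comment above.
    COMPLETED = "completed"
    FAILED = "failed"
    NEEDS_HELP = "needs_help"


# Hypothetical threshold: how many steps the agent must attempt
# before it is allowed to ask a human for help.
MIN_STEPS_BEFORE_HELP = 3


@dataclass
class Task:
    description: str
    status: Optional[TaskStatus] = None
    steps_taken: int = 0
    subtasks: List["Task"] = field(default_factory=list)


def set_status(task: Task, status: TaskStatus) -> bool:
    """Apply a status change; reject 'needs_help' if the agent has barely tried."""
    if status is TaskStatus.NEEDS_HELP and task.steps_taken < MIN_STEPS_BEFORE_HELP:
        return False
    task.status = status
    return True


def _normalize(description: str) -> str:
    # Case-insensitive, whitespace-insensitive comparison key.
    return " ".join(description.lower().split())


def add_subtask(task: Task, description: str) -> bool:
    """Add a subtask unless an equivalent one already exists."""
    key = _normalize(description)
    if any(_normalize(sub.description) == key for sub in task.subtasks):
        return False
    task.subtasks.append(Task(description))
    return True
```

Exact-match deduplication like this only catches trivially repeated subtasks; near-duplicates with different wording would need a fuzzier comparison.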

Aside from those top-of-mind issues, there's a whole bunch of scaffolding issues - filesystem permissions, prompt injection security, I/O support, token cost - lots to improve!

We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor


> showing flashes of brilliance

A “flash” of anything is also called a fluke, or a coincidence. The dumbest moron can have a flash of brilliance on occasion. So could a random word masher. Consistency is what matters.

> and we're gaining more and more conviction that this is the right form factor

Are we? Who’s “we”? Because it looks to me like the LLM approach is lacklustre if you care about truth and correctness (which you should) but the people and companies invested don’t really have a better idea and are shoving them down everyone’s throats in pursuit of personal profit.


Agreed, and the consistency has improved over time. I remember struggling only 9 months ago to get a browser agent to accurately click on a checkbox. The growth trajectory is what has us excited.

"We" are a YC-backed startup: https://www.ycombinator.com/companies/bytebot.

Re: truth and correctness, there are different tolerances depending on the type of task.


> We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor

Slow down cowboy; we're seeing "flashes of brilliance" and "that this is the right form factor" for writing code only!

I'm still waiting for AI/LLM's to be posing a danger to jobs other than those in software development and the arts.


This one isn't for coding, they mention in the post that coding agents thrive in custom tool-use environments.


> This one isn't for coding, they mention in the post that coding agents thrive in custom tool-use environments.

Well, that is why I am skeptical and said

>> I'm still waiting for AI/LLM's to be posing a danger to jobs other than those in software development and the arts.

The goal of this product is admirable but, I feel, lacks some grounding: doing screenshots, then converting those images to text, then processing, then converting that to actions, then converting the actions to input events ... results in 4 separate points of failure. So many points of failure each with a success rate (last I checked) of <90% gives you something stupid like an eventual success rate of 0.9 * 0.9 * 0.9 * 0.9 = 0.66.

The same iterative workflow for software development is pretty much 2 steps: process input, then produce output, with 100% success (or close to it for "output", as it's just rewriting the files according to the processing) and 90% for processing which is why it appears to work so well[1].
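The arithmetic above can be sketched as a product of per-step success rates. The 90% and 100% figures are the rough numbers from the comment, not measurements, the function name `pipeline_success` is made up for illustration, and the model assumes the steps fail independently.

```python
from math import prod


def pipeline_success(step_rates):
    """End-to-end success rate when every step must succeed independently."""
    return prod(step_rates)


# GUI agent: four failure points at roughly 90% each (figure from the comment).
gui_agent = pipeline_success([0.9, 0.9, 0.9, 0.9])

# Coding agent: processing at roughly 90%, output at roughly 100%.
coding_agent = pipeline_success([0.9, 1.0])

print(f"GUI agent:    {gui_agent:.2f}")
print(f"Coding agent: {coding_agent:.2f}")
```

Under these assumptions the four-step pipeline lands at about 0.66 while the two-step one stays at 0.90, which is the gap the comment is pointing at.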

I dabbled briefly in this and explored a few different ways of making LLMs use the ERP/business system effectively. With all the current popular business systems, this is simply not possible with a high enough success rate, because those systems have little "structured text" output and even less "structured text" input. In fact, some of them have exactly zero "structured text" input.

To make the most of LLMs in your business system, you're going to need a new one that is primarily text-IO based (structured text, if necessary) and only secondarily GUI-for-humans based.

[1] In truth, using tools is a poor way to extend the reach and grasp of the LLM into the operator's context.

It works well for one mainstream use-case: software development, because then you need less than a dozen tools to automate an entire development iteration (read file, list files, insert into file, run test command, etc).

Try doing that with a mini-ERP type of system; there's just no way to keep a small set of 12 tools that can do any workflow that the operator can do. You'll quickly run into a situation where every prompt request includes tool descriptions for about 500 tools.

Agentic automation is working very well for coding, where all the input is structured text, all the output is structured text, and all the changes are structured text.

The only way for ERP, Accounting, etc to ever get to this level of agent-based automation is if the base product itself is completely 100% structured text IO based, with the human-operator interface built on top of that.


I respectfully disagree! There's a lot of opportunity behind keyboard + mouse + screen.

In a way Bytebot is a maximalist bet on the growth and improvement of multi-modal LLMs. I firmly believe that in a short period of time, the token cost will drop, while the capability increases (both dramatically). It's still uncertain, which makes it a great asymmetric bet.

We don't do any sort of grounding or image conversion, and we offer a handful of tools. I'll go into more detail in my next post.


See my comprehensive reply downthread (it's very long, you cannot miss it).

While I am skeptical due to already having explored this for SMME Line of Business applications, I wish you all the best of luck.

My approach is to simply build a new system from the ground up that can take advantage of structured IO.

[EDIT: send me a message with a link to a post about your product (or this blog), and I'll connect with you on LinkedIn and share your post with my network, meager though it may be]


Will do!


What is your business model?


We're working with design partners as forward-deployed engineers, helping set up Bytebot on their infra and tackle use cases.

We'll be launching a self-serve cloud platform soon!


Your profile has no contact details. Feel free to reach out to me if you want some feedback.


That's great for most people. I wanted a little more - a notification on my phone when I got a submission. Unfortunately, setting this up is pretty annoying, and nobody else was doing it. They were only doing email and Slack notifications, which I didn't like as they just cluttered my inbox.


It's just as easy to have that CGI script call some webhook to send it to whatever push gateway, email, some chatbot, etc.
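As a rough sketch of what that could look like, a form handler can forward each submission to a webhook with nothing but the standard library. The function names, the payload shape, and the webhook URL here are placeholders; the actual format depends on whichever push gateway or chat service receives it.

```python
import json
import urllib.request


def build_webhook_request(form_fields: dict, webhook_url: str) -> urllib.request.Request:
    """Build a JSON POST request describing one form submission.

    The payload layout is an assumption, not any particular service's format.
    """
    payload = json.dumps(
        {"event": "form_submission", "fields": form_fields}
    ).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def forward_submission(form_fields: dict, webhook_url: str, timeout: float = 5.0) -> int:
    """Send the submission to the webhook and return the HTTP status code."""
    request = build_webhook_request(form_fields, webhook_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A handler would call `forward_submission(fields, url)` after parsing the form; delivery retries and error handling are left out of this sketch.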


I’m sure that’s easy. There’s a bunch of services that use webhooks, email, Slack, etc. Unfortunately, I haven’t found any service that offers an extremely simple dashboard to view responses and provides push notifications on mobile. That’s why I built a clean mobile app that provides push notifications. That way your email doesn’t get cluttered, you get notified, and you have a simple way to view responses without any setup.

It’s not for everyone; if you want to roll your own setup, that’s great. But I intentionally built this for developers, to give them full flexibility while still saving them time so they can focus on more important work.


Forms are everywhere: contact forms, support forms, feedback forms, ...

For startups, it is important to monitor and quickly respond to events like sign ups or form submissions. This post explains the different options to handle form submissions without managing the backend yourself.


I think this is the more important question too. I don't think it is right in many cases to give these agents access under the user's identity. If it is a simple one-time task, that might work if you can give the agent restricted, temporary access.

But for any long-term task that may span hours or days while needing access to various data sources or APIs, we need a system where the agent has its own identity (which may be tied to the user). Just like humans, agents might not function in the ideal manner at all times. So we might need a system to monitor the 'karma' of these agents. That way, API providers can confidently provide access to both humans and agents, and limit their exposure to dangerous agents.


Yeah my initial reaction was not too positive. There's something weird to me about simply delegating verification to a third party organization. I'd prefer a more pure solution. Maybe we don't have a solution yet that is simple enough for widespread adoption. The domain based identity does seem a bit too complicated for the average user.


Looks great! Was just looking for something like this.


I think this is a cool idea. I have personally been in the position where you have built a product, and now you suddenly need to start thinking about internal tools to view and manage it. Building them is crucial, but it's also something you don't want to spend too much time on. While this seems great, my only suggestion is to make it a little more extensible. It appears to restrict you to Next.js. It would be nice if you could give us framework-agnostic blocks and APIs that we can use with any technology. I wouldn't mind paying for it as long as the price is reasonable.


Give Streamlit a shot. Since I started using it I’ve been churning out internal tools at the pace of one every few days.


For employees, proving their capabilities and skills is always challenging. From the recruiter's perspective, identifying a candidate's competency is becoming increasingly challenging. All we are able to do currently is use an age-old resume and many rounds of interviews.

We created getCREDIBLE using AI tech to create a revolutionary product that effectively tracks an individual's performance over time through 360° feedback without any bias or solicitation.

If you're wondering how we did it, check out the website and try it out! We're rapidly building new features, especially for recruiters that will improve their hiring process.


This is an Angular app, created by a high school student, for creating groups and posting high-quality content. It features user roles and permissions to organize content.

