Ask HN: Is anybody getting value from AI Agents? How so?
170 points by reilly3000 10 months ago | 149 comments
I saw a lot of initial buzz about the promise of agent-based workflows, and it seemed the obvious way to get LLMs to the edge of decision making and to leverage many specialized models. The chatter seems to have died down, but there are growing projects out there in the space. Before I invest the time to explore and work with the tools, I’d love some feedback from the community on whether and how they are being used.



I’ve seen a lot of attempts but nothing that worked really well. Using an agent as a glorified search engine can work, but trying to replace actual humans for anything but the most standard use cases is still incredibly hard. There’s a lot of overhyped rhetoric at the moment around this tech, and it looks like we’re heading into another period of post-hype disillusionment.

Legal angles here are also super interesting. There’s a growing body of scenarios where companies are held accountable for the goofs of their AI “assistants.” Thus we’re likely heading for some comical train wrecks as companies that don’t properly vet this stuff set themselves up for some expensive disasters (e.g. think of the AI assistant doing things that will get the company into trouble).

I’m bullish on the tech, but bearish on the ability of folks to deploy it at scale without making a big expensive mess.


Extrapolating from your legal comment, copyright is the current issue.

Right now the USCO says if you use a large model that used copyrighted material to train it, you cannot copyright the generated art. This applies not just to art but might apply to code as well. So I wonder what the legal liabilities are, say, for a publicly traded company using Midjourney to generate content that is also copyrighted.

e.g. it is possible to generate movie characters that we instantly recognize if you use non-English prompts, which is pretty much a nail in the coffin for the safe harbour status granted under the DMCA


> Right now the USCO says if you use a large model that used copyrighted material to train it, you cannot copyright the generated art.

Unless the Copyright Office has come out with a newer rule, it says if you use a model that does the usual creative parts of making the output you can’t copyright the output, because it is not a work of human authorship. (And that if a work is mixed AI/human, the AI use must be disclosed in copyright registration to avoid copyright protection being applied to elements that are not human work.) “Large model” and “copyrighted material used to train it” are not the issues.


The GP might not be exclusively talking about copyright. They might also be talking about how Air Canada lost a lawsuit about whether or not they needed to honor their chatbot's hallucinations.

https://www.theguardian.com/world/2024/feb/16/air-canada-cha...


Eh, who cares? I get money from my job and can save money entertaining myself with generative art. Even if I can’t copyright it, fine, because it’s an outdated political and economic cudgel used to expropriate and exploit more often than not

The past is so often proven wrong and was so much less civil; we can never actually ask James Madison whether we did right by his notion that the future owes the past. We can only decide for ourselves whether we align with the long dead’s philosophy.

Where is the value in conserving the past so carefully and in such detail? I’m not saying we end history as a field of study, only as an obligation of daily life.

The most important lesson of history, imo, is to prevent nation states, or whatever might take their place, from having the ability to wage wars. Most of their philosophy is indecipherable given our life experience is vastly different from theirs. Would they see this world and come to the same conclusion they did?


AI agents suck. It's too early.

The first dominoes to fall will be art, music, and film. These don't require perfection and can be iterated on by the creator. The tools become a method in the process.

Everyone in the agent space is bumping around until the next big innovation that actually unlocks the technique sparks a race to productize. Maybe some will get lucky and have a fast pivot. Or maybe they'll run out of funds before the big breakthrough.


Talented artists see their creative output drown under a torrent of commercial and mass-produced art. There are dominoes to fell but we should not be reckless about it.

With postmodernism, art deconstructed its duty, and without duty society does not grant rights. It is a big problem because art is often visionary about the future. We don’t know that we even have a future without vision.


> Talented artists see their creative output drown under a torrent of commercial and mass-produced art.

There isn't enough art in the world. I've spent the last two decades making films the photons-on-expensive-glass method, and it's a pain in the ass.

I can start new software projects of scale easily. I can't do that with art. It's capital intensive, logistics intensive, and requires too many people in low-autonomy roles.

It really sucks the fun out of storytelling when you're chasing down location rights, showing up at the prop house at 7 AM, arranging for catering, etc.

> We don’t know that we even have a future without vision.

As a creator, I've dreamed about a better way before GenAI was even on the radar. I'm not the only one. Existing processes suck and too much of the work isn't creative at all.

As a consumer, my needs are barely being met at all. I want a show about steampunk vampires in space. I want a biopic film exploring Riemann, but told as a musical. I have creative notes for Benioff and Weiss, and I'd rather put those together for myself and my friends than echo words into the void.

I want so much more than the canned limited selection I have available on Netflix and Criterion. It barely whets the palate, and my appetite is completely unsated. The closest I've ever gotten is performing in improv theater and exploring the worlds I want with the people I feel comfortable creating around. But that's only part of the experience I want. I want so much more. The art we have today is a pale shadow of the mind's unlimited canvas. A projection onto a caveman's wall.

You can protect a vision of a priestly class of artists from the printing press all you want. I'm tired of living in the dark ages when we're sitting at the precipice of so much more. Their jobs won't go away any more than wedding and event photographers suffered with the advent of digital photography.

If anything, the artists will be the best primed to take advantage of the new alpha. Free of studio meddling, they can build their own audiences that they own without the chains of brand guidelines. Vivziepop, but a million fold.

I saw a figure recently that said 80% of consumer film, games, and media originates in the US. Think about the rest of the world. So much culture and perspective that we should all share in remains unseen, and we're left with the lens of US media giants.

Think, too, of all the dreamers lost in opportunity cost. There are a billion stories that die silently in the minds of dreamers because we couldn't help them. And the world is all the worse for it.


I very much agree, except on one crucial point. In addition to individualist fulfillment I crave collectivist fulfillment, and for that to happen we have to assign duties to art.

If a collective shoulders that duty then we do get a printing press priesthood. If individualists shoulder the duty to the collective then we should get something different. Perhaps productions like Helluva Boss do have some kind of post-deconstructionist duty of their own design. If that is the case then society should reciprocate with granting rights different from what the priesthood would accrue.


A major issue with generated art is precision - you never get perfectly repeatable details.

Say you try to make a Family Guy episode using generated video: the characters would come out slightly different in every scene. You could generate 100 outputs for every scene and try to pick the best one, but it still won't be very good.


> AI agents suck.

The idea does not suck. On paper it's great. Just tell an "agent" what you want and off it goes to get it. And it's possible. LLMs open the interface to search, planning, and selection algorithms that have been around for 40 years and are mature. You could have this tomorrow.

The assumption is that people want it.

Tech business has come a long way exploiting people's vices, specifically laziness of thought we call "convenience". But at heart, tech is still seen as a tool, to empower people, to give them agency.

Agents subtract agency.

> It's too early.... the big breakthrough.

Hoping for progress against human psychology seems a fool's errand.


This is wrong. You can't have this tomorrow. The LLMs make too many errors right now for most use cases. If you think it's possible right now, you haven't tried to build it.


Right now the expected future cash flow from whoever "wins" is effectively infinite, justifying astronomical amounts of capital expenditure, e.g. Microsoft's $100 billion supercomputer.

Sometimes I get the feeling they are already sitting on some groundbreaking stuff and slowly releasing it to test our responses and get feedback.

I'm certain they will be reading this thread, but if one theme repeats itself, it is the lack of trust in the output of the agents and in the companies creating them.


> Right now the expected future cash flow from whoever "wins" is effectively infinite, justifying astronomical amounts of capital expenditure, e.g. Microsoft's $100 billion supercomputer.

Absolutely. These agent startups don't have PMF and haven't solved anything yet. They're playing in the kiddie pool while Microsoft is placing chess pieces with a GDP-level moat (which is frankly terrifying).


I built an AI-agents tech demo[1], and am now pivoting. A few thoughts:

* I was able to make a simple AI agent that could control my Spotify account, and make playlists based on its world knowledge (rather than Spotify recommendation algos), which was really cool. I used it pretty frequently to guide Spotify into my music tastes, and would say I got value out of it.

* GPT-4 worked quite well actually; GPT-3.5 worked maybe 80% of the time. Mixtral did not work at all, quite apart from the hacks/workarounds needed to get function calling working in the first place.

* It was very slow and VERY expensive. Needing CoT was a limitation. Could easily rack up $30/day just testing it.

My overall takeaway: it's too early: too expensive, too slow, too unreliable. Unless you somehow have a breakthrough with a custom model.

From the marketing side, people just don't "get it." I've since niched down, and it's very, very promising from a business perspective.

[1] https://konos.ai


I think it's destined to fail because it basically moved AI back into the "rules based" realm. Deep learning is a decent cognitive interface - like making a guess at some structure out of non-structure. That's where the magic happens. But when you take that and start using rules to chain it together, you're basically back to the same idea as parsing semi-structured data with regex and/or if statements. You can get it to work a bit but edge cases keep coming along that kill you, and your rules will never keep up. For simple cognitive tasks, deep learning figures out enough of the edge cases to work pretty well, but that's gone once you start making rules for how to combine predictions.


I totally agree with this. I have been arguing with folks that current Reactflow based agent workflow tools are destined to fail, and more importantly, missing the point. Stop forcing AI into structured work.

I do think AI "agents" (or blocks as I like to think of them) unlock the potential for solving unstructured but well-scoped tasks. But it is a block of unstructured work that is very unique to a problem, and you are very likely to not find another problem where that block fits. So, trying to productize these AI blocks as re-usable agents is not that great of a value prop. And building a node based workflow tool is even less of a value prop.

However, if you can flip it inside out and build an AI agent that takes a question and outputs a node based workflow, where the blocks in the workflow are structured, pre-defined blocks with deterministic inputs and outputs, or a custom AI block that you yourself built, then that is something I can find value in. This is almost like the function calling capabilities of GPT.

Building these blocks reminds me of the early days of cloud computing. Back then the patterns for high availability were not well established, and people who were sold on the scalability aspects of cloud computing and got on board without accounting for service failure/availability scenarios and the ephemeral nature of EC2 instances were left burned, complaining about the unfeasibility of cloud computing.


> AI agent that takes a question and outputs a node based workflow

That sounds useful to me. I find it hard to trust an AI black box to output a good result, especially chained in a sequence of blocks where errors may accumulate.

But AIs are great recommender systems. If it can output a sequence of blocks that are fully deterministic, I can run the sequence once, see it outputs a good result and trust it to output a good result in the future given I more or less understand what each individual box does. There may still be edge cases, and maybe the AI can also suggest when the workflow breaks, but at least I know it outputs the same result given the same input.


what makes it slow? is it because they throttle your api key?


Chain of thought takes time to generate all the characters. If you do a chain-of-thought for every action and every misstep (and you need to for quality + reliability), it adds up.


Is there no way to share that "memory" across chats?

or are we at the mercy of hosted models?


There’s caching, but only so much can be cached when small changes in the input can lead to an entirely different space of outputs. Furthermore, even with caching, LLM inference can take anywhere from 1-15s using GPT4-Turbo via the API. As was mentioned, the more characters you prefix in the context, the longer this takes. Similarly, you have a variable length output from the model (up to a fixed context length), and so calculating the “answer” can also take a while. In particular, with CoT you are basically forcing the model to use more characters than it otherwise would (in its answer) by asking it to explain itself in a verbose step by step manner.


Our p99 for gpt4 is 3s. Images take up to 50s.


so how would you go about improving that?


Not using an LLM for it.


we only send 0.5-5% of traffic to gpt4, thanks to smaller faster cheaper models. So not all of our traffic is hit with 50s latencies :-/


so, no?


There's value, but it's too expensive, too slow, and too unreliable right now to be feasible from a business perspective.


Apparently Pieter Levels: " Interior AI now has >99% profit margins

- GPU bill is $200/month for 21,000 designs per month or about 1¢ per render (no character training like Photo AI helps costs)

- Hosted on a shared VPS with my other sites @ $500/mo, but % wise Interior AI is ~$50 of that

+= $250/month in costs

It makes about $45,000 in MRR and so $44,730 is pure profits! It is 100% ran by AI robots, no people

I lead the robots and do product dev but only when necessary"

https://twitter.com/levelsio/status/1773443837320380759


This guy makes money by selling "how to get rich using AI" courses and marketing himself on social media (which he is phenomenal at). I'm not really inclined to believe his sales numbers.


He is NOT selling any courses. Can you please point me to any of his courses? He has a book he wrote about making software/projects called Make. This book is several years old and doesn't mention AI.


Gent is shilling his book about passive income or whatever. Sure I believe his numbers.


> $45,000 in MRR and so $44,730

I’ve found a lot of these numbers from people selling passive income methods are extrapolated.


Levels has been one of the most open entrepreneurs out there. I'd be surprised if he lied on revenue

All of that just to sell a book?


Entrepreneur influencers are some of the worst trash.


Is that an agent based setup? Seems like it’s using a few different models wired together manually.


His robots are just automated scripts that do the bulk repetitive tasks, unlike the agents the comment OP had in mind.


And yet "Unfortunately, we cannot offer refunds as costs incurred for generating AI images are extremely high."


Yeah, he claims it's because at any point in time there are people redesigning their interiors. I'd say that at any point in time there are people you can convince to give you their money, and if you don't offer refunds, it's not far from a scam.


most other services (Stripe, ChatGPT, Google Workspace) also don't seem to offer refunds?

And neither do most restaurants; what's to prevent someone from getting a service and then a full refund?


exactly 99% gross profit, but can't offer refunds due to cost


I roll my eyes every time I see a tweet of his. "Entrepreneur influencers" are uniquely unbearable.


I get a lot of value out of Copilot and GPT4 for coding, but that's about it.

It's true that I have to wrestle a lot with them to get them to do what I want for more complex tasks... so they are great for certain tasks and terrible for others, but when I'm in Xcode, I dearly miss vscode because of Copilot autocomplete, which I guess is an indication that it adds some value

One unexpected synergy has been how good GPT4 is at explaining why my rust code is so bad, thanks to the very verbose compiler messages and availability of high quality training data (i.e. the great rust code in the wild)—despite GPT4 not always being great at writing new rust code from a blank file.

Part of me thinks in the future this loop is going to be a bit more automated, with an LLM in the mix... similar to how LSPs are "obvious" and ubiquitous these days

On an unrelated note, I also wrote a small python script for translating my Xcode project's localizable strings into ~10 different languages with some carefully constructed instructions and error checking (basically some simple JSON validation before OpenAI offered JSON as a response type). I only speak ~2 of the target languages, and only 1 natively, but from a quick review the translations seemed mostly fine. Definitely a solid starting point


That’s not an agent based setup.


I've been playing with AI agents for months, and most of them are pretty bad. They often get stuck in loops, which is frustrating. This happens in MultiOn, AutoGPT, and others.

I've used Devin a few times (see: https://x.com/varunshenoy_/status/1767591341289250961?s=20), and while it's far from perfect, it's by far the best I've seen. It doesn't get stuck in loops, and it keeps trying new things until it succeeds. Devin feels like a fairly competent high school intern.

Interestingly, Devin seems better suited as an entry-level analyst than a software engineer. We've been using it internally to scrape and structure real estate listings. Their stack for web RPA and browser automation works _really_ well. And it makes sense why this is important: if you want to have a successful agent, you need to provide it with good tools. Again, it's not flawless, but it gives me hope for the future of AI agents.


Most of the application right now is for purposes for which quality isn't a high priority. (Also, plagiarism laundering.)

Don't put it in charge of paying bills.

Do put it in charge of making SEO content sites, conducting mass scam automated interactions, generating bulk code where company tolerates incompetence, making stock art for blog posts that don't need to look professional, handling customer service for accounts you don't care about, etc.


Aren’t agents bottlenecked by the underlying models? I’ve read that the number of “chain of thought” steps needed is proportional to task complexity. And if each step has the same probability p of success, probability of success is p^n, where n is the number of steps needed (potentially high). At a 99% success rate per step and 5 steps that’s a 95% overall success rate. 90% drops down to 60%. Not sure what the real numbers are but this seems like it could be a problem without significantly more intelligent ML models?
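A quick back-of-the-envelope check of that compounding (independent per-step success is a simplifying assumption, and these rates are illustrative, not measured):

    # Compounded success probability p**n for an n-step chain,
    # assuming independent per-step success (a simplification).
    for p in (0.99, 0.95, 0.90):
        print(p, {n: round(p**n, 2) for n in (5, 10, 20)})

    # 0.99 {5: 0.95, 10: 0.9, 20: 0.82}
    # 0.95 {5: 0.77, 10: 0.6, 20: 0.36}
    # 0.9  {5: 0.59, 10: 0.35, 20: 0.12}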


Error-checking and recovery is a potential solution here. Not a well understood one, and might still need higher intelligence than we've got, but-

If your math worked out, then humans couldn't work either.


If you have a "super-agent" AI that is capable of recovering a business process from an error state, why not just use that agent in the first place?


Chat gpt has often given me the right answer for code after seeing the error trace resulting from its previous attempt.

I also often correct my own mistakes based on clashes with reality - I don't just become more intelligent the second time.


I would argue that you are! You will not try to clash with reality the same way you did before, provided you “remember”, and I believe future agents/models will have this kind of contextual memory continuously baked in to improve... just a thought.


I think you could do this with an open model with overnight tuning on the day's errors. Probably very expensive though. Easier to scoop up all the errors on the internet on the first round of pre-training.


Couldn’t agree more! That’s maybe also why they are raising 100 more billions! :p


You don't need a super agent, you just need two LLM-based systems with errors that aren't too correlated.


How do you "just" accurately evaluate the error state space of an LLM relative to a real business process? Sounds approximately impossible to me.

If you already have the business process robustly defined as code, then the utility of LLM is unclear. The value prop of LLM is in fuzzy business processes like parsing arbitrary helpdesk tickets.


You evaluate it the way we've evaluated production ML for years, with cheap QC layers sampled and checked by more expensive layers (with humans on top.)

LLMs didn't invent stochastic process steps.
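As a toy sketch of that layering (the sampling rates and check functions are made up, not a real pipeline):

    import random

    # Cheap checks run on everything; a random sample gets escalated to a
    # more expensive check; a smaller sample is queued for human audit.
    def qc(outputs, cheap_check, expensive_check, sample_rate=0.05, human_rate=0.01):
        flagged, human_queue = [], []
        for o in outputs:
            if not cheap_check(o):
                flagged.append(o)
            elif random.random() < sample_rate and not expensive_check(o):
                flagged.append(o)
            if random.random() < human_rate:
                human_queue.append(o)  # audited regardless of the automated checks
        return flagged, human_queue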


Are there any papers on this? I’d be interested in reading more.


There's a paper by Papadimitriou (of Logicomix fame) and some collaborators showing that the transformer model is incapable of solving certain simple problems, and that if done via CoT, it needs exponentially many steps.

The paper is currently only available on arXiv (i.e. not yet peer-reviewed), but given that it is Papadimitriou, I would be inclined to believe the results.

https://arxiv.org/abs/2402.08164


50 comments so far, with 4 about non-agent codegen and the rest confirming OP's observations.

I'm also seeing an explosion in the number of comments advertising their AI tool on anything remotely related to AI topics. Makes me think we are headed for a major correction.


Since your definition seems to be a sharp enough razor to triage 50 comments and I'm not read-in enough to tell from the OP, can you help elucidate? :)

Are we talking about work on so-called intelligent agents (something with a primitive OODA loop?), a specific ~pattern of conversational AI chatbot, or something else?


AutoGPT and the like are essentially what OP was referring to. Basically “Siri, but it works and can actually do most of what you ask because the underlying language model is robust rather than a rules engine” (and friends).


AI startups are funded by VCs funded by LPs long on Nvidia while simultaneously short on the founders.


Do you have examples of this?


What's an agent based workflow? :)

I use LLMs as a glorified search engine. That was better than web search at some point; I'm not sure the publicly available LLMs are that good any more. Gemini lately seems to be extremely worried about not offending anyone instead of giving me results.

At least it's still useful for 'give me the template code for starting an XXX' ...


Is it something like this?

>You are a <blank assistant>, the user has requested <input>. Here are a list of possible actions and their descriptions. Choose the most appropriate action for the user's request. Parse the following from the request...

In a loop, and you execute whichever action is selected after?
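Something like this sketch, I'd guess (the actions and the call_llm helper here are made-up placeholders, not a real framework):

    import json

    # Toy version of the loop described above; a real call_llm would hit an
    # actual model and be asked to return JSON like {"action": ..., "args": {...}}.
    def search_docs(query):
        return f"(pretend search results for {query!r})"

    def reply_to_user(text):
        return text

    ACTIONS = {"search_docs": search_docs, "reply_to_user": reply_to_user}

    def call_llm(prompt):
        return {"action": "reply_to_user", "args": {"text": "stub answer"}}  # placeholder

    def run(user_request):
        history = []
        for _ in range(10):  # cap the loop so it can't spin forever
            prompt = (
                f"You are a support assistant. The user has requested: {user_request}\n"
                f"Steps so far: {json.dumps(history)}\n"
                "Available actions: " + ", ".join(ACTIONS) + "\n"
                'Reply with JSON: {"action": ..., "args": {...}}'
            )
            choice = call_llm(prompt)
            result = ACTIONS[choice["action"]](**choice["args"])
            if choice["action"] == "reply_to_user":
                return result
            history.append({"action": choice, "result": result})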


I'm working on an agent-based tool for software development. I'm getting quite a lot of value out of it. The intention is to minimize copy-pasting and work on complex, multi-file features that are too large for ChatGPT, Copilot, and other AI development tools I've tried.

https://github.com/plandex-ai/plandex

It's working quite well though I am still ironing out some kinks (PRs welcome btw).

I think the key to agents that really work is understanding the limitations of the models and working around them rather than trying to do everything with the LLM.

In the context of software development, imo we are currently at the stage of developer-AI symbiosis and probably will be for some time. We aren't yet at the stage where it makes sense to try to get an agent to code and debug complex tasks end-to-end. Trying to do this is a recipe for burning lots of tokens and spending more time than it would take to build the thing yourself. But if you follow the 80/20 rule and get the AI to do the bulk of the work, intervening frequently to keep it on track and then polishing up the final product manually at the end, huge productivity gains are definitely in reach.


When I hear AI agents, I hear RL (reinforcement learning), not LLMs. RL may not be having the moment that LLMs are, but the progress in recent years is incredible and they are absolutely solving real world problems. I was just listening to a podcast about using an RL algorithm to enhance the plasma containment system in a fusion reactor, and the results were incredible. It quickly learned a policy that was competitive with the existing system that had been hand built over many years at a cost of millions. It even provided some new insights and surprises. RL is SOTA in robotics control, and some new algorithms like Dreamer V3 can generalize in realtime without millions of samples. It has already grown way beyond solving Atari games and is in many cases being used to train LLMs and other generative AI.

There is a good amount of research going into combining LLMs with RL for decision making, it is a powerful combination. LLMs help with high level reasoning and goal setting, and of course provide a smooth interface for interacting with humans and with other agents. LLMs also contain much of the collective knowledge of humanity, which is very useful for training agents to do things. If you want to train a robot to make a sandwich it's helpful to know things, like what is a sandwich, and that it is necessary to move, get bread, etc.

These feedback loop LLM agent projects are kind of misguided IMO. AI agents are real and useful and progressing fast, but we need to combine more tools than just LLM to build effective systems.

Personally, I am using LLMs quite effectively for ecommerce: classifying messages, drafting responses, handling simple requests like order cancellation. All kinds of glue stuff that used to be painful is now automated and easy. I could go on.


The best quote I’ve heard from our clients is “don’t trust AI with anything you wouldn’t trust a high schooler to do.”

That line of reasoning has held true across basically every project we’ve touched that tried to incorporate LLMs into a core workflow.


>> “don’t trust AI with anything you wouldn’t trust a high schooler to do.”

Then they should be great for making fast food, staffing amusement parks, and seasonal farm labor.

They don't seem to be good for those things either.

[Edit to add: the value high schoolers bring to those jobs comes through non-cognitive abilities, which AIs lack.]


to be fair neither are high schoolers


The term "AI agents" might be a bit overhyped. We're using AI agents for the orchestration of our fully automated web scrapers. But instead of trying to have one large general purpose agent that is hard to control and test, we use many smaller agents that basically just pick the right strategy for a specific sub-task in our workflows. In our case, an agent is a medium-sized LLM prompt that has a) context and b) a set of functions available to call. For example we use it for:

- Navigation: Detect navigation elements and handle actions like pagination or infinite scroll automatically.

- Network Analysis: Identify desired data within network calls.

- Data transformation: Clean and map the data into the desired format. Finetuned small and performant LLMs are great at this task with a high reliability.

The main challenge:

We quickly realized that doing this for a few data sources with low complexity is one thing, doing it for thousands of websites in a reliable, scalable, and cost-efficient way is a whole different beast.

The integration of tightly constrained agents with traditional engineering methods effectively solved this issue for us.
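As a rough illustration of what one of these tightly scoped agents looks like (the names and strategies here are hypothetical, and call_llm stands in for the model call):

    # One "navigation" block: given a trimmed page excerpt, pick exactly one
    # strategy from a whitelist; everything downstream is deterministic code.
    NAV_STRATEGIES = ("numbered_pagination", "infinite_scroll", "load_more_button", "none")

    def pick_navigation_strategy(page_excerpt, call_llm):
        prompt = (
            "You label how a listing page exposes more results.\n"
            f"Allowed labels: {', '.join(NAV_STRATEGIES)}\n"
            "Answer with the label only.\n\n"
            f"Page excerpt:\n{page_excerpt[:4000]}"
        )
        label = call_llm(prompt).strip()
        return label if label in NAV_STRATEGIES else "none"  # fall back safely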


Why use an LLM to choose a proxy when you can just rotate, starting from the cheapest, based on not getting a 403?


I get the same feeling. AI Agents sounds very cool but reliability is a huge issue right now.

The fact that you can get vastly different outcomes for similar runs (even while using Claude 3 Opus with tool/function calling) can drive you insane. I read somewhere down in this thread that one way to mitigate these problems is by implementing a robust state machine. I reckon this can help, but I also believe that somehow leveraging memory from previous runs could be useful too. It's not fully clear in my mind how to go about doing this.

I'm still very excited about the space though. It's a great place to be and I love the energy but also measured enthusiasm from everyone who is trying to push the boundaries of what is possible with agents.

I'm currently also tinkering with my own Python AI Agent library to further my understanding of how they work: https://github.com/kenshiro-o/nagato-ai . I don't expect it to become the standard but it's good fun and a great learning opportunity for me :).


To summarize, agents are (essentially) LLMs in a loop: take actions, think, plan, etc, then repeat.

From what I've seen, current LLMs "diverge" when put into a loop. They seem to reason acceptably in small chunks, but when you string the chunks together, they go off the rails and don't recover.

Can you slap another layer of LLM on top to explicitly recover? People have tried this, it seems like nobody has figured out the error correction needed to get it to converge well.

My personal opinion is that this is the measure of whether we have AGI or not. When LLM-in-a-loop converges, self-corrects, etc, then we're there.

It's likely all current agent code out there is just fine, and when you plug in a smart enough LLM it'll just work.


Whenever I read about how AI is going to automate art or some other creative job I think about a quote I read (maybe here) that went something like “that which is made without effort is enjoyed without pleasure”.


Or likening add-on AI capabilities to a supervillain's anti-power: "When everybody is super (artistic, whatever), then no one will be!" - Syndrome, The Incredibles.


Not sure if this would qualify as an "agent", but I developed my own AI personal assistant that runs as a Telegram bot. I can use it from anywhere easily; it handles my events and reminders, sends me a daily agenda, and memorizes useful things for me. I even integrated it with Whisper so that I can send a Telegram voice message and don't need to write. From a product/selling perspective, no value at all since I haven't even considered that (I'm building it for myself and my needs). But daily usefulness value? heck yeah!
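The plumbing for something like this is pretty small; a stripped-down sketch of the bot loop (long polling against the Telegram Bot API, with ask_llm as a placeholder for whatever model you use):

    import time, requests

    TOKEN = "123456:ABC..."  # bot token from @BotFather
    API = f"https://api.telegram.org/bot{TOKEN}"

    def ask_llm(text):
        return f"echo: {text}"  # placeholder: call your model of choice here

    offset = None
    while True:
        updates = requests.get(f"{API}/getUpdates",
                               params={"offset": offset, "timeout": 30}).json()
        for u in updates.get("result", []):
            offset = u["update_id"] + 1
            msg = u.get("message") or {}
            if "text" in msg:
                requests.post(f"{API}/sendMessage",
                              json={"chat_id": msg["chat"]["id"],
                                    "text": ask_llm(msg["text"])})
        time.sleep(1)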


I’ve done something similar with iOS Shortcuts, giving the LLM access to my device calendar and reminders. I’m not sure it fits the definition of “agent” but it’s definitely useful being able to query my calendar using natural language rather than Siri’s rigid commands.


Are you using S-GPT or something like that?


I customized Siri COPILOT which I believe I found on HN. I can’t find a link to it anymore. I customized it with my own prompts and my own “tools” which are just Shortcuts that retrieve data from the native apps.

I’m currently reworking the concept with Scriptable as I find it so frustrating to develop anything in Shortcuts.


I don't think ai agents are good enough to replace every job today, but they're starting to nip at the more junior / menial knowledge jobs

I've seen a lot of success come from AI sales agents, just doing basic SDR style work

We're having some success automating manual workflows for companies at Skyvern, but we've only begun to scratch the surface.

I suspect that this will play out a lot like the iPhone era -- first few years will be a lot of discovery and iteration, then things will kick into superdrive and you'll see major shifts in user behavior


> junior / menial knowledge jobs

Not junior, unthinking. It's like outsourcing/offshoring. You're getting the equivalent of someone who doesn't care about what they're doing, not somebody inexperienced. I'm not saying it doesn't have a place, just commenting on the framing.


> I don't think ai agents are good enough to replace every job today

You mean any.

You can fire a human knowledge worker for not doing their job correctly, but what are you gonna do when you only have LLMs and realize they can't do their job correctly?


I think there are some they're good enough at today. Auto generating meeting notes + AI context, auto responding / following up to emails, filling out forms (we do that pretty well at Skyvern with high accuracy)


Fire 8 out of 10 of your knowledge workers and have the other remaining 2 review and fix LLM output.

Basically the same what we have now, except the grunt workers will be replaced by the machine.


Devin seems like the first able to be commercialized. In my opinion the only way to do it well right now is you need to build your own system, the out of box open source projects are just some foundational work.

I actually don’t think we will need agents in the future, I think one model will be able to morph itself or just delegate copies of itself like MoE for actions.

It just seems extremely unlikely to me that foundation models won't get exponentially smarter over the next few years and won't be able to do this.


Lots more good answers in [Ask HN: What have you built with LLMs? | Hacker News](https://news.ycombinator.com/item?id=39263664) too

If most people's only experience with AI is the chat.openai.com interface then yeah, I can see why it seems like too much hassle to most people. The trick is to figure out your long prompts ahead of time, and hardcode each one into an HTTP request in something else (Tasker, BetterTouchTool, Alfred, Apple Shortcuts, etc). For me, I have dozens of long prompts to do exactly what I want, assigned to wake words, hotkeys, and trigger words on my mac/watch/phone. Another key thing is I use FAST models, i.e. Groq not GPT-4. Latency makes AI too much hassle. For example:

1. Instant (<1 second end-to-end) answers, in just a few words, to voice questions spoken into my watch any time I hold the side button

2. Summarize long articles and youtube videos before I decide to spend time on them

3. Add quick code snippets in plain english with hotkeys or voice

4. Get the main arguments for and against something just to frame it

… stuff like that. If it would make your life easier for an AI to save you 1 second per task, why not.


Interesting. Do you have more technical details on how to build these same tools for myself?


All the big LLM APIs work via similar HTTP requests - you can do them with curl, Python, Tasker, Apple Shortcuts, anything that can do an HTTP POST.

Start from a playground/workbench page like https://console.groq.com/playground (you may need to sign up with a credit card, I forget, but it's pennies per month). Under system, tell it how to act - put "You talk in a cockney accent". Under user just say "hi". Press submit to see how it responds.

Click the "View code" button below to see all the HTTP request parameters in different flavors. curl is probably easiest since you just paste it into the terminal on a mac. Just set the API key variable first. You get a key from the API keys page.

Once you start getting responses that make sense in the terminal, try it in https://reqbin.com/post-online . Figure out where the parameters go. Then try it again in Apple Shortcuts - it will look like this https://imgur.com/a/xqqhJh6.png . On the mac, some tools that can trigger Apple Shortcuts via hotkey are BetterTouchTool, or from the "spotlight bar" Alfred Workflows works well, but there's probably all sorts of ways. On Android I use Tasker, which is similar to Apple Shortcuts. For voice recognition I use Tasker's in-built Google voice recognition. On the mac I am using OpenAI Whisper via HTTP POST. It's all duct tape and bubblegum behind the scenes - just start small.
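If you'd rather skip Shortcuts entirely, the same request in Python looks roughly like this (endpoint path and model name from memory - check the playground's "View code" button for the current ones):

    import requests

    resp = requests.post(
        "https://api.groq.com/openai/v1/chat/completions",  # OpenAI-compatible endpoint
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": "llama3-8b-8192",  # whichever fast model the playground lists
            "messages": [
                {"role": "system", "content": "You talk in a cockney accent."},
                {"role": "user", "content": "hi"},
            ],
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])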


I've found that while agents cannot replace anyone, they can sure help with the use of various things.

First, we know these AIs are trained with data from the general Internet, and that data is vast.

Second, the general Internet contains owner manuals and support forums for practically every active product there is, globally. These are every possible product too: physical products, virtual products like software or music, and experience products like travel or education. Between the owner’s manuals and the support forums for these products there is extremely deep knowledge about the purpose, use and troubleshooting of these products.

Third, one cannot just ask an LLM direct, deep questions about some random product and expect a deep-knowledge answer. One has to first create the context within the LLM that activates the area(s) of deep knowledge you want your answers to arise from. This requires the use of long-form prompts that create the expert you want, and once that expert is active in the LLM’s context, then you ask it questions and receive the deep-knowledge answers desired.

Fourth, one can create an LLM agent that helps a person create the LLM agent they want, the LLM agent can help generate new agents, and dependency chains between different agents are not difficult at all, including information exchange between groups of agents collaborating on shared replies to requests.

And last, all that deep information about using pretty much every software there is can be tapped with careful prompting to create the context of an expert user of that software, and experts such as these can become plugins and drivers for that software. It's at our fingertips...!


It's a bit like the chatbot revolution from 8-10 years ago (not LLMs, just make a choice and maybe parse a few keywords to navigate a chatbot state machine).

Sure, we can do that, but do users want that?

I don't want to chat, talk or interact with people, I want the most efficient ui possible for the task at hand. When I do chat with someone is because some businesses are crap at automating and I need a human to fix something. Even then I don't want a robot that can't do anything.

The only exception I can think of is tutoring but then I'd really question the validity of the answers. RAG is pretty cool in that regard because it can point at the original paragraph being used to answer the question.

That might be useful to someone but that's not my favourite way of learning.

Give me a summary of the content, give me the content, Ctrl+F and I'm good to go.

For low stakes things like gaming where the agent messing up would just be a fun bug, I think it can be great.

Looking forward to automatically generated side quests based on my actions, and NPCs which get pissed if I put a box on their head and hire mercenaries if I murder their families.


I’ve found the OpenAI assistants API not really up to snuff in terms of predictable behavior yet.

That said, I’m very bullish on agents overall though and expect that once they get their assistants behaving a bit more predictably we will see some cool stuff.

It’s really quite magical to see one of these think through how to solve a problem, use custom tools that you implement to help solve it, and come to a solution.


Agents are largely an attempt to take the human-LLM pair and use the LLM to replace all the work the human finds trivial but which the LLM is terrible at.

Trying to get more inference value per-prompt is a good thing. Starting by trying to get it to do long-chain tasks per-prompt makes no sense.

I'm a huge fan of LLMs for productivity, but even small tasks often require multiple prompts of build-up or fix-up. We should work toward getting those done in a single prompt more often, then work toward slightly larger tasks etc.

Plugins and GPTs are both attempts at getting more/better inference per prompt. There is some progress there, but it's pretty limited. There are also plenty of people building task-specific tools that get better results than someone using the chat interface, due to a lot of prompt work.

So there is incremental progress happening, but it's been fairly slow. The fact that it's this much work to get incrementally more inference value per prompt makes it very hard to imagine anyone closing the whole loop immediately with an agent.


Just like many here have said already, GPT4 has been useful for coding for me. It is an amazing parser, especially, and saves me precious time. Of course it's not able to do anything on its own or without supervision, but it has been better than looking up examples on Google.

I also have been experimenting with it to replace the intent classifier part of Google's Dialogflow. We use it at work for our chatbot. Earlier, we used Watson and it was amazing, but it became very expensive. Dialogflow is cheap, but it is as inaccurate with complex natural language as it is cheap.

Mixtral (8x7B) has proved extremely accurate in identifying intents with a consistent JSON output, given a short context, so I assume a simple 7B model would do the job. I still don't know if it is financially worth it, but it's something I'm gonna try if I can't fix Dialogflow's intents. But in no way would the model's output directly interface with a client. That's asking for trouble.
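The setup is roughly this sketch (the intent names are examples and call_llm stands in for the actual model call):

    import json

    INTENTS = ["billing_question", "cancel_service", "technical_issue", "other"]

    def classify(message, call_llm):
        prompt = (
            "Classify the customer message into exactly one intent from this list: "
            + ", ".join(INTENTS)
            + '\nRespond with JSON only, e.g. {"intent": "billing_question"}.\n\n'
            + f"Message: {message}"
        )
        try:
            intent = json.loads(call_llm(prompt)).get("intent")
        except (json.JSONDecodeError, AttributeError):
            intent = None
        return intent if intent in INTENTS else "other"  # raw model output never reaches the client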


I made a specific design decision to avoid and minimize agentic behavior in aider. My biggest concerns are that agentic loops are slow and expensive. Even worse, they often "go off the rails" and spend a lot of time and money diligently doing the wrong thing, which you ultimately have to undo/redo.

Instead, I've found the "pair programming chat" UX to be the most effective for AI coding. With aider, you collaborate with the AI on coding, asking for a sequence of bite sized changes to your code base. At each step, you can offer direction or corrections if you're unhappy with how the AI interpreted your request. If you need to, you can also just jump in and make certain code changes yourself within your normal IDE/editor. Aider will notice your edits and make sure the AI is aware as your chat continues.

https://github.com/paul-gauthier/aider


The better question is: is the "value" worth the enormous environmental cost from the massive amounts of power used in training datasets, storing them, and running the GPUs/TPUs?

NVIDIA estimates they'll ship up to 2M H100 GPUs. They have a TDP of about 300-400W each. Assume that because of their high cost, their utilization is very high. Assume another 2/3rds of that is used for cooling, which would be another 200W. Be generous and throw out all the overhead from the host computers, storage, power distribution, UPS systems, and networking.

2M * 600W = 1.2GW.

Let's say you only operated them during the daytime and wanted to do so from solar power. You'd need between ten and twenty square miles worth of solar panels to do so.
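Roughly checking that arithmetic (the usable W/m² of a solar farm, spacing included, is an assumed figure here, not a measurement):

    # 2M GPUs at ~600 W each (TDP plus assumed cooling overhead) = 1.2 GW.
    load_w = 2_000_000 * 600
    for density in (25, 50):  # assumed usable watts per m^2 of farm area
        area_m2 = load_w / density
        print(density, "W/m^2 ->", round(area_m2 / 2.59e6, 1), "square miles")
    # 25 W/m^2 -> ~18.5 sq mi, 50 W/m^2 -> ~9.3 sq mi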


Have not experimented with it personally, but here is Andrew Ng's talk on the subject: https://www.youtube.com/watch?v=sal78ACtGTc


I was hyped on them initially, but after a few months of using them, I find that they are only useful for simple questions, and for coding help with syntax for basic things I have forgotten.

Anything more complex just turns into an irritating back and forth game that when I finally arrive at the solution, I feel like I wasted my time not getting practical experience, but rather gaming a magic 8 ball into giving me what I wanted.

It just doesn't feel satisfying to me to use them anymore. I don't deny that they improve my productivity, but it's at the cost of enjoying what I do. I was never able to enter that feeling of zen flow while using LLMs regularly.


LLM tech is breaking records in popularity because it’s extremely user-centric. It reminds us of a friendly teammate that seems seasoned in all areas we talk about with it. It’s a pleasure to talk to it now and then, and we enjoy the vast source of information it provides us no matter what the subject is. But then, we ask it to do the work for us…

Or imagine having a senior engineer in a subject you are clueless about (e.g. Haskell) and think about how annoying it would be for both of you to communicate your way through some advanced piece of functionality.

Analyzing, researching, learning and making decisions are a crucial part of doing work. LLM apps offer another useful approach to learning and researching, but if you rely on them solely, their drawbacks are going to keep you behind.


Opinions mine, based on learning this space from scratch over just the last couple of months.

I feel like these architectures built on top of last gen LLMs are mostly useless now.

The current gen jump was significant enough that creating a complex chain of thought with RAG on last gen is usually surpassed by zero-shot on current gen.

So instead of spending time and money building it, it's better to focus on zero-shot and update your models to the latest version.

Feeding LLM outputs into other LLM inputs IMHO just will increase the bias. Initially I expected to mix and match different models to avoid it but that didn't work as much as I expected.

It depends a lot on your application honestly.


But aren't current gen zero-shots gated and throttled? Or have things changed for Azure OpenAI?


We are doing that internally, so I think it is now more a craft than a "product". For example, we look at a lot of specific codebase repositories (e.g. on GitHub) and try LLMs over the diff just before and after a security code audit was done.

Another one is listening to many social media (e.g. Twitter) posts to sense if there is a business opportunity. SDRs scan the results in a Slack channel manually, but based on these signals.

Finally, this is now a workflow but we did this [1] that is a piece in our work.

[1] https://news.ycombinator.com/item?id=39280358


Ah sweet we have finally entered the "let's be realistic" phase.


I haven’t found any useful agent workflows, and I’ve not found a tool that’s more productive for me (doing arch/design/implementation of systems) than just copy and pasting from the Playground.


Seems like we are still in the sheepdog phase, where AI agents are extremely capable but not really autonomous: helpers that still need overall coordination and control. The logical extension is layers upon layers, so the next stage is a shepherd AI and then a farm AI. Then a meta AI that can discern and separate the layers, implement each, and combine them. It may develop like a film studio, with area experts combined for a particular project rather than a static one-structure-fulfils-all approach.


Devin is the only one that I've been able to use. Set up some projects for me, added some features. Could improve, obviously, but net positive for me in terms of time


I build a lot of side projects and I have gotten a lot of value from https://www.goagentic.com/ to send personalised cold emails at scale. I no longer need to spend time researching the prospects, as the tool researches every prospect, crafts a personalised message based on what I am selling, and sends the emails. So far with a 2-5% positive reply rate.


We use it for manual research. Think of the times when you visit a certain prospect's website or company website to find a certain piece of information or a hook to talk to them about.

We use agents in a workflow to be able to do this in bulk. The problem is it does take a long time, but at least it saves time at the end of the day and saves you from manually visiting a list of 100 different domains to find a piece of information.


Agents are still very new, and nothing works for production yet.

Specifically on using AI for coding, I wrote about different levels of AI coding from L1 to L5; we are still at the L2/L3 stage for mature and production-ready tech. Agents are L4/L5:

https://prompt.16x.engineer/blog/ai-coding-l1-l5


I just (as in, five minutes ago) hooked GPT-4 up to my 3D printer and it's fantastic. I use an ESP32 Box, and I can ask it what files I have on my printer, I can ask it to print a file, and I even added calendar integration so it can read me my events and add new ones. I love it.

All that's left is for someone to bundle it all up into a nice package, and we'll be in the future.
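The glue is mostly just function calling; a trimmed-down sketch of the shape of it (the tool names and dispatch are illustrative, not the exact setup):

    from openai import OpenAI

    client = OpenAI()
    tools = [
        {"type": "function", "function": {
            "name": "list_files",
            "description": "List the G-code files stored on the 3D printer.",
            "parameters": {"type": "object", "properties": {}}}},
        {"type": "function", "function": {
            "name": "print_file",
            "description": "Start printing the named file.",
            "parameters": {"type": "object",
                           "properties": {"filename": {"type": "string"}},
                           "required": ["filename"]}}},
    ]
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # any tool-capable model
        messages=[{"role": "user", "content": "Print the phone stand again"}],
        tools=tools,
    )
    for call in resp.choices[0].message.tool_calls or []:
        # dispatch to the printer's local API (e.g. OctoPrint/Moonraker) here
        print(call.function.name, call.function.arguments)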


I use a 3D printer all the time, but I almost never print a file more than once. It also takes a couple of seconds to select a file with the built-in interface, and then it usually takes anywhere from 30 minutes to a day or more to print. What's your usage model that putting a GPT4 instance between you and the printer is somehow helping things? This feels like someone saying "Now I can e-mail my kleenex box to see how much kleenex it has."


I had some small prints that took ten minutes each, that I needed a lot of, so I figured it would be nice to talk to my ESP box and have it launch prints.


Can you not plate multiple copies of it that print at the same time? Usually you need to remove the finished print, clean the build plate etc. between prints, so longer total print times between 'overhead' periods increases your productivity significantly. It seems like a neat demo to be able to say "Print another one" verbally after spending a few minutes physically doing things to re-prep the machine, but not actually more productive than pressing the 'print again' button that shows up on the menu on mine.


I can, but I didn't know how many I'd need beforehand, and multiple copies increase the chance of failure (if one fails, the whole plate is trash).


We've built a low-code AI agent platform with its primary use case in e-commerce (replacing the first line of humans for basic things like product search, QA, etc). It works fairly well if you assemble the script correctly. And if it fails, it just falls back to humans, so customers don't see much difference in their experience.


I do. I use assistants as containers for different conversations for my GTM work: an assistant for marketing and copywriting, an assistant for customer support, an assistant for sales conversations.

These agents aren't super smart: just a few PDFs for context plus a few-sentence system prompt.

I do get what I want in 80% of use cases (not measured, just a feeling).


Agents and tooling around LLMs can probably make some small number of applications viable, but for the most part we need better foundation models to deliver on the hype.

We’re definitely in the “wait” phase of the wait calculation. Everyone is expecting GPT5/q* to change things but really we won’t know until we see it.


I think agents are not a fad; they're here to stay, since using an LLM in an agent system is the only way to let it access real-world tasks, which they will end up doing when they're good and reliable enough.

That said, I believe the current best models are still not good enough - but let's wait a few months.


I personally use an AI FAQ bot to automate FAQ questions in some of my Discord servers. It doesn't always work as well as a human answering but it does help, in most cases. In other words, AI Agents can be very helpful but can only be trusted/useful to a limited extent compared to humans.


NVIDIA and, one step down the chain, OpenAI are making lots of money, mostly by convincing people that LLMs solve all problems.

LLMs are perfect for this, super flashy, with a ton of hype. In reality, LLMs are really bad at most applications, they are a solution in search of a problem.


Some. We are building some new processes from the ground up and will use Agents as a first draft contributor. This is typically where we find the most slowdown. And we will consistently search for the word "delve" as a misspelling :)


Eerily quiet here.


Not sure if this is sarcasm, but the thread was only posted 20 minutes ago and has 9 replies already. I personally am tired of AI/LLM news but it still seems popular from this thread.


It is, isn't it?


I'm also noting this, but it could be because it's Sunday?

Or was GP implying a lot of people are getting funded to build agents in a thread where the consensus is they don't work well enough for people to pay for?


I’m here to read the comments but I think it is too early to give a statement of fact.


In itself, that is an interesting statement of fact: Whatever value AI agents are going to deliver, they aren't delivering yet. The jury is still out.


Klarna used the OpenAI Assistants API to automate the work of ~7000 support agents.


I believe you have one zero too many over there

https://www.klarna.com/international/press/klarna-ai-assista...


Oops, thanks!


...which is not necessarily evidence that Klarna has obtained value from OpenAI Assistants API. It's quite possible (likely, even) that this has effectively been a complete removal of support.

I haven't used Klarna, but a few other products I've used are unsupported now because human support were replaced with AI "support" that is completely useless.


Cash out and wait 20-30 years to cash out again. That’s the AI train stop lol.


One of the hardest lessons I learned in tech, maybe in life, was that if people don't want it you're done.

It doesn't matter that you think it's the coolest and most amazing technology in history. It may be. So what?

It doesn't matter that experts from every part of industry are yelling that "this is the future", that the march of this tech is "inevitable". They need to believe that, for their own reasons.

It doesn't matter that academics from Yale, Harvard and MIT are publishing a dozen new papers on it every week. For the most part their horizon ends at the campus gate.

It doesn't matter that investors are clamouring to give you money and inviting you to soirees to woo you because your project has the latest buzzwords in the name. Investors have to invest in something.

And it doesn't matter if market research people are telling you that the latent demand and growth opportunity is huge. People tell them what they want to hear.

The real test - and I wish I had known this when I was twenty - is do ordinary people on the London Omnibus want it? Not my inner ego projection. Not my wishful thinking. Not what "the numbers" say. Go and ask them.

My experience right now - from asking people (for a show I make) - is that people are shit scared of AI, and if they don't hold a visceral distaste for it they've an ambivalence that's about as stable as nitro-glycerine on a hot day. I know that may be a difficult thing to hear as a business person.

If you are harbouring in your heart any remnant of the idea that you can create demand, that they will "see the light" and once they have a taste will be back for more, or that by will and power they can be made, regulated and peer pressured into accepting your "vision", then you'd be wise to gently let go of those thoughts.


I've been a bit disappointed by the AI. I'll admit going in with low expectations (I know about the whole AI summer/winter cycle), and I was blown away that ChatGPT could play Jeopardy! with just a prompt, since I remember being blown away by Watson and AlphaGo. But then I had it help me write a letter, and by the time I got it to do anything useful, I basically had to write an outline for it, and then I realized I had already done the hard part. I asked it to write some boilerplate code for an interface to the Slack API in Python, but it used a deprecated API, and it assumed I had a valid token. Turns out Slack has lots of different kinds of tokens and I was using the wrong one, and the AI couldn't help me figure that out. After that, I remembered the story about the real pain point for radiologists: they don't need help diagnosing cancer, they need help with their internet connectivity.


I’ve found impel (https://tryimpel.com) to work extraordinarily well so far


When I was technical blogging on how to learn from open-source code [1], I used it quite frequently to get unstuck and/or to figure out how to tease apart a large question into multiple smaller questions. For example, I had no idea how to break up this long `sed` command [2] into its constituent parts, so I plugged it into ChatGPT and asked it to break down the code for me. I then Googled the different parts to confirm that ChatGPT wasn't leading me astray.

If I had asked StackOverflow the same question, it would have been quickly closed as being not broadly applicable enough (since this `sed` command is quite specific to its use case). After ChatGPT broke the code apart for me, I was able to ask StackOverflow a series of more discrete, more broadly-applicable questions and get a human answer.

TL;DR- I quite like ChatGPT as a search engine when "you don't know what you don't know", and getting unblocked means being pointed in the right direction.

1. https://www.richie.codes/shell

2. https://github.com/rbenv/rbenv/blob/e8b7a27ee67a5751b899215b...


This has been my experience as well. I use Phind as a search engine, it's pretty bad for anything else. It does excel at obvious JS questions, but you can get those anywhere. It's great at sussing out a function that you only half-remember.


I'm using ReAct for a RAG-based chatbot, with tools that are just different retrievers.
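
Roughly, the loop looks like this. This is a stripped-down sketch, not any particular library: `llm` is a stand-in for whatever model call you use, and `retrievers` is just a dict of named search functions over different corpora.

    def react_answer(llm, question, retrievers, max_steps=5):
        # retrievers: dict mapping tool name -> callable(query) -> retrieved text
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            step = llm(
                "Available tools (each is just a retriever): "
                + ", ".join(retrievers)
                + ".\nReply with either 'Action: <tool>: <query>' or 'Final: <answer>'.\n\n"
                + transcript
            )
            if step.startswith("Final:"):
                return step[len("Final:"):].strip()
            # Parse "Action: <tool>: <query>" and run the chosen retriever.
            _, tool, query = [part.strip() for part in step.split(":", 2)]
            observation = retrievers[tool](query)
            transcript += f"{step}\nObservation: {observation}\n"
        return "No answer within the step budget."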


The use case where I "feel" these are useful is studying any given topic. Having one single page helps avoid many Google searches, tangential questions, etc., but I'm always looking out for inaccuracies.


Full disclosure up top: I have been working on agents for about a year now, building what would eventually become HDR [1][2].

The first issue is that agents have extremely high failure rates. Agents really don't have the capacity to learn from either success or failure, since their internal state is fixed after training. If you ask an agent to repeatedly do some task, it has a chance of failing every single time. We have been able to largely mitigate this by modeling agentic software as a state machine. At every step we have the model choose the inputs to the state machine and then we record them. We then 'compile' the resulting state-transition table down into a program that we can execute deterministically. This isn't totally foolproof, since the world state can change between program runs, so we have methods that allow the LLM to make slight modifications to the program as needed. The idea here is that agents should never have to solve the same problem twice. The cool thing about this approach is that smarter models make the entire system work better. If you have a particularly complex task, you can call out to gpt-4-turbo or claude-3-opus to map out the correct action sequence and then fall back to less complex models like Mistral 7B.
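
For illustration only (this is not our production code, and every name below is made up), the record-then-replay idea boils down to something like this: the model picks a transition the first time it sees a state, we log it, and later runs replay the logged table deterministically, only asking the model again when a recorded step no longer works.

    import json

    def choose_transition(llm, state, goal):
        # Hypothetical: ask the model which inputs to apply in this state.
        return llm(f"Goal: {goal}\nState: {state}\nReturn the next action as JSON.")

    def run_task(llm, goal, world, table_path="transitions.json"):
        # `world` is a hypothetical environment (browser, API, etc.).
        try:
            with open(table_path) as f:
                table = json.load(f)          # previously 'compiled' transition table
        except FileNotFoundError:
            table = {}

        state = world.observe()
        while not world.done(state):
            key = world.state_key(state)
            action = table.get(key)
            if action is None:
                action = choose_transition(llm, state, goal)   # model decides once
                table[key] = action                            # ...and we record it
            try:
                state = world.apply(action)                    # deterministic replay
            except Exception:
                # The world drifted since recording: let the model patch this step.
                table[key] = choose_transition(llm, state, goal)
                state = world.apply(table[key])

        with open(table_path, "w") as f:
            json.dump(table, f)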

The second issue is that almost all software is designed for people, not LLMs. What is intuitive for human users may not be intuitive for non-human users. We're focused on making agents reliably interact with the internet, so I'll use web pages as an example. Web pages contain tons of visually encoded information in things like the layout hierarchy, images, etc., but most LLMs rely on text-only input. You can try exposing the underlying HTML or the DOM to the model, but this doesn't work so well in practice. We get around this by treating LLMs as if they were visually impaired users: we give them a purely textual interface by using ARIA trees. This interface is much more compact than either the DOM or the HTML, so responses come back faster and cost way less.
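
Again just for illustration (not our actual stack): Playwright, for instance, will hand you an accessibility tree directly, which gives a far more compact, text-only view of a page than raw HTML.

    from playwright.sync_api import sync_playwright

    def accessibility_text(url: str) -> str:
        """Return a compact, text-only view of a page via its accessibility tree."""
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            tree = page.accessibility.snapshot()  # ARIA-style tree as nested dicts
            browser.close()

        def render(node, depth=0):
            line = "  " * depth + f"{node.get('role', '')}: {node.get('name', '')}"
            children = node.get("children", [])
            return "\n".join([line] + [render(c, depth + 1) for c in children])

        return render(tree) if tree else ""

    # The resulting text can then be fed to the model instead of raw HTML or the DOM.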

The third issue I see with people building agents is they go after the wrong class of problem. I meet a lot of people who want to use agents for big ticket items such as planning an entire trip + doing all the booking. The cost of a trip can run into the thousands of dollars and be a nightmare to undo if something goes wrong. You really don't want to throw agents at this kind of problem, at least not yet, because the downside to failure is so high. Users generally want expensive things to be done well and agents can't do that yet.

However, there are a ton of things I would like someone to do for me that would cost less than five dollars of someone's time, where the stakes for things going wrong are low. My go-to example is making reservations. I really don't want to spend the time sorting through the hundreds of nearby restaurants. I just want to give something the general parameters of what I'm looking for and have reservations show up in my inbox. These are the kinds of tasks that agents are going to accelerate.

[1] https://github.com/hdresearch/hdr-browser [2] https://hdr.is


I've been disappointed by my few experiments with Langchain's agent tooling. Things I have experienced:

- The pythonrepl or llm-math tool not being used when it should be, with the agent returning a wrong or approximate answer.

- The wikipedia and web-browser tools doing spurious research in an attempt to answer a question I did not ask (hallucinating a question, essentially).

- Agents getting stuck in a loop of asking the same question over and over until they time out.

- The model not believing an answer it gets from an agent (eg using a Python function to get today's date and not believing the answer because "The date is in the future").

When you layer all this on top of the usual challenges of writing prompts (plus, with Python functions, writing the docstring so the agent knows when to call them), wrong answers, hallucination, etc., etc., I'm unconvinced. But maybe I'm doing it wrong!
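
To be concrete about the docstring part, this is the kind of thing I mean. A minimal sketch using LangChain's @tool decorator (the agent wiring itself is omitted); the docstring is what the model sees as the tool description, so all the "when should you call this?" guidance has to live there.

    from datetime import date
    from langchain_core.tools import tool

    @tool
    def todays_date() -> str:
        """Return today's date in ISO format (YYYY-MM-DD).

        Use this whenever the user asks anything involving the current date;
        do not guess the date from training data.
        """
        return date.today().isoformat()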


I am!

In my experience, you need to keep a human in the loop. This implies that you can't get the technology to scale, but I'm optimistic because LLMs have rapidly gotten better at following directions while I've been using them over the last six months.

Summarization is probably the clearest strength of LLMs over a human. With ever-growing context windows, summarizing books in one shot becomes feasible. Most books can be summarized in one sentence, though the most useful, information-dense ones cannot.

I had Gemini 1.5 Pro summarize an old book titled Natural Hormonal Enhancement yesterday. Having just read the book myself, I found the result acceptable.

https://hawleypeters.com/summary-of-natural-hormonal-enhance...

For information-dense books, it seems clear to me that chatting with the book is the way to go. I think there's promise to build a competent agent for this kind of use case. Imagine gathering 15 papers and then chatting about their contents with an agent with queries like:

What's the consensus? Where do these papers diverge in their conclusions? Please translate this passage into plain English.

I haven't done this myself, but I have a hard time imagining such an agent being useless. Perhaps this is a failure of imagination on my part.
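
The retrieval half of such an agent, at least, is not exotic. A minimal sketch, assuming sentence-transformers for embeddings; the papers are plain text and the actual LLM call is left out:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def top_passages(papers: list[str], question: str, k: int = 5) -> list[str]:
        """Return the k passages most relevant to the question (cosine similarity)."""
        chunks = [c for paper in papers for c in paper.split("\n\n") if c.strip()]
        chunk_vecs = model.encode(chunks, normalize_embeddings=True)
        q_vec = model.encode([question], normalize_embeddings=True)[0]
        scores = chunk_vecs @ q_vec
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    # The top passages plus the question ("What's the consensus across these
    # excerpts?") would then be handed to whatever LLM you prefer.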

The brightest spot in my experimentation is [Cursor](https://cursor.sh). It's good for little dev tasks like refactoring a small block of code and chatting about how to use vim. I imagine it'd be able to talk about how to set up various configs, particularly if you @ the documentation, a feature that it supports, including [adding documentation](https://docs.cursor.sh/features/custom-docs).

Edit: I think a lot of the disappointment comes from these kinds of tools not being AGI, or a replacement for a human doing some repetitive task. They magnify the power of somebody who is already curious and driven. Lazy, disengaged users can still lean on them, but with goals like doing the bare minimum and avoiding work altogether, these tools can't help one accomplish much of use.


Looks like there are not many people getting value: you, me, and maybe a few others. I'm getting tremendous value, and I am surprised at all these defeatist comments. Here's a short demo of an AI-integrated spreadsheet: https://www.youtube.com/watch?v=3t29rbs49xU


Copilot has been useful to me for the boring parts of writing tests, but it needs a lot of review. Other than that, dogshit.


I use one that reads my RSS feeds, writes a radio-DJ voice-over, and uses an ElevenLabs API call to generate the voice (their Santa voice from last year works really well). It combines that with one of my Spotify playlists and gives me a 45-minute radio show for my commute... pretty much changed how I consume news and content like HN.
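
The skeleton is roughly this. A sketch rather than my exact script: `call_llm` is a placeholder for whatever model you use, the key and voice ID are stand-ins, and the Spotify playlist interleaving is left out. The text-to-speech endpoint is ElevenLabs' real /v1/text-to-speech route.

    import feedparser
    import requests

    ELEVENLABS_API_KEY = "..."   # placeholder
    VOICE_ID = "..."             # placeholder, e.g. a seasonal voice

    def build_show(feed_url: str, out_path: str = "show.mp3") -> None:
        feed = feedparser.parse(feed_url)
        headlines = [f"{e.title}: {e.get('summary', '')}" for e in feed.entries[:10]]

        # Hypothetical LLM call that turns headlines into a DJ-style script.
        script = call_llm(
            "Write an upbeat radio-DJ voice-over linking these stories:\n"
            + "\n".join(headlines)
        )

        resp = requests.post(
            f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
            headers={"xi-api-key": ELEVENLABS_API_KEY},
            json={"text": script},
        )
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            f.write(resp.content)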


At my work, I have colleagues who speak English as a second language. Many of them are using LLMs to improve their documents and other writing.

It’s actually quite awful. It’s obvious the text is LLM generated because of the verbose, generic writing style. It communicates clearly but without substance. Not gonna lie, I secretly judge these people.


We built a conversion-rate-optimizing AI agent and saw about a 45% click-through-rate lift on our own homepage. Across the other companies beta testing it, we saw a similar average, with a range of +15% to +175%. The agent (AB3.ai) can be tried here: https://AB3.ai


You know, these drive-by comments that write a bit of contextually relevant text and always end with a me-too link are getting tiring. To the point where, when I see one here, I begin to suspect things aren't going well in your other sales/marketing funnels.


"Ask HN: Is anybody getting value from AI Agents? How so?"

If my comment is not on topic as an answer to the OP's thread title question, then I'm not sure what is. The three clicks it will get buried in an HN thread are not going to do anything for us. But I also see little value in stating "Yes, I've gotten a 45% lift using an AI agent" without providing context.



