It's very hard for me to imagine the current level of agents serving a useful purpose in my personal life. If I ask this to plan a date night with my wife this weekend, it needs to consult my calendar to pick the best night, pick a bar and restaurant we like (how would it know?), book a babysitter (can it learn who we use and text them on my behalf?), etc. This is a lot of stuff it has to get right, and it requires a lot of trust!
I'm excited that this capability is getting close, but I think the current level of performance mostly makes for a good demo and isn't quite something I'm ready to adopt into daily life. Also, OpenAI faces a huge uphill battle with all the integrations required to make stuff like this useful. Apple and Microsoft are in much better spots to make a truly useful agent, if they can figure out the tech.
Maybe this is the "bitter lesson of agentic decisions": hard things in your life are hard because they involve deeply personal values and complex interpersonal dynamics, not because they are difficult in an operational sense. Calling a restaurant to make a reservation is trivial. Deciding what restaurant to take your wife to for your wedding anniversary is the hard part (Does ChatGPT know that your first date was at a burger-and-shake place? Does it know your wife got food poisoning the last time she ate sushi?). Even a highly paid human concierge couldn't do it for you. The Navier–Stokes smoothness problem will be solved before "plan a birthday party for my daughter."
Well, people do have personal assistants and concierges, so it can be done? but I think they need a lot of time and personal attention from you to get that useful right. they need to remember everything you've mentioned offhand or take little corrections consistently.
It seems to me like you have to reset the context window on LLMs way more often than would be practical for that
I think it's doable with the current context window we have, the issue is the LLM needs to listen passively to a lot of things in our lives, and we have to trust the providers with such an insane amount of data.
I think Google will excel at this because their ad targeting does this already, they just need to adapt to an llm can use that data as well.
I would even argue the hard parts of being human don't even need to be automated. Why are we all in a rush to automate everything, including what makes us human?
> hard things in your life are hard because they involve deeply personal values and complex interpersonal dynamics, not because they are difficult in an operational sense
I think what's interesting here is that it's a super cheap version of what many busy people already do -- hire a person to help do this. Why? Because the interface is easier and often less disruptive to our life. Instead of hopping from website to website, I'm just responding to a targeted imessage question from my human assistant "I think you should go with this <sitter,restaurant>, that work?" The next time I need to plan a date night, my assistant already knows what I like.
Replying "yes, book it" is way easier than clicking through a ton of UIs on disparate websites.
My opinion is that agents looking to "one-shot" tasks is the wrong UX. It's the async, single simple interface that is way easier to integrate into your life that's attractive IMO.
Yes! I’ve been thinking along similar lines: agents and LLMs are exposing the worst parts of the ergonomics of our current interfaces and tools (eg programming languages, frameworks).
I reckon there’s a lot to be said for fixing or tweaking the underlying UX of things, as opposed to brute forcing things with an expensive LLM.
> It's very hard for me to imagine the current level of agents serving a useful purpose in my personal life. If I ask this to plan a date night with my wife this weekend, it needs to consult my calendar to pick the best night, pick a bar and restaurant we like (how would it know?), book a babysitter (can it learn who we use and text them on my behalf?), etc. This is a lot of stuff it has to get right, and it requires a lot of trust!
This would be my ideal "vision" for agents, for personal use, and why I'm so disappointed in Apple's AI flop because this is basically what they promised at last year's WWDC. I even tried out a Pixel 9 pro for a while with Gemini and Google was no further ahead on this level of integration either.
But like you said, trust is definitely going to be a barrier to this level of agent behavior. LLMs still get too much wrong, and are too confident in their wrong answers. They are so frequently wrong to the point where even if it could, I wouldn't want it to take all of those actions autonomously out of fear for what it might actually say when it messages people, who it might add to the calendar invites, etc.
Agents are nothing more than the core chat model with a system prompt, and wrapper that parses responses and executes actions and puts the result into the prompt, and a system instruction that lets the model know what it can do.
Nothing is really that advanced yet with agents themselves - no real reasoning going on.
That being said, you can build your own agents fairly straightforward. The key is designing the wrapper and the system instructions. For example, you can have a guided chat on where it builds of the functionality of looking at your calendar, google location history, babysitter booking, and integrate all of that into automatic actions.
I am not sure I see most of this as a problem. For an agent you would want to write some longer instructions than just "book me an aniversery dinner with my wife".
You would want to write a couple paragraphs outlining what you were hoping to get (maybe the waterfront view was the important thing? Maybe the specific place?)
As for booking a babysitter - if you don't already have a specific person in mind (I don't have kids), then that is likely a separate search. If you do, then their availability is a limiting factor, in just the same way your calendar was and no one, not you, not an agent, not a secretary, can confirm the restaurant unless/until you hear back from them.
As an inspiration for the query, here is one I used with Chat GPT earlier:
>I live in <redacted>. I need a place to get a good quality haircut close to where I live. Its important that the place has opening hours outside my 8:00 to 16:00 mon-fri job and good reviews.
>
>I am not sensitive to the price. Go online and find places near my home. Find recent reviews and list the places, their names, a summary of the reviews and their opening hours.
>
>Thank you
It has to earn that trust and that takes time. But there are a lot of personal use cases like yours that I can imagine.
For example, I suddenly need to reserve a dinner for 8 tomorrow night. That's a pain for me to do, but if I could give it some basic parameters, I'm good with an agent doing this. Let them make the maybe 10-15 calls or queries needed to find a restaurant that fits my constraints and get a reservation.
I see restaurant reservations as an example of an AI agent-appropriate task fairly often, but I feel like it's something that's neither difficult (two or three clicks on OpenTable and I see dozens of options I can book in one more click), nor especially compelling to outsource (if I'm booking something for a group, choosing the place is kind of personal and social—I'm taking everything I know about everybody in the group into account, and I'd likely spend more time downloading that nuance to the agent than I would just scrolling past a few places I know wouldn't work).
Similar to what was shown in the video when I make a large purchase like a home or car I usually obsess for a couple of years and make a huge spreadsheet to evaluate my decisions. Having an agent get all the spreadsheet data would be a big win. I had some success recently trying that with manus.
>it needs to consult my calendar to pick the best night, pick a bar and restaurant we like (how would it know?), book a babysitter (can it learn who we use and text them on my behalf?), etc
This (and not model quality) is why I’m betting on Google.
I'm excited that this capability is getting close, but I think the current level of performance mostly makes for a good demo and isn't quite something I'm ready to adopt into daily life. Also, OpenAI faces a huge uphill battle with all the integrations required to make stuff like this useful. Apple and Microsoft are in much better spots to make a truly useful agent, if they can figure out the tech.