
https://aistudio.google.com/live is by far the coolest thing here. You can just go there and share your screen or camera and have a running live voice conversation with Gemini about anything you're looking at. As much as you want, for free.

I just tried having it teach me how to use Blender. It seems like it could actually be super helpful for beginners, as it has decent knowledge of the toolbars and keyboard shortcuts and can give you advice based on what it sees you doing on your screen. It also watched me play Indiana Jones and the Great Circle, and it successfully identified some of the characters and told me some information about them.

You can enable "Grounding" in the sidebar to let it use Google Search even in voice mode. The video streaming and integrated search make it far more useful than ChatGPT Advanced Voice mode is currently.



Your comment got me so hopeful that I showed it the bug I'm currently working on. I even prepared everything first: my GitHub issue, the relevant code, and the terminal with the failing tests. I pasted in the full contents of the file and explained carefully what I wanted to achieve. As I was doing this, it repeated back to me everything I said, saying things like "if I understand correctly you're showing me a file called foo dot see see pee" and "I see you have a github issue open called extraneous spaces in frobnicator issue number sixty six" and "I see you have shared some extensive code", and after some more of this "validation"-speak it started reading out the full contents of the file: "backquote backquote backquote forward slash star .. import ess tee dee colon colon .." and so on.

Not quite up to excited junior-level programmer standards yet. But maybe good for other things, who knows.


You just rediscovered that LLMs become much stupider when using images as input. I think that has already been shown for GPT-4 as well.


Even when using the web search tool, GPT-4 becomes stupider.

Do they use a dumber model for tool use/vision?


I'm guessing that it's just a much harder problem. Images often contain more information, but it is far less structured and refined than language.

The transformation process that occurs when people speak and write is incredibly rich and complex. Images, by comparison, are essentially just the outputs of cameras or screen captures; there isn't an "intelligent" transformation process occurring.


I think images also have much higher information density than words, or at least they can. There is a reason a picture is worth 1000 words.


This is news to me. Any good examples of this outside of the above?


Vision language models are blind (192 comments) https://news.ycombinator.com/item?id=40926734


On the other hand, if I take pictures of circuits, boards, electronic components, etc., GPT-4o is pretty reliably able to explain to me the pinouts, the board layouts, and the reference material in the datasheets, and to provide pretty reasonable advice about how to use a part (i.e., where to put resistors and why, what pins to use for the component on an ESP32, etc.). As a beginner in electronics, this is fabulously helpful. Its ability to pass vision tests seems like a pretty dumb utility metric when most people judge utility by how useful things are.


> foo dot see see pee

Well there's your problem!


ccp is the Chinese version of c++. Or maybe they meant ссср, the Soviet version.


The ccp extension is just another c++ flavour /i


Not sure this is an AI limitation. I think you'd be better off here with the Gemini Code Assist plugin in VS Code rather than the live screen share. It sounds like the AI is being given unstructured information compared to an actual code base.


THIS is the thing I'm excited for with AI.

I'm someone who becomes about 5x more productive when I have a person watching or just checking in on me (even if they're just hovering there).

Having an AI basically be that "parent" that kicks me into gear would be so helpful. 90% of the time my problems are because I need someone to help keep the gears turning for me, and there isn't always someone available. This has the potential to be a person who's always available.


Just as an FYI: I recently learned (here on HN) that this is called Body Doubling[0]. There are some services around (at least one run by someone who hangs around here) that can do this too.

[0] https://en.m.wikipedia.org/wiki/Body_doubling


Also, there are co-working spaces in VRChat, which works wonders for me.

I went to the Glass Office co-working space to study for exams this summer and it worked out really well. I also met some nice people there.

A standalone Quest 3 is enough to get you started.


Do they support WFH setups?


The parent might be referring to us: https://workmode.net/. Most of our clients work from home. Do you have a specific concern about body doubling and working from home?


That’s interesting, I never considered something like that.

But at that low price, surely you have a bunch of customers being watched by each employee, and then talking to only one at a time — isn’t it distracting to see your “double” chatting away with the sound off?


No, nobody has ever complained about it (and yes, we did ask). When we first started, we were really concerned about it, so we tried to move as little as possible, avoid hand gestures, and so on. However, it turned out to be a non-issue.

Fun fact: I’d estimate that 50% of users don’t even look at their Productivity Partner while they work. WorkMode runs in another tab, and users rarely switch back to it. They don’t need to see us - they just need to know we’re watching. I’m in that group.


Some unsolicited feedback (please feel free to ignore):

When I click on "Pricing" in the nav bar, it scrolls down, and the first thing that catches my eye is "$2100 / month". This time I happened to notice that this figure is the benefit you're projecting, and that the actual price is $2.50/hour. On previous visits to your website from your HN comments, I'd always thought $2100/month was what you were going to charge me and closed the tab.

I've been frustrated myself that people don't read what's right there on the page when they come to my startup app's landing page. Turns out I do the same. Hope this helps you improve the layout, font sizes, and such "information hierarchy" so the correct information is conveyed at a glance.

IMHO $2.50/hour is great value, and stands on its own. I know how much my time is worth, so perhaps the page doesn't really have to shout that to convince me.

Again, please feel free to ignore this as it is quite possible that it is just me with the attention span of a goldfish with CTE while clicking around new websites.


Thank you! I hadn’t thought of it that way, but what you wrote makes total sense and explains the engagement issues we’re seeing with the calculator and the pricing section.

> Again, please feel free to ignore this as it is quite possible that it is just me with the attention span of a goldfish with CTE while clicking around new websites.

Most of our clients have issues with attention span, so your feedback is gold :-) Again, thank you!


You're welcome! BTW, this is how it looked: https://i.imgur.com/qg8gNJF.png

I understand that if the window were taller, I'd have seen the actual price cards. I think it's just that when you click "Pricing", you expect the next obvious number you see to be the price.


Clever service! I assume your employees watch several people at once. Is it engaging enough work for them?


Yes, they monitor several people simultaneously. Most clients ask us to check in on their progress every 15–30 minutes, and these interactions can last anywhere from a few seconds to three minutes, depending on the client and the challenges they're facing. It might be boring when working with a single person, but it gets more challenging as more people connect.

Also, we do more than just body doubling. Some clients need to follow a morning ritual before starting their work (think meditation, a quick house cleanup, etc.). Sometimes, we perform sanity checks on their to-do lists (people often create tasks that are too vague or vastly underestimate the time needed to complete them). We ask them to apply the 2-minute rule, and so on. It all depends on the client's needs.


Interesting! I see how this could work for inattentive procrastinators. By "inattentive procrastinators", I mean people who are easily distracted and forget that they need to work on their tasks. Once reminded, they return to their tasks without much fuss.

However, I doubt it would work for hedonistic procrastinators. When body doubling, hedonistic procrastinators rely on social pressure to be productive. Using AI likely won't work unless the person perceives the AI as a human.


You don't necessarily need to believe the AI is a human for it to tickle the ingrained social instincts you're looking for. For example, I'm quite aware that AIs are just tools, and yet I still feel a strong need to be "polite" in my requests to ChatGPT: "Please do ...." or "Can you...?" and even "Thanks, that worked! Now can you..." etc.


I do the same, but I think it's because we were taught to be polite and to conduct conversations in a certain way.

Do you put effort into being polite when ChatGPT makes a mistake and you correct it? Do you try to soften the blow to avoid hurting its "feelings"? Do you feel bad if you respond impolitely? I don't.


You only do that politeness as a novice.

My questions to copilot.ms.com today are more like the following, and it still works like a charm...

"I have cpp code: <enter><code snippet><enter> and i get error <piece of compilation output>. Is this wrong smart ponitor?"

[elaborate answer with nice examples]

"Works. <Next question>"


I don't feel this at all. I treat ChatGPT like an investment banking intern.


So why not fire 3 of your colleagues and have another one whose new job is watching over/checking in on you? By your own account, productivity would be about the same. Save your company some money; it will be appreciated!

On an unrelated note, I believe people need to start quantifying their outrageous AI productivity claims or shut up.


I'm intrigued to know whether that actually ends up working. I am something like that myself, but I don't know whether it is an effect of getting feedback or of having a person behind the feedback.


There's definitely an ideal setup needed for it to work. I'm also not quite sure what part of the other person being present causes me to focus better (i.e., whether it's the presence itself vs. the good ideas and feedback).

I'm leaning toward saying that the main issue for me is that I need to keep my focus on activities with active engagement rather than passive engagement, like taking notes versus just reading a passage.


Your "parent" kicked you into gear because you have an emotional bond with them. A stranger might cause your guard to go up if you don't respect them as having wisdom. The same may go for an AI.


I used the term "parent" here because it was the descriptor I thought people would understand best.

For me personally, I was awful at working when my parents were hovering over me.

I used to work with a professor on a project, and we'd spend significant amounts of time working on Zoom calls (this was during COVID). The professor wouldn't even be helping me the entire time, but as soon as I was blocked, I'd start talking, the ideas would bounce back and forth, and I'd find a solution significantly quicker.


Shameless plug: I'm working on something like this: https://myaipal.kit.com/prerelease


So I watched the demo video on your site, and honestly I'm not sure how this is really all that much better than what can already be done with ChatGPT.

The key is, I don't want to have to initiate the contact. Hand holding the AI myself defeats the purpose. The ideal AI assistant is one that behaves as if it's a person that's sitting next to me.

Imagine you're a junior that gets on a teams call to get help via pair programming with your boss. For anything more than just a quick fix, pair programming on calls tends to turn into the junior working on something, hitting a roadblock, and the boss stepping in to provide input.

Here's the really important part that I've realized: very rarely will the input that the boss provides be something that is leaps and bounds outside of the ability of the junior. A lot of it will just be asking questions or talking the problem through until it turns the gears enough for the junior to continue on their own. THAT right there. That's the gear turning AI agent I'm looking for.

If someone could develop a tool that "knows" the right time to jump in and talk with you, then I think we'd see huge jumps in productivity for people.


At least you can theoretically stop sharing with this one. Microsoft was essentially trying to do this, but for everything on your PC, with zero transparency.

Here's Google doing essentially the same thing, even more invasively in that it's explicitly shipping your activity to the cloud, and yet the response is so different from the "we're sticking this on your machine and you can't turn it off" version Microsoft was attempting to land. This is what Microsoft should have done.


This is great! I viscerally dislike the "we're going to do art for you so you don't have to... even if you want to..." side of AI, but learning to use the tools to get the satisfaction of making it yourself is not easy! After 2 decades of working with 2D art and code separately, learning 3D (if you include things like the complex and counterintuitive data flow of simulations in Houdini and the like) was as difficult as, or more difficult than, learning to code. Beyond that, taking classes is f'ing expensive, and more of that money goes to educational institutions than to the teachers themselves. Obviously, getting beyond the basics in areas that require experienced critique is going to need human understanding, but for the base technical stuff, this is fantastic.


This comment is better than the entire ad Google just showed. Who still points a camera at a building and asks "what is this building?"


I do that in Manhattan. I also do it for yonder mountains.


Sounds interesting, but voice input isn't working for me there. I guess I'm too niche with my Mac and Firefox setup.


Actually, plenty of tech people use a Mac and Firefox.


Irony detectors malfunctioning perhaps?


'irony' meant 'something made of metal' last time I checked


Right, and Macs are made out of aluminium


What is Firefox made out of then?


The amount of rust indicates iron. So Firefox is very irony.


Fire and foxes, presumably.


Wood.


Aluminum. It's American.


This isn't entirely surprising, as Google has been artificially breaking things on Firefox for years now (Google Maps and YouTube at least). Maybe try spoofing Chrome's user agent; an example is sketched below.
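If you want to try that, Firefox supports a global user-agent override via a string preference you create in about:config. The Chrome UA value here is just an illustrative example of mine; copy a current one from an actual Chrome install:

    general.useragent.override = Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36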


Console is throwing an error: "Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported."

Quick research suggests this is part of Firefox's anti-fingerprinting functionality.
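If the anti-fingerprinting theory is right, the setting to test (an assumption on my part; I haven't verified it against this exact error) would be this boolean in about:config, which among other things pins AudioContext sample rates to a fixed value:

    privacy.resistFingerprinting = false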


I tried this, shared a terminal, asked it to talk about what it saw, and it guessed that it was Google Chrome with some web UI stuff. Immediately closed the window and bailed.


Which terminal? Was it Chromium-based?


Nope. Just KiTTY on Windows.


Getting-started documentation on the Multimodal Live API: https://ai.google.dev/api/multimodal-live
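For anyone who'd rather poke at it from code than from AI Studio, here's a minimal text-only sketch using the google-genai Python SDK. The model name, config key, and method names are my reading of the linked docs, not gospel; check them against the current reference before relying on them.

    # pip install google-genai
    # Minimal text-in/text-out sketch of the Multimodal Live API.
    # Assumptions: the "gemini-2.0-flash-exp" model name and the
    # response_modalities config key match the docs linked above.
    import asyncio
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")

    async def main():
        config = {"response_modalities": ["TEXT"]}
        async with client.aio.live.connect(
            model="gemini-2.0-flash-exp", config=config
        ) as session:
            # Send one user turn, then stream the reply as it arrives.
            await session.send(input="In one paragraph, what can this API do?",
                               end_of_turn=True)
            async for response in session.receive():
                if response.text:
                    print(response.text, end="")

    asyncio.run(main())

Audio and screen sharing run over the same bidirectional session; you stream media chunks in instead of a single text turn.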


I don't know what's not working, but I get: "Has a large language model. I don't have the capability to see your screen or any other visual input. My interactions are purely based on the text that you provide."


This'll be so fantastic once local models can do it, because nobody in their right mind would stream their voice and everything they do on their machine to Google, right? Right?

Oh, who am I kidding, people upload literally everything to Drive lmao.



