Multi-modal audio is great. I talk to ChatGPT when I'm cooking or walking the dog.
For images I use it for things like drafting initial alt text, extracting tables from screenshots, and translating photos of signs in languages I don't speak - and then really fun stuff like "invent a recipe to recreate this plate of food", "my CSS renders like this, what should I change?", or "How do you think I turn on this oven?" (in an Airbnb).
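If you've never tried the image stuff programmatically, it's only a few lines against the API. A minimal sketch with the OpenAI Python SDK - model name, file name, and prompt here are placeholder assumptions, and any vision-capable model works the same way:

    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Encode a local screenshot as a data URL the API can accept
    with open("screenshot.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the table in this screenshot as Markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)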
I've recently started using the screen-sharing feature Gemini provides at https://aistudio.google.com/live when I'm reading academic papers and want help understanding the math. I can ask "What does this symbol with the squiggle above it mean?" out loud and Gemini will explain it for me - works really well.
Just last night I was digging around in my basement, pulling apart my furnace, showing pics of the inside of it, having GPT explain how it works and what I needed to do to fix it.
If there are no reputable sources to point to, then where exactly is GPT deriving its answer from? And how can we be assured GPT is correct about the furnace in question?
I mean... I fed it photos of the unit and every diagram and instruction panel on the thing. I was confident in the information it was giving me about what parts did what, where to look, and what to look for. You have to evaluate its output, certainly.
Getting it to help me fix a mower now. It's surfacing some good YouTube vids.
I use it like that all the time. There's so much information in the world that assumes you already have a certain level of understanding - that you can decipher the jargon it uses and fill in the blanks when it doesn't provide enough detail.
I don't have 100% of the "common sense" knowledge about every field, but good LLMs probably have ~80% of that "common sense" baked in, which makes them better at interpreting incomplete information than I am.
A couple of examples: a post on some investment forum mentions DCA, or a cooking recipe tells me to "boil the pasta until done". The model can fill in those gaps for me - that DCA means dollar-cost averaging, or roughly what "done" looks like for that pasta.
I absolutely buy that feeding in a few photos of dusty half-complete manual pages found near my water heater would provide enough context for it to answer questions usefully.
Oh right, yeah, I've done things like this (phone calls to ChatGPT) or the openwebui Whisper -> LLM -> TTS setup. I thought there might be something more than this by now.
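For what it's worth, the Whisper -> LLM -> TTS loop is only three calls if you go straight at the OpenAI API - a minimal sketch, assuming the hosted whisper-1 / tts-1 models and placeholder file and model names (openwebui wires up the same pipeline, optionally with local models):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # 1. Speech -> text with Whisper
    with open("question.wav", "rb") as audio:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

    # 2. Text -> answer with a chat model
    answer = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = answer.choices[0].message.content

    # 3. Answer -> speech with a TTS voice
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
    speech.write_to_file("reply.mp3")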