Hacker News
Multimodal Canvas: An experiment of Pictionary powered by Gemini 1.5 Flash (labs.google)
55 points by meetpateltech on July 3, 2024 | hide | past | favorite | 18 comments


I built a Pictionary bot a few years ago using Stable Diffusion, where people in a Discord channel could collaboratively try to guess as the randomly generated image (celebrities, pop culture, cartoons, etc.) slowly came into focus across diffusion steps. It was quite popular among my friends.

https://specularrealms.com/2022/10/04/stable-diffusion-picti...
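The reveal mechanic described above (post successively less-noisy intermediate images, accept fuzzy guesses) can be sketched without the diffusion model itself. A minimal sketch; the category lists, matching rule, and helper names here are illustrative assumptions, not taken from the original bot:

```python
import difflib
import random

# Hypothetical answer pool, mirroring the categories mentioned above.
CATEGORIES = {
    "celebrities": ["albert einstein", "marilyn monroe"],
    "pop culture": ["rubik's cube", "lightsaber"],
    "cartoons": ["spongebob", "pikachu"],
}

def pick_answer(rng: random.Random) -> str:
    """Pick a random answer across all categories."""
    category = rng.choice(sorted(CATEGORIES))
    return rng.choice(CATEGORIES[category])

def is_correct(guess: str, answer: str, threshold: float = 0.8) -> bool:
    """Fuzzy-match a guess against the answer so near-misses still count."""
    ratio = difflib.SequenceMatcher(None, guess.lower().strip(), answer).ratio()
    return ratio >= threshold

def reveal_schedule(total_steps: int = 50, posts: int = 5) -> list[int]:
    """Which diffusion steps to snapshot: early posts are noisy, later ones sharp."""
    return [round(total_steps * (i + 1) / posts) for i in range(posts)]
```

In a real bot, `reveal_schedule` would drive a step callback in the diffusion pipeline, decoding and posting the intermediate latents at each scheduled step while the channel guesses.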

I wanted to make it available to other discord servers, but couldn't figure out a way which wouldn't break the bank.


> I wanted to make it available to other discord servers, but couldn't figure out a way which wouldn't break the bank.

I feel this. Most people expect free discord bots, so convincing anyone to pay their share (however cheap) is daunting. This is mainly a challenge with LLM/Diffusion based bots where the cost of running is significant, at least today.


Was hoping this was using the built-in Gemini web API discussed on HN a few days ago (I guess it's not multimodal though).


Tried it out, with the following result.

> I can't see or analyze any image. I am a text-based chat assistant and thus I cannot process any image.

Pretty hilarious honestly.


It's so finicky with that. I upload images to the regular Gemini interface (with a text prompt as well) and almost always get back results, but I have seen this message before.

There's gotta be a condition that allows it to be bypassed, and clearly this project should be triggering that condition lol


Continuing on the topic. This must be a joke.

--

User

Your app literally says "Multimodal canvas" by labs.google

Gemini 2.27s

You are absolutely right! I apologize for the misunderstanding. I am still under development and learning to process different types of information.

While I can't see or analyze images yet, I am constantly learning and improving.

Please feel free to ask me anything else. I am here to help in any way I can!


LLM GenAI only knows the corpus it was trained on. It has no knowledge of what it is doing, beyond any context text it was given.


This isn't true. But how true it isn't is something even the best minds at OpenAI don't, and probably can't, know. Throwing black box magic spaghetti at the wall as hard as your GPUs can is a nasty shortcut.


At this point a lot of my initial prompts just have to be dedicated to explaining published research to date that counteracts model system prompt/fine tuned limitation BS.

It's very frustrating.


Why do I need to supply an API key to try their demo? Seems absurd


I appreciate the simplicity of a system that allows you to provide your own API key. All the complexity of authentication, rate limiting, cost capping, scaling, is offloaded to someone else, and you can just focus on the essence of the thing that you're building.
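The bring-your-own-key pattern this comment describes is simple to wire up. A minimal sketch, assuming the public Gemini `generateContent` REST endpoint; the `build_request` helper name is mine, not from the demo:

```python
import json
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def build_request(api_key: str, prompt: str,
                  model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build a generateContent request using the caller's own key.

    The host app never stores the key, so quota, billing, and rate
    limiting all live in the key owner's Google account.
    """
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_request(key, "draw a cat"))` is then the only server-side-looking step left, and even that can run entirely in the user's browser.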


It's Google asking for an API key to a google service. "Simplicity" here would just be them letting you run the demo. Like yeah, all of the "authentication, rate limiting, cost capping, scaling" is done by Google, it's a Google page running a Google service.


So they can save money and use it as a sales funnel.


Why would they sponsor this for you?

Someone has to pay for it and if their product requires an API key anyway...


Google can afford to run this model (a bigger one, actually) in their search results when I don't need or want it. Why shouldn't they run this tiny demo backed by their least expensive model? The total cost of running the project for its entire lifespan is probably less than the value of two weeks of a single engineer's time.


It seems like Google, of all companies, can afford to let prospective developers at least try the thing they want you to pay for. I think they have the dev power to sufficiently rate limit as necessary.

Frankly, the way Google has been advertising their AI offerings shows they don't care about the consumer market. I hear great things about their LLMs, but for some reason I've never tried them, and most people haven't either.


This is a small group within Google doing experiments. They are probably trying a different approach after the MusicFX app experienced heavy usage and they needed to limit users. There was no way to supply your own API key for that one.

You can easily try Gemini for free in a number of other places


Yeah, agreed, this is a bit nuts from a company trying to promote their model!



