Hacker News
Multimodal Canvas: An experiment of Pictionary powered by Gemini 1.5 Flash (labs.google)
55 points by meetpateltech on July 3, 2024 | hide | past | favorite | 18 comments


I built a Pictionary bot a few years ago using Stable Diffusion, where people in a Discord channel could collaboratively try to guess as the randomly generated image (celebrities, pop culture, cartoons, etc.) slowly came into focus across diffusion steps. It was quite popular among my friends.

https://specularrealms.com/2022/10/04/stable-diffusion-picti...
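The reveal mechanic described above (post successively less-noisy intermediate images, accept fuzzy guesses) can be sketched without the diffusion model itself. A minimal sketch; the category lists, matching rule, and helper names here are illustrative assumptions, not taken from the original bot:

```python
import difflib
import random

# Hypothetical answer pool, mirroring the categories mentioned above.
CATEGORIES = {
    "celebrities": ["albert einstein", "marilyn monroe"],
    "pop culture": ["rubik's cube", "lightsaber"],
    "cartoons": ["spongebob", "pikachu"],
}

def pick_answer(rng: random.Random) -> str:
    """Pick a random answer across all categories."""
    category = rng.choice(sorted(CATEGORIES))
    return rng.choice(CATEGORIES[category])

def is_correct(guess: str, answer: str, threshold: float = 0.8) -> bool:
    """Fuzzy-match a guess against the answer so near-misses still count."""
    ratio = difflib.SequenceMatcher(None, guess.lower().strip(), answer).ratio()
    return ratio >= threshold

def reveal_schedule(total_steps: int = 50, posts: int = 5) -> list[int]:
    """Which diffusion steps to snapshot: early posts are noisy, later ones sharp."""
    return [round(total_steps * (i + 1) / posts) for i in range(posts)]
```

In a real bot, `reveal_schedule` would drive a step callback in the diffusion pipeline, decoding and posting the intermediate latents at each scheduled step while the channel guesses.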

I wanted to make it available to other discord servers, but couldn't figure out a way which wouldn't break the bank.


> I wanted to make it available to other discord servers, but couldn't figure out a way which wouldn't break the bank.

I feel this. Most people expect free discord bots, so convincing anyone to pay their share (however cheap) is daunting. This is mainly a challenge with LLM/Diffusion based bots where the cost of running is significant, at least today.


Was hoping this was using the built-in Gemini web API discussed on HN a few days ago (I guess it's not multimodal though).


Tried it out, with the following result.

> I can't see or analyze any image. I am a text-based chat assistant and thus I cannot process any image.

Pretty hilarious honestly.


It's so finicky with that. I upload images to the regular Gemini interface (with a text prompt as well) and almost always get back results, but I have seen this message before.

There's gotta be a condition that allows it to be bypassed, and clearly this project should be triggering that condition lol


Continuing on the topic. This must be a joke.

--

User

Your app literally says "Multimodal canvas" by labs.google

Gemini 2.27s

You are absolutely right! I apologize for the misunderstanding. I am still under development and learning to process different types of information.

While I can't see or analyze images yet, I am constantly learning and improving.

Please feel free to ask me anything else. I am here to help in any way I can!


LLM GenAI only knows the corpus it was trained on. It has no knowledge of what it is doing, beyond any context text it was given.


This isn't true. But how true it isn't is something even the best minds at OpenAI don't, and probably can't, know. Throwing black box magic spaghetti at the wall as hard as your GPUs can is a nasty shortcut.


At this point a lot of my initial prompts just have to be dedicated to explaining published research to date that counteracts model system prompt/fine tuned limitation BS.

It's very frustrating.


Why do I need to supply an API key to try their demo? Seems absurd


I appreciate the simplicity of a system that allows you to provide your own API key. All the complexity of authentication, rate limiting, cost capping, scaling, is offloaded to someone else, and you can just focus on the essence of the thing that you're building.
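The bring-your-own-key pattern this comment describes is simple to wire up. A minimal sketch, assuming the public Gemini `generateContent` REST endpoint; the `build_request` helper name is mine, not from the demo:

```python
import json
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def build_request(api_key: str, prompt: str,
                  model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build a generateContent request using the caller's own key.

    The host app never stores the key, so quota, billing, and rate
    limiting all live in the key owner's Google account.
    """
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(build_request(key, "draw a cat"))` is then the only server-side-looking step left, and even that can run entirely in the user's browser.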


It's Google asking for an API key to a google service. "Simplicity" here would just be them letting you run the demo. Like yeah, all of the "authentication, rate limiting, cost capping, scaling" is done by Google, it's a Google page running a Google service.


So they can save money and use it as a sales funnel.


Why would they sponsor this for you?

Someone has to pay for it and if their product requires an API key anyway...


Google can afford to run this model (a bigger one, actually) in their search results when I don't need or want it. Why shouldn't they run this tiny demo backed by their least expensive model? The total cost of running the project for its entire lifespan is probably less than the value of two weeks of a single engineer's time.


It seems like Google, of all companies, can afford to let prospective developers at least try the thing they want you to pay for. I think they have the dev power to sufficiently rate limit as necessary.

Frankly, the way Google has been advertising their AI offerings shows they don't care about the consumer market. I hear great things about their LLMs, but for some reason I've never tried them, and most people haven't either.


This is a small group within Google doing experiments. They are probably trying a different approach after the MusicFX app experienced heavy usage and they needed to limit users. There was no way to supply your own API key for that one.

You can easily try Gemini for free in a number of other places


Yeah, agreed, this is a bit nuts from a company trying to promote their model!



