I've tested bard/gemini extensively on tasks that I routinely get very helpful results from GPT-4 with, and bard consistently, even dramatically underperforms.
It pains me to say this but it appears that bard/gemini is extraordinarily overhyped. Oddly it has seemed to get even worse at straightforward coding tasks that GPT-4 manages to grok and complete effortlessly.
The other day I asked bard to do some of these things and it responded with a long checklist of additional spec/requirement information it needed from me, when I had already concisely and clearly expressed the problem and addressed most of the items in my initial request.
It was hard to say if it was behaving more like a clerk in a bureaucratic system or an employee that was on strike.
At first I thought the underperformance of bard/gemini was due to Google trying to shoehorn search data into the workflow in some kind of effort to keep search relevant (much like the crippling MS did to GPT-4 in its bingified version), but now I have doubts that Google is capable of competing with OpenAI.
I don't think Google has released the version of Gemini that is supposed to compete with GPT-4 yet. The current version is apparently more on the level of GPT-3.5, so your observations don't surprise me.
I will say as someone who tries to regularly evaluate all the models Google's censorship is much worse than other companies. I routinely get "I can't do that" messages from Bard and no one else when testing queries.
As an example, I had a photo of a beach I wanted to see if it knew the location of and it was blocked for inappropriate content. I stared at the picture for like 5 minutes confused until I blacked out the woman in a bikini standing on the beach and resubmitted the query at which point it processed it.
It's refused to do translation for me because the text contains 'rude language'. It's blocked my requests on copyright grounds.
I don't at all understand the heavy-handed censorship they're applying when they're behind in the market.
their censorship is the worst of any platform. being killed from within by the woke mob apparently. it's a pity for google employees, they're going to be undergoing cost-cutting/perpetual layoffs for the foreseeable future as other players eat their advertising lunch.
On the flip side, I find that GPT-4 is constantly getting degraded. It intentionally returns only partial answers even when I direct it specifically not to do so.
My guess is that they are trying to save on compute by generating shorter responses.
I think at high traffic times it gets slightly different parameters that make it more likely to do that. I've had the best results during what I think are off-peak hours.
> I've tested bard/gemini extensively on tasks that I routinely get very helpful results from GPT-4 with, and bard consistently, even dramatically underperforms.
Yes. And I don't buy the lmsys leaderboard results where Google somehow shoved a mysterious gemini-pro model to be better than GPT-4. In my experience, its answers looked very much like GPT-4 (even the choice of words) so it could be that Bard was finetuned on GPT-4 data.
Shady business when Google's Bard service is miles behind GPT-4.
True, what is most puzzling about it is the effort Google is putting into generating hype for something that is at best months away (by which time OpenAI will likely have released a better model)...
My best guess is that Google realizes that something like GPT-4 is a far superior interface to interact with the world's information than search, and since most of Google's revenue comes from search, the handwriting is on the wall that Google's profitability will be completely destroyed in a few years once the world catches on.
MS seems to have had that same paranoia with the bingified GPT-4. What I found most remarkable about it was how much worse it performed, seemingly because it was incorporating the top n Bing results into the interaction.
Obviously there are a lot of refinements to how a RAG or similar workflow might actually generate helpful queries and inform the AI behind the scenes with relevant high quality context.
I think GPT-4 probably does this to some extent today. So what is remarkable is how far behind Google (and even MS via its bingified version) are from what OpenAI already has available for $20 per month.
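To make the RAG idea above concrete, here is a minimal sketch of that pattern: retrieve a few relevant snippets, then prepend them as context for the model. The corpus, the keyword-overlap scoring, and the prompt template are all illustrative assumptions, not any vendor's actual pipeline (real systems typically use embedding similarity rather than word overlap).

```python
def score(query: str, doc: str) -> int:
    """Naive relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], top_n: int = 2) -> list[str]:
    """Return the top_n highest-scoring documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:top_n]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble retrieved snippets into a context block ahead of the question."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

# Toy corpus; a real deployment would index web results or documents.
corpus = [
    "Moka pots brew coffee using steam pressure.",
    "The Eiffel Tower is in Paris.",
    "Stainless steel moka pots resist corrosion better than aluminum.",
]
print(build_prompt("which stainless steel moka pot is best", corpus))
```

The quality of the final answer hinges almost entirely on the retrieval step, which is presumably where the bingified version fell down: stuffing low-quality top-n search results into the context degrades rather than informs.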
Google started out free of spammy ads and has increasingly become like the ads-everywhere-in-your-face, spammy stuff that it replaced.
GPT-4 is such a refreshingly simple and to the point way to interact with information. This is antithetical to what funds Google's current massive business... namely ads that distract from what the user wanted in hopes of inspiring a transaction that can be linked to the ad via a massive surveillance network and behavioral profiling model.
I would not be surprised if within Google the product vision for the ultimate AI assistant is one that gently mentions various products and services as part of every interaction.
the search business has always been caught between delivering simple and to the point results to users and skewing results to generate return on investment to advertisers.
in its early years google was also refreshingly simple and to the point. the billion then trillion dollar market capitalization put pressure on them to deliver financial results, and the ad spam grew like a cancer. openai is destined for the same trajectory, if only faster. it will be poetic to watch all the 'ethical' censorship machinery repurposed to subtly weight conversations in favor of one brand or another. pragmatically, the trillion dollar question is what the openai take on adwords will be.
Ads are supposed to reduce transaction cost by spreading information to allow consumers to efficiently make decisions about purchases, many of which entail complex trade-offs.
In other words, people already want to buy things.
I would love to be able to ask an intelligence with access to the world's information questions to help me efficiently make purchasing decisions. I've tried this a few times with GPT-4 and it seems to bias heavily toward whatever came up in the first few pages of web results, and rarely "knows" anything useful about the products.
A sufficiently good product or service will market itself; marketing spend or brand marketing is rarely necessary for those rare exceptional products and services.
For the rest of the space of products and services, ad spend is a signal that the product is not good enough that the customer would have already heard about it.
With an AI assistant, getting a sense of the space of available products and services should be simple and concise, without the noise and imprecision of ads or the clutter of "near miss" products and services (the "reach" that companies paid for).
The bigger question is which AI assistant people will trust they can ask important questions to and get unbiased and helpful results. "Which brand of Moka pot under $20 is the highest quality?" or "Help me decide which car to buy" are the kinds of questions that require a solid analytical framework and access to quality data to answer correctly.
AI assistants will act like the invisible hand and should not have a thumb on the scale. I would pay more than $20 per month to use such an AI. I find it hard to believe that OpenAI would have to resort to any model other than a paid subscription if the information and analysis is truly high quality (which it appears to be so far).
I did exactly that with a custom GPT and it works pretty well. I did my best to push it to respond with its training knowledge about brand reputation and avoid searches. When it has to resort to searches I pushed it to use trusted product information sources and avoid spammy or ad-ridden sites.
It allowed me to spot the best brands and sometimes even products in verticals I knew nothing about beforehand. It’s not perfect but already very efficient.
The ad model already evolved to take attribution/conversion from different sources into account (although there are a lot of spammy implementations), but it took many years for Google to make YouTube/mobile ads profitable, and now adoption is much faster.
> And I don't buy the lmsys leaderboard results where Google somehow shoved a mysterious gemini-pro model to be better than GPT-4.
What do you mean by "don't buy"? You think lmsys is lying and the leaderboard doesn't reflect the results? Or that Google is lying to lmsys and has a better model to serve exclusively to lmsys but not to others? Or something else?
Most likely the latter. Either Google has a better model which they disguise as Bard to make up for the bad press Bard has received, or Google doesn't really have a better model, just a Gemini Pro fine-tuned on GPT-4 data to sound like GPT-4 and rank high on the leaderboard.
> Either Google has a better model which they disguise as Bard
Why wouldn't they use this model in bard then?
Anyway, this is an easily verifiable claim: are there any prompts that consistently work at lmsys but not in the bard interface?
> fine tuned on GPT-4 data to sound like GPT-4 and rank high
This I don't get. Why would many different random people rank a bad model that sounds like GPT-4 higher than a good model that doesn't? What is even the meaning of "better model" in such a setting, if not user preference?
I guess Pro is not supposed to be on par with GPT-4. That would be Ultra, coming out sometime in the first quarter. I’m going to reserve judgement till that is released.
I think there’s bias in the types of prompts they’re getting. In my personal experience, Bard is useful for creative use cases but not good with reasoning or facts.
Here is a simple maths problem that GPT-4 gets right but Bard (even the Gemini Pro version) consistently gets wrong: “What is one (short scale) centillion divided by the cube of a googol?”
But you are right, we don’t know the types of prompts Chatbot Arena users are submitting. Maths problems like that are probably a small minority of usage.
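For anyone who wants to check that test prompt: a short-scale centillion is 10^303 and a googol is 10^100, so dividing by the cube of a googol (10^300) should leave 10^3. Python's arbitrary-precision integers make this trivial to verify:

```python
# Verify the arithmetic behind the test prompt.
centillion = 10 ** 303      # short-scale centillion
googol = 10 ** 100
answer = centillion // googol ** 3   # 10**303 / 10**300 = 10**3
print(answer)  # 1000
```

So the correct answer is exactly one thousand; a model that mixes up short- and long-scale centillion, or slips on the exponent subtraction, will get it wrong.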
One other thing I notice: if you ask about controversial issues, both GPT-3.5/4 and Bard can get a bit “preachy” from a progressive perspective - but I personally find Bard to be noticeably more “preachy” than OpenAI at this (while still not reaching Llama levels)
In my experience, Bard is not comparable to GPT-3.5 in terms of instruction following; it sometimes gets lost in complex situations and then the response quality drops significantly. GPT-3.5 just has a much better feel, if that is a word for evaluating LLMs. And Bard is just annoying if it can't complete a task.
Also, hallucinations are wild in Gemini Pro compared to GPT-3.5.
Just a note, AFAIK it was only available in the US.
It was usable via VPN with a US IP address, and whenever I tried it without a VPN, Bard reported not using Gemini when asked, even when asked in English.
I get good results through ChatGPT image generation but mostly disappointing ones when using DALL-E directly. Not sure if my prompt game is just sorely lacking or if there's something else involved via ChatGPT.
Apparently, but when I use DALL-E 3 on OpenAI, the images it generates look like shit: under-developed, with crappy eyes and hands, the kind of typical mutant stuff you see with AI-generated images. Bing seems to be much better at those types of details out of the box.