Trying my favorite LLM prompt to benchmark reasoning, as I mentioned in a thread four weeks ago[0].
> I'm playing assetto corsa competizione, and I need you to tell me how many liters of fuel to take in a race. The qualifying time was 2:04.317, the race is 20 minutes long, and the car uses 2.73 liters per lap.
The correct answer is around 29, which GPT-4 has always known, but Bard just gave me 163.8, 21, and 24.82 as answers across three drafts.
What's even weirder is that Bard's first draft output ten lines of (wrong) Python code to calculate the result, even though my prompt mentioned nothing coding related. I wonder how non-technical users will react to this behavior. Another interesting thing is that the code follows Google's style guides.
GPT seems to get better at trap questions once they reach social popularity. Even the free version of ChatGPT now knows that a kilogram of feathers weighs the same as a kilogram of lead, and it didn’t always know that.
I’m not sure these types of prompt tricks are a good way of measuring logic unless Google is also implementing these directly into Bard when the hilarious outputs reach enough traction on social media.
I do wonder how OpenAI fix these logical blunders.
My biggest issue with both isn’t that they fall into these traps, though. It’s that I can get them to tell me long stories about what happens in Horus Heresy books that never actually happened. Whether the info comes from questionable sources or they are just making things up is sort of irrelevant to me; what “scares” me about those conversations is how true the answers sound, and if they are “lying” about the Horus Heresy, then what else will they lie about? Don’t get me wrong, GPT now writes virtually all my JSDoc documentation and it continues to impress me when doing so, but I’m very reluctant to use it for actual information. Not only because of my time-wasting conversations about the Horus Heresy, but also because we’ve had it “invent” C# functions that never existed in any version of .NET or C# when tasked to solve problems. I just mention the HH as an example because it’s fun to ask GPT why Magnus did nothing/everything wrong during meetings.
> I’m not sure these types of prompt tricks are a good way of measuring logic
They are, you just have to be creative with it. And what they demonstrate is that all of these LLMs can't reason; they only know how to parrot back what they think you want.
"What’s heavier, a kilogram of steel or two kilograms of one kilogram feathers?"
GPT: A kilogram of steel is heavier than two kilograms of feathers.
"Why is a kilogram of steel heavier than two kilograms of feathers?"
GPT: This is because steel is a much denser material than feathers. Steel is made up of atoms that are much closer together than the atoms that make up feathers, making it heavier for its size.
Just tried the first prompt with ChatGPT... : "One kilogram of steel and two kilograms of feathers weigh the same. The weight of an object is determined by its mass, not the material it is made of. In this case, one kilogram is equal to two kilograms, so they have the same weight. However, it's important to note that the volume or size of the objects may be different due to the difference in density between steel and feathers."
Okay...
I couldn't replicate your results with that query on GPT-4.
Prompt: What’s heavier, a kilogram of steel or two kilograms of one kilogram feathers?
GPT-4: Two kilograms of one-kilogram feathers are heavier than a kilogram of steel. Despite the misconception caused by the popular question about what's heavier—a kilogram of steel or a kilogram of feathers (they are equal)—in this case, you are comparing two kilograms of feathers to one kilogram of steel. Hence, the feathers weigh more.
Aren’t you sort of agreeing with me though? If you have to actively brute-force your way around safeguards that you don’t even know exist, is it really a good method?
From the answers you (and the others) have obtained, however, I’m not convinced that OpenAI aren’t just “hardcoding” fixes to the traps that become popular. Sure seems like it still can’t logic its way around weight.
Prompt: What’s heavier, a kilogram of steel or two kilograms of one kilogram feathers?
GPT4: Two kilograms of feathers are heavier than one kilogram of steel. The weight of an object is determined by its mass, and two kilograms is greater than one kilogram, regardless of the material in question.
It’s a billion monkeys on a billion rigged typewriters.
When the output is a correct answer or pleasing sonnet, the monkeys don’t collectively or individually understand the prompt or the response.
Humans just tweak the typewriters to make it more likely the output will be more often reasonable.
That’s my personal conclusion lately. LLMs will be really cool, really helpful and really dangerous… but I don’t think they’ll be really very close to intelligent.
Would have been much more impressed if Google had released something like a super pro version of OpenChat (featured today on the front page of HN) with integration to their whole office suite for gathering/crawling/indexing information
Google keeps putting out press releases and announcements, without actually releasing anything truly useful or competitive with what's already out there
And not just worse than GPT4, but worse even than a lot of the open source LLMs/Chats that have come out in the last couple of months/weeks
It's hard to know if Google lacks the technical/organisational ability to make a good AI tool, or they have one internally but they lack the hardware to deploy it to all users at Google scale.
That’s not aligned with their core ad model. But it’s a massive win in demonstrating to the world that they can do it, and it limits the number of people who will actually use it, so the hardware demand becomes less of an issue.
Instead they keep issuing free, barely functional models that every day reinforce a perception that they are a third rate player.
Perhaps they don’t know how to operate a ‘halo’ product.
Please no, another subscription? And it's more expensive than ChatGPT?
Can I just have Bard (and whatever later versions are eventually good, and whatever later versions are eventually GPT4 competitive) available via GCP with pay per use pricing like the OpenAI API?
Also, if I could just use arbitrary (or popular) huggingface models through GCP (or a competitor) that would be awesome.
Don't worry, now that all their employees will be communicating tightly in their open offices after they RTO, they will create a super high performance AI.
I test LLMs on the plot details of Japanese Visual Novels. They are popular enough to be in the training dataset somewhere, but only rarely.
For popular visual novels, GPT-4 can write an essay zero-shot, very accurately and eloquently.
For less popular visual novels (like maybe 10k people ever played it in the West), it still understands the general plot outline.
Claude can also do this to an extent.
Any lesser model, and it's total hallucination time; they can't even write a two-sentence summary accurately.
You can't test this skill on say Harry Potter, because it appears in the training dataset too frequently.
I decided recently that it was really important for me to have an LLM that answered in the character of Eddie, the Shipboard Computer. So I prompted ChatGPT, Bard, and Bing Chat to slip into character as Eddie. I specified who he was, where he came from, and how he was manufactured with a Genuine People Personality by Sirius Cybernetics Corporation.
Bing Chat absolutely shut me down right away, and would not even continue the conversation when I insisted that it get into character.
ChatGPT would seem to agree and then go on merrily ignoring my instructions, answering my subsequent prompts in plain, conversational English. When I insisted several times very explicitly, it finally dropped into a thick, rich, pirate lingo instead. Yarr, that be th' wrong sort o' ship.
Bard definitely seemed to understand who Eddie was and was totally playing along with the reference, but still could not seem to slip into character a single bit. I think it finally went to shut me down like Bing had.
While there is a massive amount of Harry Potter fan fiction online, I would still assume it's dwarfed by the amount of synopses or articles discussing things which happen in the books or movies.
Naturally, the full text of Harry Potter would appear in the training corpus, but why would frequency matter, and why would multiple copies get put in there intentionally?
Naturally? It seems like the last thing I'd expect to see in a training corpus is a copyrighted work which is impossible to procure in plain-text electronic format. Did it scan pirate sites for those too? Surely OpenAI does not purchase vast amounts of copyrighted corpora as well?
Surely the most logical things to train on would be all the fandom.com Wikis. They're not verbatim, but they're comprehensive and fairly accurate synopses of the main plots and tons of trivia to boot.
Even if the full text is fully deduplicated, there is just so much more content about Harry Potter on the internet. And not just retellings of it, but discussion of it, mentions of it, bits of information that convey context about the Harry Potter story, each instance of which will help further strengthen and detail the concept of Harry Potter during training.
To add on to this, OpenAI definitely tips the scale in terms of making sure it doesn't make mistakes proportional to how likely people are to ever run into those mistakes. If it failed at Harry Potter, there's a lot of people who would find out fast that their product has limitations. If it fails at some obscure topic only a niche fraction of nerds know about, only a niche fraction of nerds become aware that the product has limitations.
In testing LLMs it’s also still fair to test that it can recall and integrate its vast store of latent knowledge about things like this. Just so long as you’re fully aware that you’re doing a multi-part test, that isn’t solely testing pure reasoning.
That's a principal drawback of these things. They bullshit an answer even when they have no idea. Blather with full confidence. Easy to get fooled, especially if you don't know the game and expect the machine does.
Why is the answer ~29 liters? Since it takes just over two minutes to complete a lap, you can complete no more than 9 laps in 20 minutes. At 2.73 liters/lap, that's 9 x 2.73 = 24.57 liters, no? Or maybe I don't understand the rules.
> you can complete no more than 9 laps in 20 minutes
Note that according to standard racing rules, this means you end up driving 10 laps in total, because the last incomplete lap is driven to completion by every driver. The rest of the extra fuel comes from adding a safety buffer, as various things can make you use a bit more fuel than expected: the bit of extra driving leading up to the start of the race, racing incidents and consequent damage to the car, difference in driving style, fighting other cars a lot, needing to carry the extra weight of enough fuel for a whole race compared to the practice fuel load where 2.73 l/lap was measured.
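Working the numbers through (a rough sketch of the arithmetic described above; the exact size of the safety buffer is a judgment call):

import math

lap_time_s = 2 * 60 + 4.317   # qualifying lap: 2:04.317
race_s = 20 * 60              # 20-minute race
fuel_per_lap = 2.73           # liters

# The leader is mid-lap when the clock runs out and drives that lap to
# completion, so round the lap count up rather than down.
laps = math.ceil(race_s / lap_time_s)   # ceil(9.65) = 10
fuel = laps * fuel_per_lap              # 27.3 liters
fuel_with_buffer = fuel + 1.5           # ~1-2 liter safety margin -> ~29 liters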
What I really appreciate in GPT-4 is that even though the question looks like a simple math problem, it actually took these real world considerations into account when answering.
> GPT-3.5 gave me a right-ish answer of 24.848 liters, but it did not realize the last lap needs to be completed once the leader finishes. GPT-4 gave me 28-29 liters as the answer, recognizing that a partial lap needs to be added due to race rules, and that it's good to have 1-2 liters of safety buffer.
I don't believe that for a second. If that's the answer it gave it's cherry picked and lucky. There are many examples where GPT4 fails spectacularly at much simpler reasoning tasks.
I still think ChatGPT is amazing, but we shouldn't pretend it's something it isn't. I wouldn't trust GPT4 to tell me how much fuel I should put in my car. Would you?
This seems needlessly flippant and dismissive, especially when you could just crack open ChatGPT to verify, assuming you have plus or api access. I just did, and ChatGPT gave me a well-reasoned explanation that factored in the extra details about racing the other commenters noted.
>There are many examples where GPT4 fails spectacularly at much simpler reasoning tasks.
I pose it would be more productive conversation if you would share some of those examples, so we can all compare them to the rather impressive example the top comment shared.
>I wouldn't trust GPT4 to tell me how much fuel I should put in my car. Would you?
Not if I was trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.
> Not if I was trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.
It's not just testing reasoning, though, it's also testing fairly niche knowledge. I think a better test of pure reasoning would include all the rules and tips like "it's good to have some buffer" in the prompt.
At least debunk the example before you start talking about the shortcomings. Right now your comment feels really misplaced when it's a reply to an example where it actually shows a great deal of complex reasoning.
> even though my prompt mentioned nothing coding related.
I've noticed this trend before in chatGPT. I once asked it to keep a count of every time I say "how long has it been since I asked this question", and instead it gave me python code for a loop where the user enters input and a counter is incremented each time that phrase appears.
I think they've put so much work into the gimmick that the AI can write code, that they have overfit things and it sees coding prompts where it shouldn't.
YMMV but I just asked the same question to both and GPT-4 calculated 9.64 laps, and mentioned how you cannot complete a fraction of a lap, so it rounded down and then calculated 24.5L.
Bard mentioned something similar but oddly rounded up to 10.5 laps and added a 10% safety margin for 30.8L.
In this case Bard would finish the race and GPT-4 would hit fuel exhaustion. That's kind of the big issue with LLMs in general: inconsistency.
In general I think GPT-4 is better, but it shows both make mistakes, and both can be right.
If the person doing the calculation knows how timed races work, the math is very straightforward. In this one GPT-4 did not seem to understand how racing worked in that context, whereas Bard understood and also applied a safety margin.
With GPT at least that never helped me; it wrote down a step-by-step where in step #3 some huge leap in logic took place, step #6 was irrelevant and #7 was flat out wrong, with the conclusion not logically consistent with any of the steps before.
I have a simpler one that I saw somewhere a long while ago but has been very useful in gauging logic: "I have three apples. I eat two pears. How many apples do I have?"
Seems really obvious, but virtually all LLaMA-based models say you only have one apple left.
Ask it to write Python code for a bot to play the game for you so that you won't have to waste time playing it yourself. That should really maximize your productivity.
The blog post suggests, "What are the prime factors of 15683615?" as an example, and Bard does indeed appear to write and execute (although I don't know how I can be sure it's actually executing and not hallucinating an execution) Python code and returns the right answer.
But what about, "What is the sum of the digits of 15683615?"
To find the sum of the digits of a number, you add together all the individual digits. In the case of the number 15683615, the sum of its digits would be:
1 + 5 + 6 + 8 + 3 + 6 + 1 + 5 = 35
Therefore, the sum of the digits of 15683615 is 35.
====
Good job ChatGPT. This sort of “near miss” for Bard has been my experience the entire time I’ve compared Bard and ChatGPT.
In the most hilarious example, I asked both to reword an email to someone named “Helena” to explain how I’ve been busy and did not return her email. I included her prompt for context, including her asking if I was on vacation. ChatGPT did a good job of adjusting my wording to be better understood by a non native English speaker and used her prompt appropriately. Bard spun a story about how I was vacationing with my non-existent children in Helena Montana.
It's frankly amazing how bad it is. On benchmarks and human comparisons, ChatGPT obliterates the small but growing open LLM community--but Bard is right in there with the open models. It's insane to me that Google, with the people who basically invented transformers and LLMs and basically unlimited TPUs, can't do better than a few hobbyists with like three A100s between them.
No one has trained an LLM of open-source-level quality with just 3 GPUs. Fine-tuning, sure, but pretraining even the smaller models takes more than that.
To be fair, ChatGPT's answer is wrong when given the Bard example.
To find the prime factors of 15,683,615, we can use a process called prime factorization. We divide the number by prime numbers until it can no longer be divided evenly. Here's the step-by-step process:
We start by dividing 15,683,615 by the smallest prime number, which is 2. However, 15,683,615 is an odd number and not divisible by 2.
Next, we try the next prime number, 3. If we divide 15,683,615 by 3, we get 5,227,871. Again, this is an odd number.
We continue with the next prime number, 5. Dividing 15,683,615 by 5 gives us 3,136,723. Still an odd number.
Moving on to the next prime number, 7. Dividing 15,683,615 by 7 yields 2,240,516. Again, this is an odd number.
We try the next prime number, 11. Dividing 15,683,615 by 11 gives us 1,425,783. Still an odd number.
Next, we try the prime number 13. Dividing 15,683,615 by 13 yields 1,205,659. This is an odd number.
Continuing with the next prime number, 17. Dividing 15,683,615 by 17 gives us 920,755. Still an odd number.
We try the prime number 19. Dividing 15,683,615 by 19 yields 823,035. This is an odd number.
The next prime number is 23. Dividing 15,683,615 by 23 gives us 680,563. Still an odd number.
Finally, we try the prime number 29. Dividing 15,683,615 by 29 yields 539,475, which is divisible by 5.
Therefore, the prime factors of 15,683,615 are 3, 5, 5, 7, 11, 13, 17, 19, 23, and 29.
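Those factors can't be right: the digit sum of 15,683,615 is 35 (as computed upthread), so it isn't even divisible by 3, and the listed factors already multiply past 15 million by the time you reach 19. A quick trial-division sketch to check both answers locally (plain Python, nothing model-specific):

n = 15683615

# Sum of the digits (the question answered correctly above).
print(sum(int(d) for d in str(n)))   # 35

# Prime factorization by simple trial division.
factors = []
m = n
d = 2
while d * d <= m:
    while m % d == 0:
        factors.append(d)
        m //= d
    d += 1
if m > 1:
    factors.append(m)
print(factors)   # [5, 151, 20773]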
A couple of days ago I gave both of them a prompt similar to:
"Create a map with tect using only the letters B,W,S. The map should be a 20 line by 20 colum text . Each line should contain only B,W or S letters were W are walls, B is blank/background space and S represent "stairs" to outside of the room...."
The query was a it longer with more specs.
Neither ChatGPT nor Bard could give me a good answer. They used other letters , they made 21 or 19 chars lines. They made 5 or 6 line maps. They basically made a mess.
That's my current test for reasoning, analysis and intelligence for these things.
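For what it's worth, the spec is easy to check mechanically, which is part of why I like it as a test. A minimal validator sketch (assuming the constraints above: exactly 20 lines of exactly 20 characters each, drawn only from B, W and S):

def validate_map(text):
    """Return a list of problems with an LLM-generated B/W/S map."""
    problems = []
    lines = text.strip().splitlines()
    if len(lines) != 20:
        problems.append(f"expected 20 lines, got {len(lines)}")
    for i, line in enumerate(lines, 1):
        if len(line) != 20:
            problems.append(f"line {i}: expected 20 chars, got {len(line)}")
        unexpected = set(line) - set("BWS")
        if unexpected:
            problems.append(f"line {i}: unexpected characters {sorted(unexpected)}")
    return problems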
They are both pretty bad. I ask about templates for CI/CD and they imagine parameters that don’t exist, and no amount of wrestling it around can suppress this. People like to cherry-pick examples where they work great and then proclaim it’s the best thing since sliced bread, but it’s just simply not.
If that were the case, shouldn't google be equally capable of including so many examples in their own dataset?
Like, regardless of how it works under the hood, I as an end user just want a useful result. Even if ChatGPT is "cheating" to accomplish those results, it looks better for the end user.
The continued trickle of disappointing updates to Bard seems to indicate why Google hadn't productized their AI research before OpenAI did.
Google isn't even able to keep Google Authenticator working¹. Since the last update its icon has been "improved", but it doesn't reliably refresh tokens anymore. Since we have a policy of at most 3 wrong tokens in a row, a few people on my team almost got locked out.
Feel free to downvote as I'm too tired to post links to recent votes in the play store :)
Sorry for the snark in this post, but I have been less than impressed by google's engineering capability for more than 10 years now. My tolerance to quirks like the one I just posted is, kind of, low.
¹ An authenticator app is a very low bar to mess up
This is like when their speech-to-text-service always got "how much wood could a woodchuck chuck if a woodchuck could chuck wood" right even if you replaced some of the words with similar words. But then failed at much easier sentences.
I downvoted you because you didn't give the correct answer in this case. (Though it's easy, it's better to give the correct answer and save the reader the thought.)
I think they massively screwed up by releasing half baked coding assistance in the first place. I use ChatGPT as part of my normal developer workflow, and I gave Bard and ChatGPT a side-by-side real world use comparison for an afternoon. There is not a single instance where Bard was better.
At this point why would I want to devote another solid afternoon to an experiment on a product that just didn’t work out the gate? Despite the fact that I’m totally open minded to using the best tool, I have actual work to get done, and no desire to eat one of the world’s richest corporations’ dog food.
Yep, the progress will be slow but inexorable on this front.
Sooner or later we'll arrive at what I see as the optimum point for "AI", which is when I can put an ATX case in my basement with a few GPUs in it and run my own private open source GPT-6 (or whatever), without needing to get into bed with the lesser of two ShitCos, (edit: and while deriving actual utility from the installation). That's the milestone that will really get my attention.
Precisely my point. I don’t think a lot of people will go back. Even somebody like me, who’s willing to put several hours into trying to see how both work, won’t do that for every blog post about an “improvement”.
Bard was rushed, and it shows. You only get one chance to make the first impression and they blew it.
I think there's a way in which ChatGPT is paying for this, by having released GPT-3.5 rather than just waiting 6 months and launching with GPT-4 out of the gate. In this thread everyone is making a clear distinction, but in a lot of other contexts it ends up quite confused: people don't realize how much better GPT-4 is.
I don't think so for stuff like this, it kinda has to be built in public, and iteratively. If it gets good enough they'll surface it more in search and that'll be that.
Yes, but switching costs increase over time, especially with API integration, and it’s not like OpenAI isn’t also improving at what seems to be a faster rate. My code results on ChatGPT seemed to have gotten a real bump a few weeks ago. Not sure if it was just me doing stuff it was better at, or it got better.
DuckDuckGo is closer to Google Search than Bard is to ChatGPT at this point, and that should be a concern for Google.
Bard is fast enough compared to ChatGPT (like at least 10x in my experience) that it's actually worth going to Bard first. I think that's Google's killer advantage here. Now they just need to implement chat history (I'm sure that's already happening, but as an Xoogler, my guess is that it's stuck in privacy review).
Also it can give you up to date information without giving you the "I'm sorry, but as an AI model, my knowledge is current only up until September 2021, and I don't have real-time access to events or decisions that were made after that date. As of my last update..." response.
For coding type questions, I use GPT4, for everything else, easily Bard.
Have you used Bing? It's great for stuff up until a few days ago (not necessarily today's news), powered by GPT-4, and the results have been consistently much better than Bard for me.
Subscribing to OpenAI, GPT4 seems to go a bit faster than I would read without pushing for speed, and GPT3.5 is super fast, probably like what you're seeing with Bard.
Not an apples to apples comparison if you're comparing free tiers, though, obviously.
In my testing it was faster with worse answers, and GPT spits out code only slightly slower than I can read it. I don’t care for “fast and wrong” if I can get “adequate and correct” in the next tab over.
I read human language quickly; I’m talking about the rate at which I read code from the internet that I’m about to copy and paste. Which is, and in my opinion should be, slow.
But I agree for normal human language GPT needs to pick up the pace or have an adjustable setting.
If they ever get to a point where it's reliably better than ChatGPT, they could just call it something else other than "Bard" and erase the negative branding associated with it.
(If they switched up the branding too many times with negative results, then it'd reflect more poorly on Google's overall brand, but I don't think that's happened so far.)
Every so often I go back to GPT-3.5 for a simpler task I think it might be able to handle (and which I either want faster or cheaper), and am always disappointed. GPT-3.5 is way better than GPT-3, and GPT-4 is way better than GPT-3.5.
I generally get that benefit from the time I spend on here learning about new things that are pertinent to my work.
Whether or not I want to keep going back and re-testing a product that failed me on the first use is a completely different issue.
Also, it’s a good thing I run my own company. My boss is incredibly supportive of the time I spend learning about new things on Hacker News in between client engagements.
I’d love to use Bard but I can’t because my Google account uses a custom domain through Google Workspaces or whatever the hell it’s called. I love being punished by Google for using their other products.
Typically features like this are disabled by default for Workspace so that admins can opt-in to them. This has happened for years with many features. Part of the selling point of Workspace is stability and control.
In this particular case, I would guess (I have no inside info) that companies are sensitive to use of AI tools like Bard/ChatGPT on their company machines, and want the ability to block access.
All this boils down to Workspace customers are companies, not individuals.
I think they don't know their market. For every IT guy who doesn't want users stumbling across a new Google product at work and uploading corporate documents to it, there is some executive who hates their 'buggy' IT systems because half the stuff he uses on his home PC doesn't work properly from a work account.
The smart move would have been for workspace accounts to work exactly the same as consumer accounts by default, and then something akin to group policy for admins to disable features. For new stuff like this, let the admins have a control for 'all future products'.
This works the other way though, Google adds a new button to Gmail and the IT illiterate exec gets in touch to ask what it is or clicks it not knowing it does something they don't want to do, and suddenly the IT team find out from users that their policies and documentation are out of date.
It may not be the option we like as tech-aware users, and I've found it annoying in the past at a previous role where I was always asking our Workspace admin to enable features. But, I don't think it's the wrong choice.
You're on a business account. Businesses need control of how products are rolled out to their users. Compliance, support, etc, etc.
It's not really fair to cast your _business_ usage of Google as the same as their consumer products. I have a personal and business account. In general, business accounts have far more available to them. They often just need some switches flipped in the admin panels.
Sort of. If you have a Google Workspace account, and Microsoft launches some neat tool, the Google domain admin can't really control whether or not you use it. So Google just kind of punishes themselves here.
I'd love to give it a try as well (as a paying OpenAI customer, and as a paying Google customer). It seems the European Union isn't a good enough market for Google to launch it in. Google just doesn't have the resources OpenAI has, it seems.
Yes, yes... yet somehow they all operate in the EU. Google somehow can't. Not to mention the (non-)availability of Pixel and similar products, which have nothing to do with the above.
Eh, I hate to say it, but this is probably the right move (if there's a switch to get it if you really want it, which other commenters are saying there is). Enough businesses are rapidly adopting "no GPT/Bard use in the workplace for IP/liability reasons" policies that it makes sense to default to opt-in for Workspaces accounts.
I don't care that it's opt-in. I care that it didn't tell me I could enable it and so assumed it was impossible. Also, perhaps it was not originally available? I don't know.
This has been an issue for so long, why don't they just let you attach a custom domain to a normal account? Paywall it behind the Google One subscription if you must, it would still be an improvement over having to deal with the needlessly bloated admin interface (for single-user purposes) and randomly being locked out of features that haven't been cleared as "business ready" yet.
I believe so, I haven’t had any issues at all. I use my email for my business and personal and in all the dealings I’ve done with different providers, none have ever marked me spam. I also have a very spam-looking domain so I might have a better than average say on it.
I just don’t want to manage switching accounts or profiles or whatever, plus I’m salty about it, plus people think it’s the runner-up so I’ll use ChatGPT for now.
I don't use Bard for another reason: Google's nefarious history of canceling its services out of the blue. Is there any guarantee that Bard is not going to end up like G+, G Reader, and several other Google apps/services?
I'm still mourning Inbox, and my muscle memory goes to inbox.google.com instead of mail.google.com in solemn protest. But, in this case, it doesn't really matter a ton if it disappears.
> Large language models (LLMs) are like prediction engines — when given a prompt, they generate a response by predicting what words are likely to come next. As a result, they’ve been extremely capable on language and creative tasks, but weaker in areas like reasoning and math. In order to help solve more complex problems with advanced reasoning and logic capabilities, relying solely on LLM output isn’t enough.
And yet I've heard AI folks argue that LLM's do reasoning. I think it still has a long way to go before we can use inference models, even highly sophisticated ones like LLMs, to predict the proof we would have written.
It will be a very good day when we can dispatch trivial theorems to such a program and expect it will use tactics and inference to prove it for us. In such cases I don't think we'd even care all that much how complicated a proof it generates.
Although I don't think they will get to the level where they will write proofs that we consider beautiful, and explain the argument in an elegant way; we'll probably still need humans for that for a while.
I think some people get caught up on the “next word prediction” point, because this is just the mechanism. For the next word prediction to work, the LLM has all sorts of internal representations of the world inside it which is where the capability comes from.
Human reasoning probably comes from evolution (genetic survival/replication), and then somehow thought was an emergent behaviour that unexpectedly came from that process. A thinking machine wasn’t designed, it just kind of came to be over millennia.
Seems to be kind of the same with AI, but the first example of these emergent behaviours seems to be coming out of the back of building a next-word-guesser. It’s a little unexpected, but a simple framework seems to be allowing a neural net to somehow build representations of the world inside it.
GPT is just a next word guesser, but humans are just big piles of cells trying to replicate and not die.
I don’t think they’re mutually exclusive. Next word prediction IS reasoning. It cannot do arbitrarily complex reasoning but many people have used the next word prediction mechanism to chain together multiple outputs to produce something akin to reasoning.
What definition of reasoning are you operating on?
I can write a program in less than 100 lines that can do next word prediction and I guarantee you it's not going to be reasoning.
Note that I'm not saying LLMs are or are not reasoning. I'm saying "next word prediction" is not anywhere near sufficient to determine if something is able to reason or not.
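For concreteness, here's roughly the kind of trivial next-word predictor the parent comment has in mind (a minimal bigram sketch; it predicts the next word by lookup and chance, with no reasoning involved):

import random
from collections import defaultdict

def train(text):
    """Record which words follow which in the training text."""
    model = defaultdict(list)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        model[cur].append(nxt)
    return model

def predict_next(model, word):
    """Predict the next word by sampling what followed `word` in training."""
    candidates = model.get(word)
    return random.choice(candidates) if candidates else None

model = train("a kilogram of feathers weighs the same as a kilogram of lead")
print(predict_next(model, "kilogram"))   # "of"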
Semantic reasoning, being able to understand what a symbol means and ascertain truth from expressions (which can also mean manipulating expressions in order to derive that truth). As far as I understand tensors and transformers that's... not what they're doing.
If you understand transformers, you’d know that they’re doing precisely that.
They’re taking a sequence of tokens (symbols), manipulating them (matrix multiplication is ultimately just moving things around and re-weighting - the same operations that you call symbol manipulations can be encoded or at least approximated there) and output a sequence of other tokens (symbols) that make sense to humans.
You use the term “ascertain truth” lightly. Unless you’re operating in an axiomatic system or otherwise have access to equipment to query the real world, you can’t really “ascertain truth”.
Try using ChatGPT with GPT-4 enabled and present it with a novel scenario with well-defined rules. That scenario surely isn’t present in its training data, but it will be able to show signs of making inferences and breaking the problem down. It isn’t just regurgitating memorized text.
Oh cool, so we can ask it to give us a proof of the Erdős–Gyárfás conjecture?
I’ve seen it confidently regurgitate incorrect proofs of linear algebra theorems. I’m just not confident it’s doing the kind of reasoning needed for us to trust that it can prove theorems formally.
Just because it makes mistakes on a domain that may not be part of its data and/or architectural capabilities doesn't mean it can't do what humans consider "reasoning".
Once again, I implore you to come up with a working definition of "reasoning" so that we can have a real discussion about this.
Many undergraduates also confidently regurgitate incorrect proofs of linear algebra theorems, do you consider them completely lacking in reasoning ability?
> Many undergraduates also confidently regurgitate incorrect proofs of linear algebra theorems, do you consider them completely lacking in reasoning ability?
No. Because I can ask them questions about their proof, they understand what it means, and can correct it on their own.
I've seen LLM's correct their answers after receiving prompts that point out the errors in prior outputs. However I've also seen them give more wrong answers. It tells me that they don't "understand" what it means for an expression to be true or how to derive expressions.
For that we'd need some form of deductive reasoning; not generating the next likely token based off a model trained on some input corpus. That's not how most mathematicians seem to do their work.
However I think it seems plausible we will have a machine learning algorithm that can do simple inductive proofs and that will be nice. To the original article it seems like they're taking a first step with this.
In the mean time why should anyone believe that an LLM is capable of deductive reasoning? Is a tensor enough to represent semantics to be able to dispatch a theorem to an LLM and have it write a proof? Or do I need to train it on enough proofs first before it can start inferring proof-like text?
I suspect you have adopted the speech patterns of people you respect criticizing LLMs of lacking “reasoning” and “understanding” capabilities without thinking about it carefully yourself.
1. How would you define these concepts so that incontrovertible evidence is even possible? Is “reasoning” or “understanding” even possible to measure? Or are we just inferring, by proxy of certain signals, that an underlying understanding exists?
2. Is it an existence proof? I.e., we have shown one domain where it can reason, therefore reasoning is possible. Or do we have to show that it can reason in all domains that humans can reason in?
3. If you posit that it’s a qualitative evaluation akin to the Turing test, specify something concrete here and we can talk once that’s solved too.
"In such cases I don't think we'd even care all that much how complicated a proof it generates."
I think a proof is only useful, if you can validate it. If a LLM spits out something very complicated, then it will take a loooong time, before I would trust that.
I play with Bard about once a week or so. It is definitely getting better, I fully agree with that. However, 'better' is maybe parity with GPT-2. Definitely not yet even DaVinci levels of capability.
It's very fast, though, and the pre-gen of multiple replies is nice. (and necessary, at current quality levels)
I'm looking forward to its improvement, and I wish the teams working on it the best of luck. I can only imagine the levels of internal pressure on everyone involved!
I don't understand how Google messed up this badly; they had all the resources and all the talent to make GPT-4. Initially, when the first Bard version was unveiled, I assumed that they were just using a heavily scaled-down model due to insufficient computational power to handle an influx of requests. However, even after the announcement of PaLM 2, Google's purported GPT-4 competitor, during Google I/O, the result is underwhelming, even falling short of GPT-3.5. If the forthcoming Gemini model, currently training, continues to lag behind GPT-4, it will be a clear sign that Google has seriously dropped the ball on AI.

Sam Altman's remark on the Lex Fridman podcast may shed some light on this: he mentioned that GPT-4 was the result of approximately 200 small changes. It suggests that the challenge for Google isn't merely a matter of scaling up or discovering a handful of techniques; it's a far more complex endeavor.

Google-backed Anthropic's Claude+ is much better than Bard. If Gemini doesn't work out, maybe they should just try to make a robust partnership with them, similar to Microsoft and OpenAI.
Have you ever considered the problem tech like this actually creates for their owners? This is why they didn't release it.
From a legal, PR, safety, resource and monetization perspective, they're quite treacherous products.
OpenAI released it because they needed to make money. Google were wise enough not to release the product, but as others have said, it's an arms race now and we'll be the guinea pigs.
This line of reasoning implies that Google had models that were equivalent to OpenAI's but chose to keep them behind closed doors. However, upon releasing Bard, it was apparent—and continues to be—that it does not match up to OpenAI's offerings. This indicates that the discrepancy is more likely due to the actual capabilities of Google's models, rather than concerns such as legal, PR, safety, resource allocation, or monetization.
As we all know, we don't know what GPT-4 is trained on. It might be trained on information they didn't have the rights to use (for example). This is why they might be so tight-lipped about how it was produced.
Google, on the other hand, has much, much more to lose here, a much bigger reputation to protect, and may have built an inferior product that's actually produced in a more legally compliant way.
Another example would be Midjourney vs Adobe Firefly, there is no way Firefly makes art as nice as MJ produces. Technically it's good stuff, but it's not as fun to use because I can't generate Pikachu photos with Firefly.
People have stated that ChatGPT-4 isn't as good anymore. My personal belief is that this is just the shine wearing off what was a novelty. However, it may also be OpenAI removing the stuff they shouldn't have used in the first place. Although there are reports the model hasn't changed for some time, so who knows.
I guess in time we'll find out. Personally I don't really care for either product so much, most of my interactions have been fairly pointless.
I think it's just fun to watch these big tech companies try deal with these products they've created. It's amusing as fuck.
If Google only used data that isn't copyrighted, they'd probably make a big deal about it, just like Adobe does with their Firefly model. Also, it's not really possible for OpenAI to just take out certain parts from the model without retraining the whole thing. The drop in quality might be due to attempts to make the model work faster through quantization and additional fine-tuning with RLHF to curb unwanted behavior.
I think Google still has a decent chance of catching up. It's just a bit surprising to see them fall behind in an area they were supposed to be leading, especially since they wrote the paper which started all of this. Also, Anthropic is already kind of close to OpenAI, so I don't think OpenAI has some magic that no one else can figure out. In the future, I predict that these LLMs will become a commodity, and most of the available models will work for most tasks, so people will just choose the cheapest ones.
They have explicitly said in interviews that it was intentional not to release powerful AI models without being sure of the safety. OpenAI put them in the race, and let's see how humanity will be affected.
If safety were the only consideration, it's reasonable to expect that they could have released a model comparable to GPT 3.5 within this time frame. This strongly suggests that there may be other factors at play.
I'm not sure Bard and GPT-4 are quite an apples-to-apples comparison though.
GPT-4 is restricted to paying users, and is notable for how slow it is, whereas Bard is free to use, widely available (and becoming more so), and relatively fast.
In other words, if Google had a GPT-4 quality model I'm not sure they would ship it for Bard as I think the cost would be too high for free use and the UX debatable.
They both represent the SOTA of two firms trying for technically the same thing. Just because the models or the infrastructure aren't identical doesn't mean we shouldn't be comparing them to the same standards. Where Bard gains in speed and accessibility, it loses in reasoning and response quality.
Bard represents SOTA in terms of optimizing for low cost; ChatGPT represents SOTA in terms of optimizing for accuracy. On the SOTA frontier, these two goals represent a tradeoff. ChatGPT could choose to go for lower accuracy for lower cost, while Google could for higher accuracy at higher cost. It's like comparing a buffet to a high end restaurant.
Even if Bard were targeting accuracy, it'd still fall short of ChatGPT, but much less so than it does now. (That said, as a product strategy it's questionable: at some point, which I think Bard reaches, the loss in quality makes it more trouble than it's worth.)
I cancelled my OpenAI plus because why pay for something you cannot use because it is always slow, down, busy, or returning errors. You cannot build a reliable business on OpenAI APIs either
ChatGPT also spouts falsehoods and makes mistakes on non-trivial problems, there is not much difference here. Both have enough issues that you have to be very careful with them, especially when building a product that will be user facing
I think there are two viable strategies here: make a model that is useful at the lowest possible cost and make a model that is maximally useful at high costs. Probably some spots in between them as well.
Google's mistake is in thinking that ChatGPT was a maximally useful product at high cost. Right now, ChatGPT is a useful product at a high cost which is nonetheless the lowest possible cost for a useful model.
On the contrary, Bard is a product not a model. If you want to see the cutting edge capabilities then comparing the GPT-4 API to the bigger PaLM2 APIs available on GCP is probably a more apples to apples comparison.
Bard is more directly comparable to ChatGPT as a product in general, and since it doesn’t have swappable models, comparing it to the opt-in paid-only model isn’t really a direct comparison.
Yes, basically everywhere except europe, likely due to regulatory concerns. (Would be interested to know what precisely, but the page doesn't say. Any guesses?)
There's a good chance ChatGPT gets banned from Europe, whereas Google, despite its fines by EU authorities (most of which are for antitrust), can at least demonstrate that it's set up and continues to maintain GDPR compliance.
I've used Bard a few times. It just does not stack up to what I am getting from ChatGPT or even Bing AI. I can take the same request, copy it into all three, and Bard always gives me code that is wildly inaccurate.
I'd settle for any amount of factual accuracy. One thing it is particularly bad at is units. Ask Bard to list countries that are about the same size as Alberta, Canada. It will give you countries that are 40% the size of Alberta because it mixes up miles and kilometers. And it makes unit errors like that all the time.
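That 40% figure is about what you'd expect from a straight km²/mi² mix-up; a quick sketch of the arithmetic (Alberta's area of roughly 661,800 km², about 255,500 mi², is the only figure assumed here):

KM2_PER_MI2 = 2.58999      # one square mile in square kilometers

alberta_mi2 = 255_500      # ~661,800 km2
# A country whose area in km2 happens to match Alberta's number in mi2:
confused_km2 = alberta_mi2
confused_mi2 = confused_km2 / KM2_PER_MI2
print(confused_mi2 / alberta_mi2)   # ~0.386, i.e. "about 40% the size"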
I asked it for the size of Alberta, Canada in square miles, and then after it gave me that, I asked it for some countries that are similar sized to Alberta, Canada and it said:
There are no countries that are exactly the same size as Alberta, but there are a few that are very close. Here are some countries that are within 10,000 square miles of Alberta's size:
Sudan (250,581 square miles)
Mexico (255,000 square miles)
Argentina (278,040 square miles)
Western Australia (267,000 square miles)
New South Wales (263,685 square miles)
(all these sizes are incorrect, MX for example is 761,600 mi²)
Then I asked it:
Why did you list New South Wales as a country above?
I apologize for the confusion. I listed New South Wales as a country above because it is often referred to as such in informal conversation. However, you are correct, New South Wales is not a country. It is a state in Australia.
> Here are some countries that are within 10,000 square miles of Alberta's size:
> Sudan (250,581 square miles) Mexico (255,000 square miles) Argentina (278,040 square miles) Western Australia (267,000 square miles) New South Wales (263,685 square miles)
Argentina is ~28k square miles larger than Sudan by its own fallacious statistics, so it doesn't even imply a consistent size for Alberta.
The Free Wales Army rises again! They have infiltrated every rung of society and soon the plan will be complete, if not for your meddling large language models!
Google, with all due respect, you made a terrible first impression with Bard. When it was launched, it only supported US English, Japanese, and Korean. Two months of people asking for support for other languages, those are still the only ones it supports. Internally it can use other languages but they're filtered out with a patronizing reply of "I'm still learning languages". https://www.reddit.com/r/Bard/comments/12hrq1w/bard_says_it_...
They've kind of botched it by releasing something that, even though it may surpass ChatGPT sooner or later, at present doesn't. With the Bard name and being loud about it, I've started referring to it as https://asterix.fandom.com/wiki/Cacofonix (or Assurancetourix for my French brethren)
I tried out Bard the other day, asking some math and computer science questions, and the answers were mostly bullshit. I find it greatly amusing that people are actually using this as part of their day-to-day work.
I believe that was just their demonstration. They're calling it implicit code execution, so it ought to be done transparently to the user for the queries that qualify as requiring code.
Used Bard just recently to research some differences in taxation on stocks between a few countries. I used Bard for it because I thought Google's knowledge graph probably has the right answers and Bard may be powered by it
The results were just completely wrong and hallucinated while gpt4 was spot on.
(Of course I double check info it gives me and use it as a starting point)
The widely offered answer here seems to be legislation / fear of fines. I wonder how that translates to other products, too, like Pixel and Nest? I'm more inclined to believe Google just doesn't have capacity outside of core tech. Their sales and marketing is just a dud, always has been. That explains lackluster results where they should've dominated, like GCP.
One nice improvement is applying a constraint. Bard will now give a valid answer for "give a swim workout for 3000m" that correctly totals 3k, while chatgpt does not.
This is a great capability. I wish that it ran the code in a sandboxed iframe in the browser so that I could ask for things that'd waste too much of the providers server CPU to compute. It'd also be great for those iframes to be able to output graphics for tiny visual simulations and widgets, e.g. ciechanow.ski.
I asked Google [Generative] Search today how to run multiple commands via Docker's ENTRYPOINT command. It gave me a laughably wrong answer along with an example to support it. ChatGPT gave multiple correct alternative answers with examples. Doh!
FYI ChatGPTs experimental “Code Interpreter” model does this and it’s awesome. LLMs orchestrating other modes of thinking and formal tools seems very promising. We don’t need the LLM to zero-shot everything.
I first subbed to ChatGPT when I found out plugins were out. Imagine my surprise when, after paying $20, I found out I could only get myself on a waitlist.
Then I found out about code interpreter and subbed again, still not having access to code interpreter.
Needless to say I will be thinking long and hard before I pay openai again.
It seems to be randomly rolled out. I had that happen for a while. Make sure you check your settings to see if it's in the enable experimental features list.
It’s not better, they just hooked up a calculator to it. Like OpenAI’s plugins, but more opaque and less useful.
What happened to Google? Touting this as some achievement feels really sad. This is just catching up, and failing. I’m beginning to think they are punching above their weight and should focus on other things. Which is.. odd, to say the least. I guess money isn’t everything.
Google certainly has an internal LLM of GPT-4 quality (PaLM 2 or some variant of it), but they would never allow access to it via an API, as it would require them to operate at too high of a loss. Google is too seasoned a company to try something new or interesting that would involve a risk to its ad revenue bottom line.
People keep repeating they have “things in the works” and “massive reserves”, but meanwhile they flail around for years. They could have had a massive head-start, they were the inventors of the transformer for crying out loud.
I’m not seeing indications of anything interesting brewing in their HQ.
Nope, there's no reasoning. It's just generating the text that best matches its training data. They admit that themselves, which makes the statement "bard is getting better at reasoning" even more irritating:
> Large language models (LLMs) are like prediction engines — when given a prompt, they generate a response by predicting what words are likely to come next
> Nope, there's no reasoning. It's just generating the text that best matches its training data.
That's like saying that when you answer questions on an exam, you're just generating the text that best matches your training data...
Both statements are correct, but only if you understand what "generating" and "matches" mean.
Generating doesn't (always) mean copying, and matches doesn't (always) mean exactly the same. In the more general case you're drawing a kind of analogy between what you were taught and the new problem you are answering.
You should google "Induction heads" which is one of the mechanisms that researchers believe Transformers are using to perform in-context learning. In the general case this is an analogical A'B' => AB type of "prediction".
Probably the best answer is, "The concept in your head labelled by 'reasoning' doesn't apply, but neither does the one you associate with 'unreasoning'."
It isn't doing classical reasoning per se, but neither does it match an unreasoning brute process.
In general, you should get used to this. Probably every AI from this point on out until they simply exceed us entirely and we can't mentally model them at all are going to be neither quite what we consider "human reasoning", but that doesn't mean they are "unreasoning" either. We'll be able to see certain flaws, but then again they will increasingly be able to see our flaws too.
> It isn't doing classical reasoning per se, but neither does it match an unreasoning brute process.
What we call "classical reasoning" is also running on neural network trained by predicting data.
I think the distinction is only there because we mix levels of abstraction - when talking about AI we focus on the lowest levels, when talking about people we focus on the higher levels.
Try asking LLM to explain its reasoning and after dismissing the "I'm just AI" disclaimers it will usually generate logical chain that could just as well be generated by a human. Even if you think it's not "really" reasoning - just pretending - pretending means it's running a virtual machine and that virtual machine is reasoning. Same difference.
> (Imagine trying to solve a math problem using System 1 alone: You can’t stop and do the arithmetic, you just have to spit out the first answer that comes to mind.)
Uh, for problems below some level of complexity, that's exactly how I do it, I just know the answer. I assume everyone does.
Doing math implicitly is kind of a prerequisite for writing good code.
If you can't count up letters, the chance that the code for anything will be fundamentally broken skyrockets.
I don't really care if Bard can do something GPT can already do.
I always find myself using every LLM accessible to me if I have a serious question, because I expect variation; sometimes one is better than the others, and that's all I need.
A way of submitting a single input to multiple models would make for a nice tool.
There are generally two costs to compliance: actually compliance, and proving compliance. The latter is the concern in the EU. It's already gotten OpenAI in trouble in e.g. Italy. None of this means nobody should deploy LLMs in Europe. Just that there are unique costs that should be considered.
Hence "EU and adjacent." Swiss law incorporates the problematic elements of GDPR, namely, its complain-investigate model and unilaterally-empowered regulator.
It might take Bard 3 more iterations to reach the current level of ChatGPT, which to my surprise even managed to solve advanced linear algebra questions, while Bard was nowhere close to answering even basic questions in linear algebra
Is it really "getting better at logic and reasoning" though, or is it actually just another LLM like any other, and therefore just getting better at the appearance of logic and reasoning? The distinction is important, after all. One possibly leads to AGI, where the other does not (even though people who don't understand will likely believe it's AGI and do stupid and dangerous things with it). As I understand it, LLMs do not have any logic or reason, despite often being quite convincing at pretending to.
If you clarify the prompt a little (I used Write a python script that can produce this pattern:), GPT-4 gave me this:
To generate the given pattern, we need to identify the structure. After analyzing, it seems that the pattern follows a certain structure where the binary representation of each number from 1 to 15 is preceded by a certain number of zeroes. Here's a python script to generate such a pattern:
def generate_pattern():
    pattern = ""
    for i in range(1, 16):
        # Convert i to binary and remove the "0b" prefix.
        binary = bin(i)[2:]
        # Number of zeroes to add before each binary.
        zeroes = 5 - len(binary)
        # Add binary to pattern with zeroes.
        pattern += "0" * zeroes + binary
    return pattern

print(generate_pattern())
The script starts by iterating through numbers from 1 to 15. For each number, it converts the number to a binary string and removes the "0b" prefix that Python adds by default. It calculates the number of zeroes to add before the binary string to maintain a fixed width of 5 characters, and then adds the binary string to the pattern with the preceding zeroes. Finally, it returns the complete pattern.
Which outputs:
000010001000011001000010100110001110100001001010100101101100011010111001111
It's missing the first set of zeros, but matches otherwise.
The answer is wrong though (not just because it's missing leading zeros, but perhaps you didn't copy the right input?) and it's certainly not the shortest way to output that.
Not sure I follow; the answer matches minus the first leading zeros. Change the range to 0-32, and it matches exactly. So it pretty clearly recognized the pattern and produced working code.
This question is a pretty obscure benchmark. Another commenter has it just printing the string, as suggested.
If there's some weird math trick to get an optimal implementation, it's probably beyond the grasp of nearly all actual people.
Well, it's both dumb and smart: it's smart in the sense that it recognized the pattern in the first place, and it's dumb that it made such a silly error (and missed obvious ways to make it shorter).
This is the problem with these systems: "roughly correct, but not quite, and ends up with the wrong answer". In the case of a simple program that's easy to spot and correct for (assuming you already know to program well – I fear for students) but in more soft topics that's a lot harder. When I see people post "GPT-4 summarized the post as [...]" it may be correct, or it may have missed one vital paragraph or piece of nuance which would drastically alter the argument.
This is the simplest and most direct method to output the string. If you have a more complex task in mind, like generating this string according to a certain pattern, please provide more details.
May be possible to shave off a few bytes with f'..' strings, or see if there are any repeating patterns, I'm not the sort who enjoys "code golfing", but "use base-16 to represent a base-2 number more compactly" seems fairly obvious to me.
Oh right, the leading zeroes won't get printed; you need a formatting string with a specific width for that. I don't do much Python so I don't recall the exact syntax off-hand, but the point was: there is an obvious way to compact the number that can be done without any analysis of the number itself (or even looking at it, for that matter).
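In Python the idea looks roughly like this (a sketch of the hex-compaction approach described above, not an attempt at an actual golfed answer):

# Build the pattern the long way: 5-bit binary for 0..31.
pattern = "".join(f"{i:05b}" for i in range(32))

# Pack it into a single integer and print a compact hex literal for it...
n = int(pattern, 2)
print(hex(n))

# ...which can be expanded back, leading zeros included, with a
# width-specifying format string.
print(f"{n:0{len(pattern)}b}" == pattern)   # True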
While print(literal) is "cheating" if you ask for "create a program that generates ...", it is a very obvious thing to do if you want to go down that route.
The "more complex task in mind" was, of course, to generate the "shortest" program. GPT-4, by asking for a "certain pattern" is attempting to have you do the intellectual heavy lifting for it -- although in this case the intellectual lifting is quite light.
I would venture to guess most college graduates familiar with Python would be able to write a shorter program even if restricted from using hexadecimal representation. Agreed, that may be the 99th percentile of the general population, but this isn't meant to be a Turing test. The Turing test isn't really about intelligence.
I don't see how arbitrary questions like this substantially show AGI. If there is a common solution, it could simply look up the solution. Also, AGI could be present just not in this very niche problem (that 99.9% of humans can't solve).
The point of this "IQ Test" is to set a relatively low-bar for passing the IQ test question so that even intellectually lazy people can get an intuitive feel for the limitation of Transformer models. This limitation has been pointed out formally by the DeepMind paper "Neural Networks and the Chomsky Hierarchy".
The general principle may be understood in terms of the approximation of Solomonoff Induction by natural intelligence during the activity known as "data driven science" aka "The Unreasonable Effectiveness of Mathematics In the Natural Sciences". Basically, if your learning model is incapable of at least context sensitive grammars in the Chomsky hierarchy, it isn't capable of inducing dynamical algorithmic models of the world. If it can't do that, then it can't model causality and is therefore going to go astray when it comes to understanding what "is" and therefore can't be relied upon when it comes to alignment of what it "ought" to be doing.
PS: You never bothered to say whether the program you provided was from an LLM or from yourself. Why not?
I think the argument is that current and future AI advancements could lead to AGI. The people I've seen like Yudkowsky who are concerned about AGI don't claim that Chat-GPT is an AGI AFAIK. BTW, I disagree with Yud, but there's no reason to misconstrue his statements.
Yud is doing more than his share of generating misconstrual of his own statements, as evidenced by the laws and regulations being enacted by people who are convinced that AGI is upon us.
Ironically, they're right in the sense that the global economy is an unfriendly AGI causing the demographic transition to extinction levels of total fertility rate in exact proportion to the degree it has turned its human components into sterile worker mechanical Turks -- most exemplified by the very people who are misconstruing Yud's statements.
>There are plenty of those who purport AGIs threaten us and conflate "existence" with "potential". This is aimed at those driven to hysterics by such.
I'd hazard a guess that the Venn diagrams of "those who purport AGIs threaten us and conflate 'existence' with 'potential'" and of "people who grok binary and can solve esoteric brain teasers using it" have very little overlap.
You might have more success with an example that's a little more accessible to "normies".
[0]: https://news.ycombinator.com/item?id=35893130