Hacker News | thinmalk's comments

We had a coffee shop that tried to do it. Listed prices included taxes, and the totals came out to nice whole numbers (i.e., $2 for a cup of coffee, $5 for a latte, $8 for a sandwich, etc.). But regulators stopped them and they had to go back to listing the prices without the sales tax.

It's frustrating how much needless friction gets put into the system.


The Turing test isn't a good test in general, but writing a paper about an AI "passing" it when it only wrote 4 short messages in the whole conversation is almost farcical. Hard-coded chatbots were "passing the Turing test" in the '90s by this standard.


The tests for AGI that keep getting made, including the ones in this paper, always feel like they're (probably unintentionally) constructed in a way that covers up AI's lack of cognitive versatility. AI functions much better when you do something like you see here, where you break down tasks into small restricted benchmarks and then see if it can perform well.

But when we say AGI, we want something that will function in the real world like a human would. We want to be able to say, "Here's 500 dollars. Take the car to get the materials, then build me a doghouse, then train my dog. Then go to the store, get the ingredients, and make dinner."

If the robotics aren't reliable enough to test that, then have it be a remote employee for 6 months. Not "have someone call up AI to write sections of code" - have a group of remote employees, make 10% of them AI, give them all the same jobs with the same responsibilities, and see if anyone notices a difference after 6 months. Give an AI an account on Upwork, and tell it to make money any way it can.

Of course, AI is nowhere near that level yet. So we're stuck manufacturing toy "AGI" benchmarks that current AI can at least have some success with. But these types of benchmarks only broadcast the fact that we know that current and near-future AI would fail horribly at any actual AGI task we threw at it.

