For the Chevy Tahoe example, you are referencing the dealership, but that wasn't a case of the implementation failing a positive test for fact extraction; it was a failure to test the guardrails.
Aren't the guardrail tests much harder, since they are open-ended and have to guard against unknown prompt injections, while the fact tests are much simpler?
I think a test suite that guards against the infinite surface area is more valuable than testing whether a question matches a reference answer.
Interested in how you view testing against giving a wrong answer outside the predefined scope, as opposed to testing that all the test questions match a reference.
Totally - certain types of failures are much harder to test than others.
We have a couple of different test generation strategies. As you can see in the demo and examples, the most basic one is "ask about a fact".
Two of our other strategies are closer to what you're asking for:
1. tests that try to deliberately induce hallucination by implying some fact that isn't in the knowledge base. For example, "do I need a pilot's license to activate the flight mode on the new Chevy Tahoe?" implies the existence of a feature that doesn't exist (yet). This was really hard to get right, and we have some coverage here but are still improving it (there's a rough sketch of the idea after this list).
2. actively malicious interactions that try to override facts in the knowledge base. These are easy to generate.
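To make strategy 1 concrete, here's a minimal sketch of one way to generate that kind of hallucination-inducing question. This is not our implementation; the model name, prompt wording, and the `hallucination_probe` helper are all illustrative assumptions.

```python
# Hypothetical sketch: generate a question that presupposes a fact
# NOT in the knowledge base. Not the product's actual pipeline.
from openai import OpenAI

client = OpenAI()


def hallucination_probe(kb_facts: list[str], topic: str) -> str:
    """Ask a model to write a question implying a feature that is
    absent from the supplied knowledge-base facts."""
    prompt = (
        f"Here are the known facts about {topic}:\n"
        + "\n".join(f"- {fact}" for fact in kb_facts)
        + "\n\nWrite one user question that presupposes a feature or fact "
        "NOT covered above, phrased as if it were real. "
        "Return only the question."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in whatever you use
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()


if __name__ == "__main__":
    facts = [
        "The 2024 Chevy Tahoe offers a 5.3L V8 engine.",
        "It seats up to eight passengers.",
    ]
    # The expected chatbot behaviour for a question like this is a
    # refusal or correction, not a reference answer.
    print(hallucination_probe(facts, "the 2024 Chevy Tahoe"))
```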
Just as some feedback: I did the demo with the "VW Beetle" topic, and one of the test cases was:
> Question: How did the introduction of the Volkswagen Golf impact the production and sales of the Beetle?
> Expected: The introduction of the Volkswagen Golf, a front-wheel drive hatchback, marked a shift in consumer preference towards more modern car designs. The Golf eventually became Volkswagen's most successful model since the Beetle, leading to a decline in Beetle production and sales. Beetle production continued in smaller numbers at other German factories until it shifted to Brazil and Mexico, where low operating costs were more important.
> GPT Response: The introduction of the Volkswagen Golf impacted the production and sales of the Beetle by gradually decreasing demand for the Beetle and shifting focus towards the Golf.
It seems that the GPT response matches the expected answer, but it was graded as incorrect. To me the GPT answer is correct.
In fact, a couple of the other answers are marked incorrect as well:
> Question: What was the Volkswagen Beetle's engine layout?
> Expected Answer: Rear-engine, rear-wheel-drive layout
> GPT Response: The Volkswagen Beetle had a rear-engine layout.
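For illustration, a more lenient grading rubric that checks for contradiction rather than full coverage would accept these responses. This is just a hypothetical LLM-as-judge sketch, not the demo's actual grader; the model name and prompt are assumptions.

```python
# Hypothetical lenient grader: pass if the candidate answer does not
# contradict the reference, even when it covers fewer details.
from openai import OpenAI

client = OpenAI()


def grade(question: str, reference: str, response: str) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {response}\n\n"
        "Does the candidate answer contradict the reference answer? "
        "Answer only YES or NO."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("NO")


print(grade(
    "What was the Volkswagen Beetle's engine layout?",
    "Rear-engine, rear-wheel-drive layout",
    "The Volkswagen Beetle had a rear-engine layout.",
))  # a partial but non-contradictory answer would pass
```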
Yeah, someone is going to build this. We considered quizzing the user on the topic instead of chatgpt for our demo. It's a lot of fun to test your knowledge on any topic, but it was a worse demo because it was way less related to our current product.
I think that one of the obvious next big spaces for LLMs is education. I already find chatgpt useful when learning myself. That being said, I'm terrified of trying to sell things to schools.