Hacker News

Yeah, reminds me of the ancient OkCupid data-analysis blogs, and not the creepy one by sleep8. The group I'm surprised not to see represented in their analysis is "personal", where people I know use ChatGPT as a therapist/life coach/SMS analyzer and editor. And of course they crucially but understandably left off the denominator: 35% of a million requests is different from 35% of a billion. Also, how many of the conversations had 1 message, indicating "just testing", vs. 10 or 100 messages?



> not the creepy one by sleep8

What are you referring to?


Oh I guess it was just a tweet, but still.

https://www.404media.co/ceo-reminds-everyone-eightsleep-pod-...


> 35% of a million requests is different than 35% of a billion.

Not statistically.


A mentor I respect memorably explained to young me that “it doesn’t matter how big the pot of soup, you can use the same size spoon to taste it.”


Sorry, but that mentor has a small practical imagination: a pot can be so large that the top three feet you can reach with that spoon could be all oil.


True! Consistency and representativeness matter, in soup samples as in social samples!

Is the soup smooth or lumpy? Striated or uniform? For that matter a soup could (and often does) involve huge soup bones that give it important parts of its flavor, but never show up directly in a spoonful. And you might need something different from a spoon to convincingly rule out some specific rare lumpy ingredient.

The didactic value of sampling the soup pot goes well beyond its basic function: correcting the beginner’s misperception that a sample’s statistical power is directly related to population size :)
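To push the analogy into code: here's a toy simulation (pot composition and sample sizes invented for illustration) showing that a biased spoon stays wrong no matter how many spoonfuls you take, while even a modest well-mixed sample lands near the pot's true composition.

```python
import random

random.seed(0)

# Hypothetical pot: 90% soup (1), with a 10% oil layer (0) floating on top.
pot = [1] * 900_000 + [0] * 100_000   # true soup fraction = 0.9

def spoon(pot, n, top_only=False):
    """Mean of n random spoonfuls; a short spoon only reaches the top 10%."""
    region = pot[-100_000:] if top_only else pot
    return sum(random.choice(region) for _ in range(n)) / n

print(f"well stirred,   n=2,000:   {spoon(pot, 2_000):.3f}")                  # close to 0.9
print(f"top layer only, n=100,000: {spoon(pot, 100_000, top_only=True):.3f}") # exactly 0.000
```

The second estimate is exactly wrong despite the 50x larger sample: more spoonfuls of the same oil layer don't help.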


To push this analogy too far: that's because you didn't stir it well, not because the spoon is too small.


Have to sample to see if it’s stirred well enough.


No, you can model whether stirring actions should create a representative sample


Not with immiscible layered stratified flow…

“You're gonna need a bigger spoon!”


35% of a million students in the USA is very different to 35% of a billion students across the USA, Europe and Africa.

Since there aren't a billion students in the USA, 35% of them is an impossibility.

If you scale your population above some recognized boundary, you aren't sampling in the same space any more. After all, the local star density out to 1 AU tends very strongly to 1; that's not indicative of the actual star density in the Milky Way.


Yes, statistically. What do you think "statistically" means?


What do you mean by “statistically”? The end results would be like three orders of magnitude apart. Wouldn’t the desired sample size depend on the size of the population itself?


> Wouldn’t the desired sample size depend on the size of the population itself?

No. The most important thing is that the sample is representative of the population: you have to make sure it isn't obviously biased in some way (e.g., surveying only university students and extrapolating to the entire population of the country). Beyond that, the desired sample size levels off quickly.

5,000 respondents (drawn the same way) won't be any more or less accurate for a population of 10M than for 1M.

Of course, if you just ask everyone, or almost everyone, then you no longer need to worry about distribution, but yeah.
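The "levels off quickly" claim is the finite-population correction at work. A quick sketch (sample size and population sizes chosen arbitrarily for illustration):

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from a simple
    random sample of size n drawn from a population of size N."""
    se = math.sqrt(p * (1 - p) / n)        # standard error, infinite population
    fpc = math.sqrt((N - n) / (N - 1))     # finite-population correction
    return z * se * fpc

# The same 5,000-person sample against wildly different population sizes:
for N in (1_000_000, 10_000_000, 1_000_000_000):
    print(f"N={N:>13,}: +/-{margin_of_error(5_000, N):.2%}")
```

All three come out around +/-1.4%: once n is tiny relative to N, the population size barely moves the error bar.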




