
Is it really a 'sample', if they are reporting on the entirety of their data for a given period?

Is the question interpreted as extending to those not on Stack Overflow, or is it a complete census of the 'population' of their data?




It really doesn't matter - at least, not for the statistical error the parent is talking about. The effect isn't related to whether we are sampling from a larger population of programmers.

Suppose there were no difference between the usage of each language, and each person programs on a weekend rather than a weekday with some fixed probability that is independent of language. Then a language with lots of users will likely have close to the average weekend/weekday proportion. The fewer users a language has, the more likely it is to show an uneven weekend/weekday proportion just by chance. So if you plot the weekend/weekday proportions against the number of users, you expect a funnel shape just like the one in the article.

Therefore, the plot in the article - by itself - provides no evidence that there is any difference between the usage of different programming languages.
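
For anyone who wants to see the funnel come out of pure noise, here is a rough Python sketch of the null scenario described above. It is only an illustration, not the article's analysis: the function names, the 2/7 weekend probability, and the language sizes are all assumptions made up for this example. Every hypothetical language gets the same weekend probability, and the only thing that varies is how many users it has.

```python
import random
import statistics

# Assumption: under the null hypothesis, every user programs on a weekend
# with the same probability, whatever the language. 2/7 is illustrative only.
P_WEEKEND = 2 / 7


def weekend_share(n_users, rng, p_weekend=P_WEEKEND):
    """Fraction of n_users who land on a weekend, drawn purely by chance."""
    weekend = sum(rng.random() < p_weekend for _ in range(n_users))
    return weekend / n_users


def spread_of_shares(n_users, n_languages=100, seed=0):
    """Standard deviation of the weekend share across hypothetical
    languages that all have n_users users and no real differences."""
    rng = random.Random(seed)
    shares = [weekend_share(n_users, rng) for _ in range(n_languages)]
    return statistics.pstdev(shares)


if __name__ == "__main__":
    # Small languages scatter widely, large ones hug the average:
    # that is the funnel.
    for n in (50, 500, 5000):
        print(f"{n:>5} users per language: spread = {spread_of_shares(n):.4f}")
```

The spread shrinks roughly with the square root of the number of users, so the small languages fan out on both sides of the average even though nothing distinguishes them.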


I would argue it's a sort of (nonrandom) proxy sample in the sense that they're sampling a fraction of the people actually programming on the weekend.


Ok, so if we make sure we're only talking about:

    > "what languages tend to be **asked about** on weekends, as opposed to weekdays?" 
and:

    > "explore differences between **questions that are posted** on weekdays and weekends."
as opposed to the article title:

    > "What Programming Languages Are **Used Most** on Weekends?" 
(emphasis added), is the problem then resolved?



