The funnel shape of the scatter plot immediately reminded me of an article on the insensitivity-to-sample-size pitfall [0], which points out that entities with smaller sample sizes will show up in the extremes more often, simply because of their higher variance.
Looks like the tags with the biggest differences exemplify this pretty well.
I also saw that triangle-shaped plot and had the same thought. I recently read a great paper about this [0], with some of the same examples as the link in the parent, but going a little further in depth.
I originally got onto this topic when reading Bayesian Methods for Hackers [1]. I'm still hunting for a good method to correct/compensate for this when doing these kinds of comparisons in my own work.
When I was writing my thesis I wanted to correct for this as well, and weighted my data by the log of the sample size. This made intuitive sense to me, and both my advisors seemed to agree, though none of us found compelling papers supporting it.
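For concreteness, here's roughly what that looks like as a weighted fit in R (the data frame and columns are hypothetical stand-ins, not my actual thesis data):

    # hypothetical per-group data: outcome y, predictor x, group sample size n
    df <- data.frame(y = rnorm(50), x = rnorm(50), n = sample(10:10000, 50))
    fit <- lm(y ~ x, data = df, weights = log(n))  # weight by log sample size
    summary(fit)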
It really doesn't matter - at least, not for the statistical error the parent is talking about. The effect isn't related to whether we are sampling from a larger population of programmers.
Suppose there were no difference between the usage of each language, and people just program on the weekends vs weekdays with some probability independent of language. Then, if a language has lots of users, it will likely have close to the average weekend/weekday proportion. The fewer users the language has, the more likely that it has an uneven weekend/weekday proportion just by chance. And if you plot the weekend/weekday proportions vs. the number of users, you expect a funnel shape just like the one in the article.
Therefore, the plot in the article - by itself - provides no evidence that there is any difference between the usage of different programming languages.
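A quick null-model simulation in R makes this concrete (the 0.25 weekend probability and the user-count range are arbitrary choices for illustration):

    # null model: every "language" shares the same weekend probability
    p <- 0.25
    n <- round(10^runif(500, 2, 6))            # user counts from 10^2 to 10^6
    weekend <- rbinom(500, size = n, prob = p) # weekend users per language
    plot(n, weekend / n, log = "x",
         xlab = "number of users", ylab = "weekend proportion")
    abline(h = p, lty = 2)                     # the single true proportion

Every point comes from the same underlying proportion, yet the small languages fan out into a funnel while the big ones sit tight on the dashed line.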
I think it's just that they are plotting sum vs ratio of two random variables. Try this (in R):
a <- runif(1000); b <- runif(1000); plot(a + b, log(a/b))
Also, they likely have a low-end cutoff (notice their x-axis starts at 10^4). If you apply the same cutoff to the above plot, you get even closer to that exact shape. Try:
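(The snippet seems to have been cut off; presumably something along these lines, with the cutoff value chosen arbitrarily:)

    keep <- (a + b) > 0.5              # arbitrary low-end cutoff on the sum
    plot((a + b)[keep], log(a/b)[keep])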
It could also be that the most popular languages in the corporate world are a compromise somewhere in between enjoyable/exciting and horrible/boring. (My assumption is that a large weekend ratio correlates with enjoyable/exciting.)
The ratio of sample sizes in the OP also isn't that bad, and none of them are very small.
Can you normalise data like that based on a confidence interval? Just rescaling the graph to unify them seems wrong (it would answer something like "what do we think the distribution would look like if we distrusted the low end?"), but maybe there's a better way?
A confidence interval won't adjust the points (the point estimates), but it will give the points with smaller sample sizes wide intervals (often covering zero).
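For proportions like these, that's a one-liner in R (the counts here are made up):

    # 55/300 weekend questions for a small tag vs. 9000/60000 for a big one;
    # the CI is for the difference in the two proportions
    prop.test(c(55, 9000), c(300, 60000))$conf.int

With only 300 questions in the small tag, the interval is wide enough to cover zero even though the raw proportions differ.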
Using an (empirical) Bayesian multilevel model can both attach uncertainty intervals to the point estimates and appropriately "shrink" the estimates towards zero at the low-sample-size end.
The latter is more directly interpretable, at the cost of slightly more complex modelling (and assumptions).
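A minimal empirical-Bayes sketch in R, assuming per-tag weekend/total counts and a moment-matched beta prior (the counts are made up, and since these are raw proportions the shrinkage is towards the pooled mean rather than literally zero):

    # made-up per-tag counts: weekend questions and totals
    weekend <- c(12, 40, 300, 9000)
    total   <- c(30, 200, 2000, 60000)

    # crude empirical-Bayes prior: fit a Beta(a, b) to the raw proportions
    # by the method of moments, then shrink each tag towards the pooled mean
    p <- weekend / total
    m <- mean(p); v <- var(p)
    common <- m * (1 - m) / v - 1
    a <- m * common
    b <- (1 - m) * common
    shrunk <- (weekend + a) / (total + a + b)  # posterior means
    cbind(raw = p, shrunk = shrunk)

In practice you'd fit the prior by maximum likelihood or go fully Bayesian (e.g. in pymc3 or stan), but even this crude version pulls the small-sample tags towards the pooled mean far more than the large ones.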
There seems to be a pretty wide spectrum of abstraction across DL tools, from writing everything by hand, to Theano, to Keras, for example. Will this course focus on any tool in particular?
This course will focus primarily on TensorFlow. We're doing so because, at the moment, it's the most popular Deep Learning framework, and it provides enough flexibility to explore some of the newer network architectures we focus on towards the end of the program.
I often find myself frustrated when someone I assume is intelligent makes an important decision in politics or business with little explanation. It may strike me as a bad decision, but I try to be charitable and assume they have a good reason. I've figured they often don't offer good explanations because they feel too busy to take the time to communicate, because they're just not good at communicating, or because they don't recognize its importance.
This post suggests another possibility: they may have a better perception than I do of the risks that even effective communication can entail.
My wife and I have done some of the same coursework (for different degrees), mine on campus and hers online. I actually would have preferred online. Rigor seemed equivalent, but I find the ability to pause and rewind lectures invaluable.
I occasionally use probabilistic programming systems, and find this project fascinating, but I've been wondering for a while what the vision is.
Is it meant mainly as a research project/proof of concept (seems I've seen elsewhere that it's funded by the DARPA PPAML project), or is it intended to become a commonly used piece of software with a community like pymc3 and stan?
The PPAML project yielded a lot of cool work; I think many people are still figuring out the most impactful use cases. I believe the most mature new production language is Figaro, out of Charles River. They have some good example projects and an introductory text for the language.
This seems intuitive. Scott Adams expresses a similar idea as focussing on 'systems' rather than 'goals.' Failing to meet a goal leaves you with nothing if the goal is all you focus on, but what you learn from the system is more reusable.
I have no idea about Zeppelin's lineage, but it looks like there's also Spark Notebook (https://github.com/andypetrella/spark-notebook), which more closely resembles the IPython notebook. I'd love to hear an explanation of the differences between all of these notebooks.
Can't wait to see if the Jupyter split will contribute to a consolidation or proliferation in the notebook-verse...
[0] http://dataremixed.com/2015/01/avoiding-data-pitfalls-part-2...