More

wcbeard10 · on Feb 8, 2017

The funnel shape of the scatter plot immediately reminded me of an article on the insensitivity to sample size pitfall [0], which points out that you'll expect entities with smaller sample sizes to show up more often in the extremes because of the higher variance.

Looks like the tags with the biggest differences exemplify this pretty well.

[0]- http://dataremixed.com/2015/01/avoiding-data-pitfalls-part-2...

xb · on Feb 8, 2017

I also saw that triangle shaped plot and had the same thought. I read a great paper about this recently [0] with some of the same examples as the link in the parent, but going a little further in depth.

I originally got on this topic when reading Bayesian Methods for Hackers [1]. I am still hunting for a good method to correct/compensate for this when I am doing these types of comparisons in my own work.

[0] -http://faculty.cord.edu/andersod/MostDangerousEquation.pdf

[1] - https://github.com/CamDavidsonPilon/Probabilistic-Programmin...

awild · on Feb 8, 2017

When I was writing my thesis I wanted to correct for that as well, and weighted my data by the log of the sample size. This made intuitive sense to me, and both my advisors seemed to agree, though neither of us found compelling papers for this.

platz · on Feb 8, 2017

Is it really a 'sample', if they are reporting on the entirety of their data for a given period?

Is the question interpreted as extending to those not on stackoverflow, or is it a complete census of the 'population' of their data?

_pastel · on Feb 8, 2017

It really doesn't matter - at least, not for the statistical error the parent is talking about. The effect isn't related to whether we are sampling from a larger population of programmers.

Suppose there were no difference between the usage of each language, and people just program on the weekends vs weekdays with some probability independent of language. Then, if a language has lots of users, it will likely have close to the average weekend/weekday proportion. The fewer users the language has, the more likely that it has an uneven weekend/weekday proportion just by chance. And if you plot the weekend/weekday proportions vs. the number of users, you expect a funnel shape just like the one in the article.

Therefore, the plot in the article - by itself - provides no evidence that there is any difference between the usage of different programming languages.

smaddox · on Feb 8, 2017

I would argue it's a sort of (nonrandom) proxy sample in the sense that they're sampling a fraction of the people actually programming on the weekend.

platz · on Feb 8, 2017

Ok, so if we make sure we're only talking about:

    > "what languages tend to be **asked about** on weekends, as opposed to weekdays?"

and:

    > "explore differences between **questions that are posted** on weekdays and weekends."

as opposed to the article title:

    > "What Programming Languages Are **Used Most** on Weekends?"

(emphasis added), is the problem then resolved?

mhermher · on Feb 8, 2017

I think it's just that they are plotting sum vs ratio of two random variables. Try this (in R):

a <- runif(1000); b <- runif(1000); plot(a + b, log(a/b))

Also, they likely have a low-end cutoff (notice their x axis starts at 10^4. If you do the same to the above plot, you get even closer to that exact shape. Try:

plot(a + b, log(a/b), xlim=quantile(a + b, probs=c(0.2, 1)))

_pastel · on Feb 8, 2017

Note also their x axis is on a log scale, which makes the edges linear instead of curved. E.g.:

plot(a + b, log(a/b), xlim=c(1, 2), log="x")

mhermher · on Feb 8, 2017

ah, thanks. I missed that.

k__ · on Feb 8, 2017

So the take away of this post is basically SO is bad at statistics and not what people code on their weekends?

wyager · on Feb 8, 2017

It could also be that the most popular languages in the corporate world are a compromise somewhere in between enjoyable/exciting and horrible/boring. (My assumption is that a large weekend ratio correlates with enjoyable/exiting.)

The ratio of sample sizes in the OP also isn't that bad, and none of them are very small.

simonbyrne · on Feb 8, 2017

This is crying out for a funnel plot: https://www.theguardian.com/commentisfree/2011/oct/28/bad-sc...

viraptor · on Feb 8, 2017

Can you normalise data like that based on a confidence interval? Just rescaling the graph to unify them seems wrong, (it would answer something like "what do we think the distribution would look like if we distrusted the low end?") but maybe there's a better way?

le0n · on Feb 8, 2017

A confidence interval won't adjust the points (point estimates) but will give those points with a lower sample size wide confidence intervals (often covering zero).

Using an (empirical) Bayesian multilevel model can both attach uncertainty intervals to the point estimates and appropriately "shrink" the estimates towards zero at the low-sample-size end.

The latter is more directly interpretable, at the cost of slightly more complex modelling (/assumptions).

viraptor · on Feb 8, 2017

Thanks! I think the shrinking you mention is what I was trying to say :)

Looking for explanation of multilevel model, I found http://mc-stan.org/documentation/case-studies/radon.html which seems to do exactly that in "Partial pooling model". (see graph)

codesushi42 · on Feb 8, 2017

A confidence interval is not what you want since this isn't a normal distribution of values.

Instead you'd want to use a CDF that bins that values.

notfed · on Feb 8, 2017

Thank you for pointing this out.

wcbeard10 · on Jan 14, 2017

There seems to be a pretty wide spectrum of abstraction for DL tools, from writing by hand to Theano to Keras, for example. Will this course focus on any tool in particular?

dhruvp · on Jan 14, 2017

Hey @wcbeard10,

This course will focus primarily on TensorFlow. We're doing so because, at the moment, it's the most popular Deep Learning framework and provides you enough flexibility to explore some of the newer network architectures we focus on towards the end of the program.

olivercameron · on Jan 14, 2017

~80% focusses on TensorFlow, but we are working on Keras content in other Nanodegrees!

simonebrunozzi · on Jan 14, 2017

Nice to see you here on HN, Oliver :)

wcbeard10 · on Jan 13, 2017

I often find myself frustrated when an important decision is made with little explanation in politics or business by someone who I assume is intelligent. It may strike me as a bad decision, but I try to be charitable and assume they have a good reason. I've thought that they often don't offer good explanations because they feel too busy to take the time to communicate, they're just not good at communicating or they don't recognize its importance.

This post suggests they could also have a better perception of the possible risks that even effective communication could entail than I do.

wcbeard10 · on Jan 12, 2017

My wife and I have done some of the same coursework (for different degrees), me on campus and hers online. I actually would have preferred online. Rigor seemed equivalent, but I find the ability to pause and rewind lectures invaluable.

wcbeard10 · on Jan 12, 2017

On ios safari, holding down the refresh button in the url bar gives you the option to request the desktop site.

As kludgy as that is, I've found it a worthwhile workaround until this is fixed.

wcbeard10 · on Oct 31, 2016

> Georgia, which also has a $5,000 state subsidy

Sadly, this is no longer the case. They've actually gone in the opposite direction, imposing an additional $200 annual fee on EV owners

http://politics.blog.ajc.com/2015/06/24/georgias-electric-ve...

refurb · on Oct 31, 2016

How else will they make up the lost gasoline tax revenue?

wcbeard10 · on Nov 1, 2016

It's a tricky issue, and gas tax probably won't be enough going forward.

The EV fee is a bit disproportionate because of the limited range and fewer average miles driven, but there's no easy fix.

erikpukinskis · on Nov 1, 2016

In reduced costs for refugees, loss of arable land, disaster relief, and oil spills.

wcbeard10 · on Sept 9, 2016

I occasionally use probabilistic programming systems, and find this project fascinating, but I've been wondering for a while what the vision is.

Is it meant mainly as a research project/proof of concept (seems I've seen elsewhere that it's funded by the DARPA PPAML project), or is it intended to become a commonly used piece of software with a community like pymc3 and stan?

probinso · on Sept 9, 2016

The PPAML project yielded a lot of cool work, I think that many people are still figuring out the most impact use case. I believe that the most mature new production language is Figaro, out of Charles River. They have some good example projects and a introductory text for their language.

https://www.cra.com/technical-expertise/probabilistic-modeli...

https://www.amazon.com/Practical-Probabilistic-Programming-A...

That being said, I'm very happy that Hakaru is getting attention.

ccshan · on Sept 9, 2016

A good way to see the vision of Hakaru may be to search for "2016" on my Web page http://homes.soic.indiana.edu/ccshan/

Whether it is intended to become a commonly used piece of software depends on what you mean by "become" :)

wcbeard10 · on Aug 31, 2015

A good summary and commentary here

http://www.marginalrevolution.com/marginalrevolution/2015/08...

wcbeard10 · on May 1, 2015

This seems intuitive. Scott Adams expresses a similar idea as focussing on 'systems' rather than 'goals.' Failing to meet a goal leaves you with nothing if the goal is all you focus on, but what you learn from the system is more reusable.

http://blog.dilbert.com/post/102964992706/goals-vs-systems

wcbeard10 · on April 30, 2015

I have no idea about Zeppelin's lineage, but looks like there's also Spark Notebook https://github.com/andypetrella/spark-notebook which more closely resembles the IPython notebook. I'd love to hear an explanation of the differences between all of these notebooks.

Can't wait to see if the Jupyter split will contribute to a consolidation or proliferation in the notebook-verse...