
The data is easily accessible as JSON, but here's the exact data I used: http://pastebin.com/nRYv40U8

First, a boxplot of the share of entertainment contributions in the combined entertainment and internet contributions:

http://i.imgur.com/FWQWy.png
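For anyone who wants to reproduce the ratio, here is a minimal sketch of how I read it: entertainment money divided by entertainment plus internet money. The function name and the dollar amounts are made up for illustration, and returning None when both amounts are zero is my assumption, not something stated above.

```python
from typing import Optional

def entertainment_quota(entertainment: float, internet: float) -> Optional[float]:
    """Share of entertainment money in entertainment + internet money."""
    total = entertainment + internet
    if total == 0:
        # No money from either industry: the ratio is undefined (assumed handling).
        return None
    return entertainment / total

# Hypothetical amounts in dollars, not taken from the pastebin data.
example = entertainment_quota(30_000, 10_000)
print(example)
```

A value near 1 means the money came almost entirely from entertainment, and a value near 0 means almost entirely from internet companies.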

You can see quite easily that there's a difference, and it is significant at the 95% level (t = -4.73).
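A sketch of how a two-sample t statistic like this can be computed. The comment doesn't say which variant was used, so Welch's unequal-variance version here is an assumption, and the two lists are invented numbers, not the real quotas.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    # Standard error of the difference in means, using sample variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up quota values for two groups, for illustration only.
group_a = [0.10, 0.20, 0.15, 0.30]
group_b = [0.60, 0.70, 0.55, 0.80]
t = welch_t(group_a, group_b)
print(t)
```

The sign only reflects which group is passed first; swapping the arguments flips it.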

I've also run a logistic regression with age, party (is_democrat), seniority and the contribution quota (quota_ent) as predictors.

  ------------------------------------------------------------------------------
       support |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]  
  -------------+----------------------------------------------------------------
           age |   .0258551   .0358136     0.72   0.470    -.0443382    .0960485
   is_democrat |  -1.252883   .6243361    -2.01   0.045    -2.476559   -.0292067
     seniority |  -.0262688   .0381962    -0.69   0.492     -.101132    .0485943
     quota_ent |   5.839435   1.447732     4.03   0.000     3.001933    8.676938
         _cons |  -1.968467    2.01512    -0.98   0.329    -5.918029    1.981096
  ------------------------------------------------------------------------------
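The coefficients in the table are on the log-odds scale, so exponentiating them gives odds ratios, which are easier to read. A small sketch using the values copied from the table; the 0.1 step for quota_ent is my choice, on the assumption that the quota lies between 0 and 1.

```python
from math import exp

# Coefficients copied from the regression table (log-odds scale).
coef = {
    "age": 0.0258551,
    "is_democrat": -1.252883,
    "seniority": -0.0262688,
    "quota_ent": 5.839435,
}

# Odds ratio for a one-unit increase in each predictor.
odds_ratio = {name: exp(b) for name, b in coef.items()}

# A full unit of quota_ent spans the whole 0..1 range, so look at a 0.1 step.
quota_step = exp(0.1 * coef["quota_ent"])

for name, ratio in odds_ratio.items():
    print(f"{name}: {ratio:.3f}")
print(f"quota_ent, +0.1: {quota_step:.3f}")
```

Read this way, being a Democrat multiplies the odds of support by roughly 0.29, and each additional 0.1 of quota_ent multiplies them by roughly 1.8, holding the other variables fixed.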
The AUC is 0.8089, which is quite okay. It would also be interesting to test whether location is a significant factor.
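For context on what the AUC measures: it is the probability that a randomly chosen supporter gets a higher predicted score than a randomly chosen opponent. A minimal sketch of that definition, with invented labels and scores; the real figure presumably came straight from the stats package.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Compares every pair, so it is O(n^2),
    which is fine for a few hundred rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes (1 = support) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(auc(labels, scores))
```

An AUC of 0.5 is what coin flipping gives, and 1.0 means the scores separate the two groups perfectly.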

Edit: @adamtaylor: Here's a scatter plot with each contribution, transformed with log(1 + x) for readability: http://i.imgur.com/MRciL.png
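The transform itself is a one-liner. A sketch with invented dollar amounts: log(1 + x) keeps zero contributions at zero instead of blowing up the way a plain log would, and expm1 undoes it.

```python
from math import expm1, log1p

# Hypothetical contribution amounts in dollars, including a zero.
amounts = [0, 250, 5_000, 120_000]

# log1p(x) computes log(1 + x) and is defined at x = 0.
transformed = [log1p(x) for x in amounts]

# expm1 is the inverse, so the original amounts can be recovered.
recovered = [expm1(t) for t in transformed]

print(transformed)
```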




Hmm, that scatterplot suggests a less straightforward relationship; the for/against dots look like they plausibly came from slightly different distributions, but not very much different ones.



