You're working from a different definition of "good comment" than Graham is. The concern with "bad" comments isn't simply that they exist, but that they have deceptively positive scores.
...which is a problem with a crystal clear cause: if you allow purely democratic voting, then you'll end up with really popular crap reaching the top of the pile fairly often. Imagine how useless Pagerank would be as an algorithm if it counted a link from every drooler with a domain name the same as one from the front page of Yahoo - that's essentially what you're doing when you tally up votes on a site. The only reason it's not worse than it is now is that there's no real incentive for people to game the system, so we don't end up with massive comment spammers. But we still have douchebags, fools, and all other sorts of sheeple, and in large enough numbers they end up diluting the votes that should count the most.
Several solutions are probably obvious to everyone here and I won't go into them, but the point is, you'd have to be willing to throw fairness aside in order to fix the problem - if a brand new account's vote is worth the same amount as mine, which is worth the same as tptacek's, then something is wrong.
The site can generate a lot of traffic to a submitted URL, so there is certainly a temptation to game getting a URL onto the front page.
I think that an informal oligarchy was established, certainly with respect to downvotes and flagging. The question is why it is breaking down. More troubling is the number of quality commenters who seem to have departed.
One hypothesis I have is that many on the leaderboard (and many of the contributors I value who aren't on the leaderboard) don't seem to have any plans to apply to YC; they just want to take part in a community of hacker entrepreneurs. If more effort is not given to establishing a richer community model, one that is not managed as an adjunct to the accelerator, they will probably continue to drift off.
But a more fundamental question is what we mean by 'post quality'. Right now, the vote total is the only objective metric of post quality; if we're trying to rethink the voting system itself, we need a lower-level set of objective quality criteria if we want to evaluate how well a proposed ranking system works.
What are 'crap', 'douchebags', 'fools', and 'sheeple'? Different people have different thresholds for assigning these labels. Until a discussion board establishes quasi-official definitions of these terms, whether by consensus or by fiat, you can't use them to measure anything.
Your last suggestion would understandably make things less democratic, but how would it be implemented?
Weighted votes based on karma. Or restrict the number of votes a user can cast based on karma or account age.
A problem with these kinds of things is determining what does or does not work and picking up on unintended side-effects, and doing so in a timely manner.
For example, if "young" accounts have very limited votes available will new users be less likely to stick around long enough to become more active voters? How long would it take to see this? What if by the time you recognize an undesired side effect you've already sent the site down the road of ruin (or something)?
For example, how long will pg keep comment scores hidden? How many people have found this sufficiently detrimental that they have left HN, or soon will, and won't come back?
How do you craft meaningful site experiments like this while keeping risk to a minimum?
>> if "young" accounts have very limited votes available will new users be less likely to stick around long enough to become more active voters?
I suppose that depends on whether or not they are on the site to read submitted news articles and comments or to play the voting and karma game. I rarely vote (or comment for that matter), but I get value from the content. For that reason, I have continued to consistently visit the site for a couple of years.
Maybe all we need is 'bad, average, good, great' shown alongside the comment. Or something along the lines of what Slashdot does, letting you tag comments as 'insightful' or 'funny', etc.
On a hunch: run Pagerank (the original thing, i.e. just the stable distribution of the random walk) on the graph of who-upvotes-whom. Then tally upvotes weighted by the upvoter's Pagerank. It's probably safe to assume that the Pagerank of a given user changes rather slowly, so this shouldn't be too big a problem to actually compute (and needs to be updated at most every few days). Also, to favour recent history a bit, one could decay the weight of older upvotes in the who-upvotes-whom graph.
Now that upvotes are non-public information, people can't know their own rank (at least not easily), so this looks at least halfway resistant to gaming.
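For concreteness, here's a minimal sketch of what I mean, assuming you have an edge list of (upvoter, author, timestamp) tuples and are willing to lean on networkx; the half-life constant is made up:

    import time
    import networkx as nx

    HALF_LIFE_DAYS = 90.0   # made-up decay constant for down-weighting old upvotes

    def voter_ranks(upvotes, now=None):
        # upvotes: iterable of (upvoter_id, author_id, timestamp) tuples
        now = now or time.time()
        g = nx.DiGraph()
        for voter, author, ts in upvotes:
            age_days = (now - ts) / 86400.0
            w = 0.5 ** (age_days / HALF_LIFE_DAYS)          # older upvotes count less
            if g.has_edge(voter, author):
                g[voter][author]["weight"] += w
            else:
                g.add_edge(voter, author, weight=w)
        # stable distribution of the (damped) random walk over the who-upvotes-whom graph
        return nx.pagerank(g, weight="weight")

    def comment_score(upvoter_ids, ranks):
        # tally a comment's upvotes weighted by each upvoter's Pagerank
        return sum(ranks.get(u, 0.0) for u in upvoter_ids)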
Advogato used a trust metric with similar features; Raph Levien called such systems attack-resistant trust metrics. I'm sure they would handle the task you describe. Check out:
Whoops, missed a lot of responses here, sorry about that!
Yes, vanilla Pagerank (where the "sites" are accounts and "links" are upvotes) is the first thing that came to my mind, mainly because it's been pretty well battle tested; running it on the comment graph is actually a much simpler problem than running it behind a search engine because you don't need the additional refinements based on search keyword. A comment's Pagerank is, by itself, enough to set an ordering.
If that's too much work, though, even a simple vote weighting based on some function of the upvoter's karma would be a decent approximation. Say every user's votes had an impact based on some sigmoid function of their karma (maybe a Gompertz function?), tuned so that the plateau is hit for the top 10% of users or something like that.
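Something like this, say - the Gompertz parameters here are pure guesses and would need tuning against the real karma distribution:

    import math

    def vote_weight(karma, a=1.0, b=5.0, c=0.002):
        # Gompertz curve: slow start, then a plateau near the asymptote `a`.
        # a, b, c are guesses; c controls how quickly the plateau is reached.
        return a * math.exp(-b * math.exp(-c * max(karma, 0)))

    def weighted_score(votes):
        # votes: iterable of (direction, voter_karma), direction being +1 or -1
        return sum(d * vote_weight(k) for d, k in votes)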
I've actually implemented a system like this on isdaily (a news site I'm building).
I came to the conclusion that using karma as a vote weight is a rather bad idea, as it assumes that people who get voted up also have better voting judgement: "If you post well you must vote well."
But I'm sceptical that this holds in general. It's easy to see that someone like patio11 has both excellent judgement and a high karma score, but vote power = karma probably means that while you're giving patio11 a heavier vote when he turns up, mostly you're giving vote power to the people who post the most.
The implementation I have is basically:
"If you made hard decisions in the past and voted well, you get more vote power."
So in your implementation, you gain voting weight by voting on the posts that have themselves accumulated the highest vote counts?
In order for this to work, you'd have to hide vote totals on all posts and comments, which, as jamesbritt pointed out above, eliminates a feedback mechanism that users may find desirable.
I like the idea of upvotes and downvotes being multiplied by your avg score. tptacek's votes would be worth 8.85 at the moment, bermanoid's 3.5, and mine 2.92. Newbies would come in at 1.
That would be a violation of the democratic principle behind social news and would incentivize people to game their scores. Members have different interests and expertise, so having a high avg score doesn't mean they are experts in everything. (Abuse of power - OMG this is getting more intricate than I thought.)
You could take the log of any score > 1 to reduce the incentive to game the system; that way someone who relentlessly karma-whores won't have that much more influence than a mildly uprated commenter, but both will have more influence than a noob.
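A quick worked example with the averages mentioned above (natural log and a baseline of 1 - the exact base and offset are arbitrary):

    import math

    def dampened_weight(avg_score):
        # scores at or below 1 keep the baseline weight; higher scores only grow logarithmically
        if avg_score <= 1:
            return 1.0
        return 1.0 + math.log(avg_score)

    for name, avg in [("tptacek", 8.85), ("bermanoid", 3.5), ("me", 2.92), ("noob", 1.0)]:
        print(name, round(dampened_weight(avg), 2))
    # -> tptacek 3.18, bermanoid 2.25, me 2.07, noob 1.0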
Well, with the situation we're in now, it only takes three noobs (or one noob with three accounts) to outrank two wise elders. As to your GP comment, yes, incentivizing karma gains could be a problem, but that's the tradeoff that's got to be considered.
I'm not sure what the best tradeoff is; personally, I'd probably go with something like Pagerank for a while and see if the comment quality (above the fold, at least) improves, but perhaps pg has a desire to keep things more neutral and fair, I'm not sure.
I've been thinking lately about running some analysis of my own, but couldn't find a data dump and didn't feel like scraping. Why has the dump been pulled?
I'm thinking of classifying entries as temporal/atemporal. Temporal are industry news, press releases, rumors, and the like. Atemporal are reference materials, essays, howtos, analysis, and links to libraries/frameworks/products.
Essentially, I feel that while temporal entries provide a more immediate reward and larger discussions, atemporal ones tend to be more technical, more interesting, have more valuable discussions, and overall are what sets HN apart from other places. I'd rather see a front page full of erlang stories than pieces about what Apple has just announced (I'm pretty sure I'll see those in many different places).
A good balance is probably ideal: a few of the most interesting breaking stories, plus the most thought-provoking atemporal materials users have run into lately. The perceived decline of the front page and comment quality could be attributed to a shift in this balance towards temporal stories.
As to how to classify, I'd start by marking stories with multiple duplicates in a short period of time as temporal, and the ones with duplicates spread apart over long periods as atemporal, and go from there.
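To make that concrete, a rough sketch - the 48-hour window and the function name are arbitrary assumptions, just a starting point:

    from datetime import timedelta

    BURST_WINDOW = timedelta(hours=48)   # guess at what counts as "a short period of time"

    def classify(submission_times):
        # submission_times: datetimes of duplicate submissions of the same story
        if len(submission_times) < 2:
            return "unknown"             # a single submission doesn't tell us much either way
        span = max(submission_times) - min(submission_times)
        return "temporal" if span <= BURST_WINDOW else "atemporal"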
It's not obvious to me how to automatically classify stories as (a)temporal. My understanding of the way submissions work is that if Bob submits something that Alice also recently[1] submitted, then it just gets counted as an upvote for Alice's story.
Then, of course, there are the "this happened recently, which inspired me to think about some broader trend" posts. For example, during the AWS outage, Coding Horror had a story about Netflix's Chaos Monkey[2], which one could easily classify as both temporal and atemporal.
I believe your understanding is right, with "recently" being a month, provided it's the exact same URL. Popular news is usually submitted from different sources, so there are actual duplicates. I agree that it's certainly not easy to automate the classification.
The Coding Horror example is a very good one. Jeff writes very atemporal stories; I'm pretty sure that happens intentionally. It's not unusual to see one of his posts from several years ago pop up on the front page every now and then. Same with Joel's. It may be inspired by a recent event, but the lesson is meant to stand.
I think a good heuristic would be: given a random, highly voted story from the archives, would it get upvoted if reposted today?
It may be that there are differing opinions about the quality of posts to HN in the early part of 2011. Writing then, he said, "The problem has several components: comments that are (a) mean and/or (b) dumb that (c) get massively upvoted."
Specifically, I'd like to see information on the following in relation to the recent changes in HN:
- increase/decrease in activity of users with highest karma
- increase/decrease in average comment score, normalized by time since the OP was posted
- amount of time the highest rated posts stayed on the front page
- trends for # of flags
Also, it would be great if he put the guidelines on the submission page.
So Hacker News has probably gotten worse if you like to read every single post, but better if you only like to read the front page.
It is hard to define what a "quality" article is, and I'm not sure I agree with his assumptions, but I personally have seen exactly this over the past year while compiling my weekly Hacker Newsletter.
I'd also be interested in whether there's any correlation between meta discussion and its effects on trends. So I'd be curious to see ycombinator (and/or hacker news) in the "mentioned companies" section.
If coming up with an objective measure of quality was feasible, the voting system could be scrapped entirely, and instead of making a site that allows users to submit links, pg could have written a web crawler.
I'm just pointing out that it should come as no surprise that the author of this blog post didn't come up with a good metric for judging quality. As in, if the author had come up with such a metric, he would have started a company and become fantastically wealthy applying this magical quality-quantifying algorithm, rather than writing a blog post.
Perhaps. But there are degrees of usefulness, and it's probably possible to do better. Especially if you do an analysis in hindsight, where it's no longer possible for the participants to game the system.
Yeah, that measure was super hacky and I have no justification for it (but it was the simplest/only thing I could think of at the time). Do you have any suggestions?
One thing I'm thinking of now is seeing how (# points at the xth percentile) / (# points of 10th highest rated post) changes over time. (Or maybe I should just take a closer look at how the overall distribution of points changes to get a better feel first.)
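Roughly this, say, assuming I have a list of point totals per month; everything else here is just a placeholder:

    def percentile_ratio(month_points, pct=50):
        # (score at the pct-th percentile) / (score of the 10th highest rated post), for one month
        ranked = sorted(month_points)
        if len(ranked) < 10:
            return None                  # not enough posts to define a "10th highest"
        idx = min(int(len(ranked) * pct / 100.0), len(ranked) - 1)
        return ranked[idx] / float(ranked[-10])

    # e.g. plot {month: percentile_ratio(points) for month, points in points_by_month.items()} over time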
There is no way to get a handle on comment and submission quality without content analysis.
You need a list of properties that define good comments and good submissions, you need some way to quantify those properties and then you need to invest obscene amounts of work coding those properties.
All of this is hard work and full of uncertainty. Different people want HN to be different things, there can’t be one list of properties that does them all justice. Who gets to define what a good comment or a good submission is? Is it even possible to exhaustively translate someone’s understanding of a good comment or a good submission into a list of properties? Can those properties be quantified, in the best case with as little work as possible? Does the coding process scale to the amount of content that has to be analyzed?
This is a Ph.D. thesis worth of work. I wouldn’t try to do it as a hobby.
An interesting analysis (which only pg could do) is to multiply each vote by ln(user's karma) or something similar. That could arguably deal with the "noobs who don't follow the rules" problem.
The above analysis will assume, though, that the problem of decreasing quality of votes (or discussions) is caused by newcomers and not by old high karma users getting nastier.
> 2) ... But a circle's area increases as (pi)r^2. That's an exponential increase.
No, no, no, no, no. "exponential" != "has an exponent". Exponential means the variable is the exponent, as in 2^n. The word you're looking for is "quadratic".
It doesn't matter whether it's unfair by a factor of n^2 or by 2^n. If it's unfair, then it cannot be looked at whatsoever without drawing incorrect conclusions.
Or are you trying to educate me out of good will? If so, thanks.
Kind of the latter, really. While the true meaning was clear from context here, it's very easy to make the same mistake in other contexts without the true meaning being so clear. And, unlike a number of other common mistakes, this is one that can really bite you in the ass in some contexts, so it's good to set the record straight before your gluteals are masticated.
Standard deviation is the square root of variance. A variance of 4 is twice the variance of one of 2, but that's just by definition. What is perhaps more interesting for your intuition: when adding two independent variables, the variance of the result is the sum of the variances. Thus the standard deviation of the sum is the square root of the sum of the squares of the standard deviations of the components. (It's easier to read in formulas.)
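Written out, for independent X and Y (σ denotes standard deviation):

    \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)
    \sigma_{X+Y} = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)} = \sqrt{\sigma_X^2 + \sigma_Y^2}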
By those definitions, shouldn't the number of pixels (circle area) reflect the deviation, rather than what it currently does (the squared deviation, aka variance)?
In that case, the area should grow proportionally to the deviation, not to the variance.
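In symbols, assuming the plot scales the circle's radius r with whatever value it displays:

    A = \pi r^2, \qquad r \propto \sigma \;\Rightarrow\; A \propto \sigma^2 = \mathrm{Var}
    r \propto \sqrt{\sigma} \;\Rightarrow\; A \propto \sigma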
Just to be clear, that's the number of users with at least one submission in a given month. (Not the number of users with at least one submission or comment, as I sleepily wrote originally, and definitely not the number of distinct visitors.)