Poll: What should be done about the endless repetition of stories?
428 points by ColinWright on July 29, 2011 | 148 comments
One thing that bugs me about HN is the apparently endless repetition of stories. When something is interesting it gets taken up by several "sources" and then each of these is dutifully submitted by multiple people.

This has some undesirable consequences. One is that it dilutes the "newest" page. That I don't mind so much. What bothers me more is that otherwise interesting discussion gets split over multiple pages, and the same points get made in each discussion, with some non-overlap.

I sometimes revert to my native "engineer" mode and try to do something to fix this. Usually I put cross-references into one or the other to point people to where the discussion is, or might be. Some people don't like this and down-vote them. Others do like this and up-vote them. Most people don't seem to care.

I really don't mind the constant dribble of down-votes that I get for trying to prevent the splitting of discussions, but I do care that I'm not seen to be harming the "community".

Hence this poll.

What, if anything, should be done about the incessant repetition of stories?

PS: If you care enough to vote, please upvote the item so people get a chance to see it. If you think I'm karma-whoring and you want to punish me for that, find some of my comments and downvote them as scapegoats.

Provide a "merge" mechanism
1585 points
Do nothing - leave it alone
639 points
Other technical solution (please outline in a comment)
64 points
Force submitters to do a search for the story first
62 points
Continue to cross-reference them by hand
53 points
Go away. Just, go away.
52 points


Usually it's not a problem, because people only vote up the first version of a story. And when a dup does get upvoted, mods can just kill it. The Airbnb situation is unique because

(a) an angry mob is upvoting any story to do with this and

(b) we have to err on the side of not killing stories critical of YC or companies we've funded, or we get accused of censorship.

Fortunately the combination only occurs occasionally, so it's probably not something that needs a structural fix.


I don't see the respondents as an angry mob. Are you lumping edw519 in with those people?

You aren't applying Occam's razor. The simple explanation is that this is the most relevant HN story in recent memory. It involves a startup, it involves a YC company, it's one of the most valuable startups in the site owner's portfolio, it's an international print news story, it involves tragedy, and the outcome will make us all think about what it means to be a successful startup.

That's why the stories all have traction. It's a fantasy to think it's an irrational mob.

Edit: I should say I see neither the respondents nor the upvoters as an angry mob.


If you take a look at the front page of reddit you may change your mind about what Occam's Razor implies in this case. There is nothing more common than to find people reflexively upvoting stories about something they're agitated about.

You can see that's happening here from e.g.

http://news.ycombinator.com/item?id=2821399

which has no less than 275 points (so far), despite being a completely derivative article that adds no new information. People aren't upvoting it because of something they learned in it that engaged their intellectual curiosity.


That story originally carried a title like "AirBnB scandal makes front page of the Financial Times". I thought it was interesting from the perspective of "How does bad press about a startup spread and what are the effects of various ways of responding?" rather than "Let's keep piling on AirBnB." This is the first big YC crisis management story I can recall; it's not surprising other founders would want to learn from it.


Although it's correct that the article itself is completely derivative, the simple fact that the story is in the FT provides new information. The fact that this narrative can go mainstream is itself fascinating and new, and worth discussing.


Would that they were discussing the fact that it's in the mainstream media, but it seems that they're simply hashing over the same ground in the other items.

Seems to me that it adds nothing.


On the contrary, searching for "mainstream", "ft", "financial times", or "media" in the comments for the ft article suggests that people actually are discussing the fact that it is now in the mainstream media. It might not be the majority of the comments, but at least those subjects are broached in there, and much more so than in the other threads.

But regardless of whether people are discussing something new, my point is that the story reaching the broader media is an interesting information point in and of itself. Whether HNers choose to discuss that in an intelligent manner is an entirely different thing.

That being said, like you, I am also bugged by the repetitive stories -- both in this instance and more generally. And although I think that sometimes multiple submissions are warranted and useful (when the story changes, or there are new sources, etc.), I think that often new threads add very little to the discussion.

I still love it here though. But I only lurk, so...


The original title for that submission was something to the effect of "AirBnB story now on the front page of the Financial Times". It has been edited to be both more in line with the traditional HN submission guidelines (i.e. the topic of the submission, no editorializing) and less representative of the actual point of interest.


This is a tragic story that many of us can relate to on both sides (as a homeowner and as a technologist/entrepreneur), for which the most current publicly-available information is completely contradictory. And unlike you we don't have the people who can give us the full story a phone call away. Can you really blame us for wanting to hear about all developments to this story until there is some kind of resolution?


Maybe it's a design problem. Right now the first thing you see is the upvote link. People could be upvoting on the basis of the title (I don't; I always read the story, then forget to go back and upvote if I like it). What if you put the upvote link for the story in the bottom right corner of the comments page? Then it's presented as something you act on after you've read the story/comments, which is how it should be acted on.


Indeed - and frequently when I point out that a submission is of a derivative article that adds no new information I get down-modded. As of today, I'm giving up, especially in the light of this poll.

I've learned something new and disquieting about the current HN - and I'm fairly sure this has changed. I seem to remember seeing an earlier poll on this, and I'll go look for it later when I'm more awake.


By what definition is AirBNB a startup?

Edit: downvote me all you want, but if a company with a $1,000,000,000 valuation is a "startup", the word has lost any meaning.


That seems to be true with the "blitz" of stories like the Airbnb ones, but at a more general level, a lot of duplicates seem to be occurring because there are lots of different subgroups now using HN at different times of the day.

For example, these are the same story but did well with similar karma and similar discussions, a week apart from each other:

http://news.ycombinator.com/item?id=2778219 http://news.ycombinator.com/item?id=2801263

I certainly visit HN far too much but I notice "repeat" stories like these frequently.

MetaFilter (which is certainly pretty resistant to change) ended up implementing an automatic search feature at the posting stage to highlight potential conflicts like these. It seems to work well (it's not enforced - it just makes users think first). Edit: Realized I should add I don't necessarily think this is a good idea for HN. Personally I'd prefer to see the per-page # of items go up a bit. A 50 item front page would better cope with the higher levels of FP items.


Genuine question: are these repeats a bad thing?

When the articles appear simultaneously it's obviously a pain to have discussion fragmented, but if they're appearing distinctly, is it such a bad thing if people are discovering something new?


I am fine with repeats, but I think a mechanism that detects "related" articles and provides links to previous/later discussions would be useful. I certainly would like to hear more of what people say if a topic or a specific article interests me.

I am not necessarily advocating that this should be a built-in feature; on the contrary, I think it would be more elegant as an external feature, and in that form it would be more useful, since this sounds like one of those problems where the general solution is easier than a particular solution.


I think that's quite subjective. It may not be a bad thing! I just think it indicates fragmentation of the readership in a way that didn't used to occur. Perhaps this is a natural part of a community growing beyond a certain size.

That said, there was once a time when any new front page link was something I was sure not to have seen before. Now? Unless it's up to date news, I've seen perhaps 1 in 5 links on Twitter, Reddit, or even my own newsletters already. People are getting 3 week old JRuby release notes on the front page nowadays fer chrissakes..


I appreciate your candor on this, but I'm curious if you are implying two things: 1) you think that people interested in reading about Airbnb are behaving irrationally, and 2) people rarely get angry at YC companies?

I find the interaction between HN and YCombinator rather fascinating, because you obviously have a financial interest in many things posted on the site, but you also clearly feel some responsibility to play the neutral host. I wonder whether this responsibility comes from HN's value to YC as a barometer of trending ideas (requiring you to retain the faith of your userbase), or from some higher sense of "journalistic" obligations.


Not to put words in Paul's mouth, but I believe his statement was:

1. It's relatively rare to get an angry mob

2. It's even more rare for that angry mob to target YC companies because YC companies are a small % of total companies

3. Therefore this situation is not common enough to optimize for


None of your points rule out my first implication, and I don't believe the logic for your second point was part of Paul's statement. Despite YC companies being a small part of the world economy (or whatever), they are disproportionately mentioned on HN for obvious reasons.


> 1) you think that people interested in reading about Airbnb are behaving irrationally

Every angry mob is irrational, even if there is a perfectly rational reason for their anger.

Concern about what has happened, is happening, or should be happening (in various people's opinions), and so forth, is definitely rational, as is an interest in watching it all play out (even if you don't care for airbnb at all specifically, this could possibly have implications elsewhere too).

People posting the same thing that others have already posted (or something so similar it makes no difference) and other people voting everything up whether it is a duplicate or not is irrational. It bends the system here, potentially making the site less useful/interesting to others; it isn't going to change the situation; and the confusion of partial discussions spread over threads is going to put off people who might otherwise have cared about the situation (they'll stop reading any post about it, no matter how high up page 1 it is, as they'll start to assume it is the same as last time).

tl;dr: the interest/concern is not irrational, but some of the behaviour seemingly stemming from it is, so suggesting some people are behaving irrationally is not incorrect (IMO).


Then don't be bothered by the scare word and just do what's best for the site. People (especially internet commenters) don't really understand what "censorship" means and only really use it to whinge about not getting their way.

reddit's /r/politics mods recently decided to start moderating away the "does anyone else think this thing that we all think? vote me up!" posts, and all of the comments on that announcement were accusations of censorship. Nobody is trying to censor anyone, they just want to improve the content, but the people making the self-posts (or in the case of HN the anger porn) want their attention and think it's censorship when they aren't given it.


Considering the incentives that YCombinator has for hosting this site in the first place, you aren't going to be able to remove the need to be perceived as being reasonable in not killing bad news about YCombinator companies. This is just a cost the site will have to deal with; all the alternatives are worse for either the community or YCombinator.

I agree with the broad thrust of your post, actually, people do scream censorship at the drop of a hat when inappropriate, but however unfortunate the reality may be, it's best to let this exception through.


Taking away opportunities to whinge is what's best for the site. I'd rather see the duplicates and not see the "censorship" posts.


Also, I think there are two kinds of links, "original story" links, and "what are people saying" links. For example if you are for some reason genuinely interested in AirBnB as a business, both the original story AND what tech crunch are saying about it is potentially of interest, if you believe techcrunch is an influential source that can affect AirBnB's prospects.

As to the OP, I wish for what some of the open source bugzillas do - when you submit a bug on, say, cups printing with foo driver, the system does a search on those keywords and presents you with bugs that may be duplicate. I don't think that "forces" the user to do research, but assuming a well meaning user, which most of us are, it allows them to realize there is an ongoing discussion and join that.

The problem I haven't figured out is how you would implement this with the bookmarklet - the main site submission would be straightforward.
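A minimal sketch of that Bugzilla-style check, assuming "possible duplicate" means keyword overlap (Jaccard similarity) between titles. The stop-word list, the 0.5 threshold, and the function names are hypothetical, not anything HN implements. In keeping with the comment, it only returns candidates to show the submitter and blocks nothing:

```python
import re

# Hypothetical stop-word list; a real deployment would tune this.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "to", "for", "and", "is", "with"}


def keywords(title: str) -> set[str]:
    """Lowercase the title, split it into alphanumeric words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


def similar_submissions(new_title: str, existing: dict[int, str],
                        threshold: float = 0.5) -> list[int]:
    """Return ids of existing items whose title keywords overlap the new
    title's keywords by at least `threshold` (Jaccard similarity), best first.

    The result is only shown to the submitter; nothing is blocked."""
    new_kw = keywords(new_title)
    scored = []
    for item_id, title in existing.items():
        old_kw = keywords(title)
        union = new_kw | old_kw
        if not union:
            continue  # both titles were empty or all stop words
        score = len(new_kw & old_kw) / len(union)
        if score >= threshold:
            scored.append((score, item_id))
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored]
```

For example, with `existing = {1: "Airbnb host's apartment ransacked", 2: "JRuby 1.6.3 released", 3: "Airbnb apartment ransacked by guest"}`, submitting "Airbnb guest ransacked apartment" suggests items 3 and 1, in that order. Keyword overlap misses rephrased headlines, so this is a cheap first pass rather than real duplicate detection.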


Clearly giving the whiners whatever they ask to make them shut up isn't good for the community. You teach them that whining gives them what they want, which will just make the whining worse because it becomes a successful tool.


There's an important distinction between "we don't negotiate with terrorists" and "we will never implement a worthwhile feature if it was requested in a disrespectful manner".

Take the Zed Shaw + dongml story [1]. Zed's behavior was silly, but the loophole needed to be closed regardless and so it was.

[1] http://news.ycombinator.com/item?id=2601342


On the one hand (a) is probably right. On the other hand it is a story of technical interest to those of us who are not part of the "angry mob". For me, I have voted up some of the stories because they contrast interestingly - the way, for example, the TC stories have contrasted with EJ's accounts is interesting.

And as an example of how "damage limitation" can bite you (fairly or otherwise) there are a lot of lessons here.

End of the day; this is a big story in this community and there are some key lessons here. I'd definitely vote up someone who could blog about it critically from a "things you could learn for your business" position.


It also happens with every Apple or Google press release though.


Yeah, I don't see how Paul can say "usually it's not a problem" when it happens extremely frequently with every major press release.


> Usually it's not a problem, because people only vote up the first version of a story.

I'm not sure if this is actually false (i.e. people vote for multiple versions of the same story) or just practically false (i.e. people vote for the first version that they notice, and so multiple different ones bounce around the front page). But either way, it's pretty usual that there are dups of at least one story on the front page at any given time. Or, that today's front page has a dup of yesterday's front page story (rather than continuing discussion on the same submission).


How do you account for the same story about IE users having a low IQ having been submitted at least 5 times so far? Yes, most are sinking without trace, but it's still really annoying to see the same thing over and over again.


I think articles that are not the original source should be merged as comments under the original source since they are often just another news organization or blogger commenting on the source without additional source data.

This will also bring Hacker News more in line with paragraph 5 of the Hacker News Guidelines[1].

I believe even if an article "adds to the conversation" it still isn't worthy of a top level submission. It should be a comment like all the others.

EDIT: To implement this maybe power users above a certain karma threshold can be granted the ability to mark an article as derivative and point to the original source submission. If some magic number of power users do this, the article and the comments are moved and any links to the derivative article are redirected as well.

This will also discourage karma whores and blog-spammers from submitting derivative articles because there is a good chance it will get rolled into someone else's submission that actually submitted the source article.

[1] http://ycombinator.com/newsguidelines.html
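A sketch of the karma-gated "mark as derivative" idea, assuming marks only count toward a merge when they agree on the same source item. `KARMA_THRESHOLD`, `MARKS_NEEDED`, and the class name are hypothetical stand-ins for the "certain karma threshold" and "magic number" in the comment; moving comments and redirecting links would be a separate step:

```python
from collections import defaultdict

KARMA_THRESHOLD = 500  # hypothetical "certain karma threshold"
MARKS_NEEDED = 3       # hypothetical "magic number" of agreeing marks


class DerivativeMarks:
    """Collects "this is derivative of item X" marks from high-karma users."""

    def __init__(self) -> None:
        # (derivative_id, source_id) -> set of users who made that exact claim
        self.marks = defaultdict(set)

    def mark(self, user: str, user_karma: int,
             derivative_id: int, source_id: int) -> bool:
        """Record one mark; return True once enough users agree that
        `derivative_id` should be folded into `source_id`."""
        if user_karma < KARMA_THRESHOLD or derivative_id == source_id:
            return False  # ignored: below threshold, or an item pointing at itself
        voters = self.marks[(derivative_id, source_id)]
        voters.add(user)  # a set, so repeat marks by one user count once
        return len(voters) >= MARKS_NEEDED
```

Keying on the (derivative, source) pair means users who disagree about which item is the original don't pool their marks.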


I agree. I think merging threads would be the best option.

I don't see every story that goes through HN, but I have often seen people post that a similar article was already posted with a link to that article. A merge mechanism (for those over a certain karma level) would allow people to recommend that two separate posts are merged into a single posting, with the karma from both added together and the responses sorted as they are now by karma.
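A sketch of the merge step itself, under simplifying assumptions: only top-level comments are modeled, the surviving item keeps its own id and title, and the data classes are hypothetical rather than HN's actual data model:

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    points: int


@dataclass
class Item:
    item_id: int
    title: str
    points: int
    comments: list[Comment] = field(default_factory=list)


def merge_items(keep: Item, dup: Item) -> Item:
    """Fold `dup` into `keep`: add the points together and combine both
    top-level comment lists, highest-scored first. Python's sort is stable
    (also with reverse=True), so equal-scored comments keep their order."""
    combined = sorted(keep.comments + dup.comments,
                      key=lambda c: c.points, reverse=True)
    return Item(keep.item_id, keep.title, keep.points + dup.points, combined)
```

Merging a 200-point item with a 75-point duplicate yields a 275-point item whose comments interleave by score, which is the "karma from both added together" behaviour described above.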


Let's say that someone reposts an old Spolsky article that has been around for 3 years and was posted back in the day when it was fresh. Instead of detecting the duplicate and not allowing the repost, I'd like it if HN automatically posted a link to the old discussion as a comment on the new post. Maybe do some sort of filtering on dates, so that the system disallows reposts of recent items as it currently does, but doesn't block posting a 3 year old article again for more discussion. Not everyone was here 3 years ago, and asking someone to search HN for every old article they might be interested in is a pretty crappy interface for discussion.

Edit: Removed dangling "There's value in revisiting discussion" statement.
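A sketch of that date-based policy. The one-year window, the return values, and the function name are hypothetical, and `history` is assumed to map each URL to its earliest submission:

```python
from datetime import datetime, timedelta
from typing import Optional

REPOST_WINDOW = timedelta(days=365)  # hypothetical cutoff


def handle_submission(url: str, now: datetime,
                      history: dict[str, tuple[int, datetime]]
                      ) -> tuple[str, Optional[int]]:
    """Decide what to do with a submitted URL.

    `history` maps a URL to (item_id, submitted_at) of its earliest submission.
    Returns ("post", None) for a new URL, ("disallow", old_id) for a recent
    repost, or ("post_with_link", old_id) for an old one, meaning: accept it
    and auto-comment a link to the earlier discussion."""
    if url not in history:
        return ("post", None)
    old_id, submitted_at = history[url]
    if now - submitted_at < REPOST_WINDOW:
        return ("disallow", old_id)
    return ("post_with_link", old_id)
```

A three-year-old essay resubmitted on July 29, 2011 would come back as `("post_with_link", old_id)`, while something first posted four days earlier would be disallowed, as the current behaviour already does for recent items.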


I agree with that entirely, and have no problem with "classics" being resubmitted, with a pointer back to the previous discussion. I'm more irked by the 5, 10 or 20 times some stories get submitted, like the "G-Man" commercial, and others.



But how many of these were on the front page for any considerable amount of time?


I have no (easy) way to check that. Two of them got 13 points each, all got at least 4 or 5. Some discussion started, but they didn't get very far.

But that's a specific example from today. Statistically, showing that on a day chosen at random there is at least one example shows that it's very likely to happen most days. Certainly it happens often enough that discussions do get split, and it annoys my sense of rightness, my sense of good design.

As an engineer, it offends my aesthetics.

It significantly hampers my enjoyment of HN, but based on this poll it seems I'm clearly in a small minority. I guess I'll just have to live with it.

One of my foibles, I suppose.


Option not listed: I just flag anything that appears on the front page if another, better post about the same thing is also on the front page. If lots of people did this, then we would quickly wind up with exactly one post about a topic on the front page, so people's discussion would mostly go onto that post. Problem easily solved.


Reducing every angle of a story to a single thread isn't going to help matters.

To take this week's AirBnB PR uproar as an example, many of the front page posts on the issue represent different facets of the story. Sure, only one link to the original blog complaint from the renter is necessary. Subsequent posts highlighting the story's path from HN to TechCrunch to a TechCrunch-hosted official response to the Financial Times and other news outlets are providing useful context here. The story has moved beyond the facts of the initial incident and onwards to its impact on startups, fundraising, disrupting regulated industries, and more.

There's a lot to talk about here and it doesn't all belong on one thread.


I think that's different than what the OP is talking about though. If different stories are providing updates to an earlier story or original information of some other kind that's fine. If there are multiple links from different sources that are duplicating the same basic information then it creates unnecessary redundancy and splinters the discussion.


  + If lots of people did this, then we would quickly wind up with exactly
  | one post about a topic on the front page, so people's discussion would
  + mostly go onto that post.
If lots of people did this, then we would quickly wind up with no posts about a topic on the front page, because we often disagree about what constitutes "better."


I flag repeats based on who submitted first. First come, first served. Every later one I see, I flag.


There is too much discussion that spreads across stories, though. The idea of merging sounds, uh, dicey, but it would keep fledgling discussions from being lost or preemptively killed.


I think the only reason there's a bunch of discussion that spreads across stories is because there are frequently duplicates that sit on the front page for ages. If they were quickly flagged off the front page, I doubt they would attract a significant portion of comments.


I think a "related to", "reply to" or "topic threading" mechanism would be neat, but possibly difficult to work out.

People generally want to submit their own threads for various reasons (adsense, unique-impressions, karma, etc...).

The tagging would probably still have to be community driven, but an "is child of topic", or "related to topic" tag would allow users to see a different sort of front page. (GROUP BY TOPIC, ORDER BY DATE DESC)

You'd only see the most recent post on a topic until you clicked on a "show related" follow up link.

Google news seems to do this grouping reasonably well. One headline and see all 872 articles on this topic. Obviously PG isn't going to write that but a poor man's version could be implemented with a 3 column relationships table.

It would also leave the current new and frontpage untouched for those who loathe & fear change (or unintended consequences).

I'm sure there are some other actual considerations to worry about in this - that's only an off-the-cuff response. But generally I think a "related to" list and a front page filter showing only the most recent article in a "thread", with the ability to expand out into the others under "more reading", would help.

Maybe only the highest voted in a thread?
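A "poor man's" version of that grouping, sketched with Python's built-in `sqlite3` and an in-memory database. The schema (a `stories` table plus a three-column `relationships` table of story, topic, and relation kind), the sample rows, and the query are assumptions for illustration. A plain `GROUP BY` doesn't portably return the newest row per group, so the query joins each topic against its latest timestamp:

```python
import sqlite3

SCHEMA = """
CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, posted_at TEXT);
-- the three-column relationships table: which story belongs to which topic, and how
CREATE TABLE relationships (story_id INTEGER, topic_id INTEGER, relation TEXT);
"""

# One row per topic: its newest story, plus how many stories sit behind "show related".
# ISO-8601 timestamps stored as TEXT sort correctly as strings.
FRONT_PAGE_SQL = """
SELECT r.topic_id, s.title, s.posted_at, t.n_stories
FROM relationships AS r
JOIN stories AS s ON s.id = r.story_id
JOIN (SELECT r2.topic_id, MAX(s2.posted_at) AS latest, COUNT(*) AS n_stories
      FROM relationships AS r2
      JOIN stories AS s2 ON s2.id = r2.story_id
      GROUP BY r2.topic_id) AS t
  ON t.topic_id = r.topic_id AND t.latest = s.posted_at
ORDER BY s.posted_at DESC
"""


def grouped_front_page(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(FRONT_PAGE_SQL).fetchall()


def demo_db() -> sqlite3.Connection:
    """Hypothetical sample data: three Airbnb stories in one topic, one unrelated."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO stories VALUES (?, ?, ?)", [
        (1, "Airbnb host's apartment ransacked", "2011-07-26 09:00"),
        (2, "Airbnb responds on TechCrunch", "2011-07-27 14:00"),
        (3, "Airbnb story reaches the Financial Times", "2011-07-29 08:00"),
        (4, "JRuby 1.6.3 released", "2011-07-28 12:00"),
    ])
    conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", [
        (1, 10, "related_to"), (2, 10, "reply_to"), (3, 10, "related_to"),
        (4, 20, "related_to"),
    ])
    return conn
```

On the sample data the grouped front page has two rows: the Financial Times story standing in for its three-story topic, then the JRuby story. Two stories in one topic with the identical newest timestamp would both appear, so a real version would need a tie-breaker.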


We managed to post nearly identical ideas simultaneously - a Google-News-like approach to related posts. In all my experimentation with community news sites, I find I prefer Google News if all I want to do is read the breaking news (without discussion), entirely due to its well-done grouping of related stories.


Yeah - wouldn't it be interesting if the Google news style grouping of all articles on a topic also had a single threaded discussion about the topic as well? (complete with upvotes or +1's of exceptional comments and analysis from the user base).

I think one of the issues we'd have to contend with is the tremendous drive that the publishers have to generate new column inches. They really want to squeeze out just one more pageview (or competing web pages who each want their first pageview).

There will be a lot of incentive to figure out how to not be tagged in the "new grouping" sort of system.

If you head on over to reddit, they've got a great visual description of the "duplicate news" issue: http://i.imgur.com/WANc1.jpg I think it's kind of relevant to this discussion.



One of the downsides of simply merging items, with different URLs, is that the different articles might both be worth posting because they cover different aspects of a story. So it'd be great to see multiple URLs on one item.

Also, there'd have to be a pretty clear standard about what constitutes one item. Sure, same URL - same item. Probably different URLs for the same story on the same day should be one item. But in the AirBNB ransacking case, I'd argue there should have been multiple articles: one for the initial "This is what happened" blog post and one for the "Suspect in custody, AirBNB has made changes to their organization" followup. Edited to add: And another for today's posting, noting that AirBNB probably hasn't done enough for the victim.

I assume some people blindly post TechCrunch URLs to try to boost karma - so a karma based solution might work. You post a duplicate URL, it changes your karma by -1 or 0. Not enough of a karma hit to really change the overall karma, but perhaps enough to search for the article first.
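A sketch of the duplicate-URL penalty, assuming "same URL" means the same host, path, and query after light normalization: lowercased host, `www.` and trailing slash stripped, scheme and fragment ignored, and `utm_*` tracking parameters dropped. The normalization rules and the function names are assumptions; the -1 value is one of the two the comment suggests:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

DUPLICATE_PENALTY = -1  # the comment suggests -1 or 0
# Hypothetical: query parameters assumed not to change which article a URL points at.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key: lowercased host without "www.",
    path without trailing slash, sorted non-tracking query parameters.
    Scheme and fragment are ignored."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    params = sorted((k, v) for k, v in parse_qsl(parts.query)
                    if k not in TRACKING_PARAMS)
    query = urlencode(params)
    return host + path + ("?" + query if query else "")


def submission_karma_delta(url: str, seen: set[str]) -> int:
    """Karma change for submitting `url`; `seen` holds keys already submitted."""
    key = normalize_url(url)
    if key in seen:
        return DUPLICATE_PENALTY
    seen.add(key)
    return 0
```

With this, `http://www.TechCrunch.com/2011/07/27/airbnb/?utm_source=twitter` and `https://techcrunch.com/2011/07/27/airbnb#comments` map to the same key, so the second submitter loses a point. It only catches same-URL duplicates, not the same story from a different source.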


I think the problem is that you can not change your vote.

When duplicate stories appear in the front page then you will probably upvote the first one. Problem is that this may not be the one with the highest score, or with the biggest discussion. When you then realize that there is a better submission of the same story, you can't change your initial vote.

I think that if you could change your vote, the problem would be corrected by the community itself.
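A minimal sketch of retractable votes, assuming a vote is just membership in a per-item set of users; the class and method names are hypothetical:

```python
from collections import defaultdict


class VoteBook:
    """Upvotes that can be withdrawn or moved to another submission."""

    def __init__(self) -> None:
        self.voters = defaultdict(set)  # item_id -> users who upvoted it

    def upvote(self, user: str, item_id: int) -> None:
        self.voters[item_id].add(user)  # idempotent: one vote per user per item

    def unvote(self, user: str, item_id: int) -> None:
        self.voters[item_id].discard(user)  # no error if the user hadn't voted

    def move_vote(self, user: str, from_id: int, to_id: int) -> None:
        """Shift a vote from one duplicate to a better one. Only an existing
        vote is moved, so this can't be used to add votes."""
        if user in self.voters[from_id]:
            self.unvote(user, from_id)
            self.upvote(user, to_id)

    def score(self, item_id: int) -> int:
        return len(self.voters.get(item_id, ()))
```

If alice upvotes the first duplicate and later moves her vote to the submission with the better discussion, the scores shift toward that one, which is the community self-correction the comment has in mind.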


Weak supporting evidence: reddit does not appear to have the same problem. (At least not on my front page; YMMV.) Of course, there could be any number of reasons for this.


I'd vote for a style of hand-merge mechanism that would

- Take the title of the post with the most discussion

- Turn it into a link like a poll, where each link has its own "interior" link

- Throw all of the conversations from the posts into the main body of the combined post.

Users keep the ability to gain points, the front page stays clean, and no data is lost. The only downside is there's no easy way to flag things as "merges", unless the ability to "flag to merge" is added for people at a certain karma level. Enough flags, it automatically rolls them together.


I voted for provide a merge mechanism, but I think there are real issues with that approach that would need to be solved. Merging makes a lot of sense for multiple stories on the front page at the same time, but depending on which story you read you might bring back a significantly different take, there might be conflicting facts etc. Then you'd have people arguing details that are all ostensibly correct based on the article they read. Merging stories that had significant time gaps could be even worse as new information may be understood by now (if it's an evolving story) and the comments will have a high volume of high rated comments that will discourage new discussion as it'll be unlikely to be noticed.


Do it like Stack Overflow. Search submitted URLs/articles and show matches in the article submission interface. Preferably ajax-y, so that as the user types the URL and/or title, he already sees whether he is posting something repeated.


I was going to suggest the same thing but I'll add that it might be prudent to start cross-referencing posts based on URLs in posts and comments.


I vote for "do nothing." I don't see this being a big problem, and to the extent that it is a problem, I don't think it merits risking the unintended consequences of a possible "fix."


I vote for do nothing. If the story is exactly the same just from a different source, it should be downvoted. Some duplicate stories have a slightly different slant and thus encourage a discussion of the subject from a different perspective. The most obvious example is the AirBnB story from the Financial Times. There is a lot of overlap in the comments from this submission and previous ones, but there are also different perspectives. I think for stories that are this big it's better just to leave it alone instead of attempting to cull the duplicate submissions.


    If the story is exactly the same just from
    a different source, it should be downvoted.
You can't downvote submissions.

    Some duplicate stories have a slightly
    different slant ...
In my experience, most don't.

    I think ... it's better just to leave it alone
    instead of attempting to cull the duplicate
    submissions.
Noted.


I had forgotten that you can't downvote submissions (after 2.5 years I still haven't crossed the downvote enabled threshold). That does change my opinion somewhat. I do remember an alarmingly large number of duplicate G+ stories during its launch.

Given that the main way to manage duplicates (by downvotes) is disabled, I could support some other mechanism. I like the idea mentioned in the comments about grouping similar submissions and have the comments merged.


I'm not convinced that multiple submissions are a bad thing. We're all going to continue to get our information from different sources, and the discussion that follows a particular submission may have as much to do with the source as the story. Some people might place more trust in one source over another or one poster/submitter over another, and I think that is why the multiple submissions problem hasn't already worked itself out on its own. Any sort of automated merge/filter would be a form of editorializing that could squelch valuable discourse.


Idea for the merge mechanism: if a user thinks a link has been submitted before, the user posts a cross-reference comment whose syntax must be simple, as in: MERGE: news.ycombinator.com/link/to/article

If the merge comment gets enough upvotes, then a detection algorithm automatically merges it into the other one.

I left out a couple of details, but this is the simplest way I can think of to automate the merging without reverting to clustering algorithms.

I guess this is equivalent to voting the story down the front page, but it allows for salvaging the discussion in the down voted link.
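A sketch of the `MERGE:` directive check. The comment gives only a placeholder link, so this assumes HN item URLs of the form `news.ycombinator.com/item?id=N` (the form used elsewhere in this thread). The upvote threshold and names are hypothetical, and the merge itself would be a separate step:

```python
import re
from typing import Optional

# Assumed syntax: "MERGE: news.ycombinator.com/item?id=NNN" (scheme optional).
MERGE_RE = re.compile(r"MERGE:\s*(?:https?://)?news\.ycombinator\.com/item\?id=(\d+)")
UPVOTES_TO_MERGE = 10  # hypothetical threshold


def merge_target(comment_text: str, comment_points: int) -> Optional[int]:
    """Return the item id a MERGE comment points at once it has enough
    upvotes, else None."""
    m = MERGE_RE.fullmatch(comment_text.strip())
    if m is None or comment_points < UPVOTES_TO_MERGE:
        return None
    return int(m.group(1))
```

Requiring the whole comment to match (`fullmatch`) keeps ordinary comments that merely mention merging from triggering anything.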


I'd like to see some sort of tagging mechanism, perhaps, so that I could ignore this all-too-common front page:

* Why you should quit college and launch a startup

* Why college is super important

* How I did great without attending college

* Ask HN: Please tell me I'm smart for dropping out of my freshman year

If I could say, "I'm not interested in seeing any more college articles today", I could avoid these waves of "X Considered Harmful", "Considering X Harmful Considered Harmful", etc. posts, which seem to go away after a day or two.
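A sketch of a per-user "not today" tag filter. It assumes stories already carry tags, which HN does not have; the tags, names, and one-day default are hypothetical. A mute simply expires after the given duration:

```python
from datetime import datetime, timedelta


class TagMutes:
    """Per-user "not interested today" filter over tagged stories."""

    def __init__(self) -> None:
        self.until = {}  # tag -> datetime when the mute expires

    def mute(self, tag: str, now: datetime,
             duration: timedelta = timedelta(days=1)) -> None:
        self.until[tag] = now + duration

    def visible(self, stories: list[tuple[str, set[str]]],
                now: datetime) -> list[str]:
        """Titles of stories that carry no currently muted tag."""
        def muted(tag: str) -> bool:
            # An unmuted tag defaults to `now`, and `now > now` is False;
            # an expired mute also compares False, so the story reappears.
            return self.until.get(tag, now) > now
        return [title for title, tags in stories
                if not any(muted(t) for t in tags)]
```

Muting "college" on one morning hides the quit-college posts for the rest of that day; by the next day the tag is visible again, matching the observation that these waves die down after a day or two.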


I'm in favor of pg remaining the benevolent dictator of HN, but perhaps -- just perhaps -- if a trusted person or cadre here were to implement their own "duplication identifying/deprecating" service that passed muster, he might be willing to consider interfacing with it.

For example, if someone developed a third party tagging mechanism, there might be one or more classes of tags that identify duplicates.

The tagging could remain a third party browser-based overlay, or it might at some point be given access to write appropriate comments re its discoveries or even, gasp, moderate.

With this approach, third parties do what they're going to do, and if it's liked enough -- no guarantees, and no pestering! -- maybe it gets "blessed". Kind of like how search eventually ended up getting a textbox in the footer.

Anyway, I'd favor a design where I can quickly tag; if there were some way to conveniently incorporate the ID of the better post/resource, all the better. Then either let other users see/reference the tags directly, or (and/or) feed them into some form of meta-analysis and/or moderation.

We might also take a pass at improving the existing duplication detection.

I haven't really thought the above through, particularly how it might be gamed or otherwise end up being a negative.

P.S. Ideally, but perhaps not practically, I'd also like duplication detection/flagging to catch the all too numerous instances where people grab references from comments and post them as new posts. It's fairly prevalent and a readily apparent karma whoring mechanism for some. (On the other hand, sometimes such an elevation is warranted; it's not an entirely black and white issue.)


I'd like to see a tag mechanism where tags may be attached to a submission and voted on. Then, I'd like to be able to view the home page sorted by tags instead of by submissions.

Hopefully, that'd result in all Airbnb stories grouped together under tags representing different aspects of the story as well as automatically cross-referencing it with other examples of PR failure.


I think tags are a great idea, especially if one could block stories with certain tags (e.g. "I don't want to see anything tagged with Bitcoin or TechCrunch").

On the flip side, if someone really wants to read about Bitcoin, they can look at all stories with that tag.
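A minimal Python sketch of the vote-on-tags and block-list idea from the last few comments. Nothing like this exists on HN; the names `TaggedStory`, `filter_front_page` and `stories_with_tag` are hypothetical, and the rule that a tag counts as attached only while its net vote is positive is an assumption.

```python
from collections import defaultdict


class TaggedStory:
    """A story whose tags are voted on (hypothetical model, not HN's)."""

    def __init__(self, story_id, title):
        self.story_id = story_id
        self.title = title
        # Net votes per tag: +1 for "applies", -1 for "doesn't apply".
        self.tag_votes = defaultdict(int)

    def vote_tag(self, tag, applies):
        self.tag_votes[tag.lower()] += 1 if applies else -1

    def tags(self):
        # Assumption: a tag is attached only while its net vote is positive.
        return {tag for tag, net in self.tag_votes.items() if net > 0}


def filter_front_page(stories, blocked_tags):
    """Drop any story carrying one of the reader's blocked tags."""
    blocked = {t.lower() for t in blocked_tags}
    return [s for s in stories if not (s.tags() & blocked)]


def stories_with_tag(stories, tag):
    """The flip side: everything tagged with one topic."""
    return [s for s in stories if tag.lower() in s.tags()]


college = TaggedStory(1, "Why you should quit college and launch a startup")
college.vote_tag("college", True)
coin = TaggedStory(2, "Bitcoin exchange hacked")
coin.vote_tag("Bitcoin", True)
misc = TaggedStory(3, "Show HN: my weekend project")
misc.vote_tag("bitcoin", True)
misc.vote_tag("bitcoin", False)
misc.vote_tag("bitcoin", False)  # net -1, so the tag is voted off

front = filter_front_page([college, coin, misc], {"Bitcoin"})
print([s.story_id for s in front])  # [1, 3]
```

Letting tags be voted off keeps a single bad tag from hiding a story from everyone who blocks that topic.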


Check out forrus.org. I'll send you an invite when you do. You can register with your HN name if that helps.


Done, let's see what you've built.


Sure thing, I'm going to write you an e-mail to give you an idea of what's existing and what we're building in the next month or so as well as an invite.


I feel like this problem is bigger than just Hacker News - with the plethora of different news/discussion sources, I often find that 'breaking news' gets old and stale within an hour of the event. Or, sometimes more damaging, I read an interesting comment, then can't remember where I read it (so I can properly cite another's analysis).

Sadly, I don't know that there is a site-specific solution. While a merge would theoretically be great, I think we would start to see super threads with several hundred comments (and associated repetition). The search feature would be interesting, though it would discourage extremely busy people from submitting (since you could theoretically have to read three or four articles to determine whether yours deserves to be a separate item).

Perhaps some sort of curated site that grabs content (complete with discussion) from around the web? Different take on a newspaper's editorial section...


pg in relation to the Offer HN fad of a few months back:

Don't worry, these things always run their course.

http://news.ycombinator.com/item?id=1839740



Yes, but that's buried in a long discussion about ideas in general, and here I'm interested in a specific, long-standing problem (as I see it) that I'd like to see addressed.

Alternatively, I'll just stop trying to perform a service by cross-referencing things if people - on the whole - don't care.

I'm just tired of people telling me they value it, then seeing that I get down-voted each time. I'd like a sense of what people really think. A poll is the closest I can get to that.


Yes, but that's buried in a long discussion about ideas in general,

I generally agree with the premise of your original post (and I deal with it by providing links to earlier submissions by hand) but here you provide the rationale for submitting stories more than once.


I'm just tired of people telling me they value it, then seeing that I get down-voted each time.

I find them useful, but I only upvote them if I see that they are below the level of the worthless comments or if I can tell that they have a negative score.

If everyone acts like me, you're more likely to lose points on a cross-referencing comment than to gain them.

You've done a lot of complaining about the HN community. You seem not to trust them to do what is in the best interest of HN. Why let them determine what you post?


    You've done a lot of complaining about the HN community.
    You seem not to trust them to do what is in the best
    interest of HN.
Interesting observation. I hadn't thought about it from that point of view. I think you're right. I think I used to trust the community to do the right thing, and now I don't.

That's a useful comment - thank you for making that suddenly so clear.


If things are on the same topic, but provide separate, unique points of view, they should go into an inner list, like you'd see for multiple google results from the same location.


I would be a very happy man if I could filter out stories having Airbnb in the title.


I am not sure what you mean by a "merge" mechanism, but it would be wonderful to have a feature like StackOverflow's, where, before you actually click submit, the app recommends that you review suggested links before posting your own.

Of course, this is going to help people who haven't seen the story avoid repeating it, but it isn't going to stop those who repost hot topics just for the sake of gaining karma points.


I sometimes post a cross-reference to say "here is where the still live discussion is". I have never been downvoted. I suspect comments of that sort get downvoted when they come across as somehow ugly -- like they are being critical of the new submission for having been made at all or, worse, are actively discouraging real conversation from breaking out because it didn't happen in some prior submission. I think it's rude when someone remarks that it's a duplicate in a negative sounding way and then links to something where the conversation is long dead. I recently saw this done in a piece where the previous submission was from 12 days earlier. I just thought that was asinine behavior. I replied to it, then thought better of it and removed my remark. (Glad to have this chance to remark on it here though.)

I don't know what a better solution might be. I am just saying I do know that how you comment on the existence of a duplicate and frame the cross-reference matters.


Filtering on URL would be the first step: if a duplicate URL is submitted, point the submitter to the existing story.

You could also compare the stories at newly submitted URLs against recent ones (i.e. from the last week or 24 hours), so that if someone submits a story on the G-man advert from a different site, it is found, or at least presented to the submitter to check whether their submission is a duplicate.


URL duplication is already done, but it's especially simple-minded, can't cope with tracking crap on the URL, and will never work for the identical story from different sites.

If the URL is identical it counts as an upvote for the original submission.
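Roughly what a less simple-minded URL check could look like, as a Python sketch. It assumes that `utm_*`, `fbclid` and `gclid` are the tracking parameters worth stripping, and that the scheme, a leading `www.` and a trailing slash don't distinguish one story from another; `canonical_key` and `Submissions` are hypothetical names, not HN's actual code. As described above, a resubmission counts as an upvote for the original. It still won't catch the identical story on a different site.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumed tracking parameters; extend as needed.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def canonical_key(url):
    """Reduce a URL to a key ignoring scheme, 'www.', trailing slash and tracking params."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith(TRACKING_PREFIXES) and k.lower() not in TRACKING_KEYS
    ]
    params.sort()  # parameter order shouldn't matter
    query = urlencode(params)
    return host + path + ("?" + query if query else "")


class Submissions:
    def __init__(self):
        self.by_key = {}   # canonical key -> story id
        self.points = {}   # story id -> points
        self.next_id = 1

    def submit(self, url):
        """Return (story_id, is_new). A duplicate upvotes the original instead."""
        key = canonical_key(url)
        if key in self.by_key:
            story_id = self.by_key[key]
            self.points[story_id] += 1
            return story_id, False
        story_id = self.next_id
        self.next_id += 1
        self.by_key[key] = story_id
        self.points[story_id] = 1
        return story_id, True


subs = Submissions()
print(subs.submit("http://www.example.com/story?utm_source=twitter"))  # (1, True)
print(subs.submit("https://example.com/story/"))                       # (1, False)
```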


I like the Google News approach that algorithmically groups together related stories as follows:

Show the most popular story (or 2 or 3), then provide a link that states "See all 143 related stories."

The hard part on a community news site is how to deal with discussion. I think a neat way to do it would be to provide a choice between seeing discussion by submission or a merged discussion of all related submissions.

The benefits of doing it like this would be as follows:

1) Declutter the front page of related stories
2) Still make it easy to see all related stories with just one extra click
3) For stories with little discussion, make it easy to see discussion on all related stories as well
4) For stories with lots of discussion, the traditional way of discussing around one story only is a great option, to keep discussion more focused (less overwhelming).

Downside: I suspect this would be difficult to implement and therefore buggy - especially how the algorithm decides to group things. The Google News algorithm does a pretty good job.


It would be interesting to see some sort of on-the-fly "topic" creation. For the AirBNB scenario, one AirBNB topic that would be a page that links to the 3 or 4 individual submissions currently ongoing (potentially with some comment previews, etc.). This would also cover the case of the same basic story (Foo releases Bar!) that gets picked up on multiple tech news sites and blogs.

For things that are re-posts of the same content, it might be best for them to be hand-curated. If the re-post appears to be gaining traction, pull out the old submission and discussion, and put it back on the front page for X amount of time (merging discussions from the new submission into the comment thread of the old one). Where the re-post is getting no traction, just kill it.


This is a usability problem: on an average day X number of new stories appear, and that's a flood of information that only a few people can keep up with. So two solutions that could be tried are:

- Have a design mechanism that allows users to link stories together, so you crowdsource the process

- Have categories for stories like Digg: for example, if everyone is suddenly submitting the same breaking business news story, you'll see the dupes right away. This would have the added benefit of allowing users to see only the topics they want. For example, as a designer, as much as I love you programmers, I'm not sure an article about Haskell is for me, and to be fair a programmer may feel the same way about typography. In a sense you're already doing this by having categories like "Ask HN".


I voted for merge.

It would also be nice to have an "alternate links" section at the top of the comments page, with the possibility of upvoting the alternate submissions.

Even better if an alternate link could replace the main one if it is voted more relevant (e.g. the main blog post replacing a post that merely quotes it).


I also voted for merge. Perhaps when a comment formatted in a particular way recommending a merge receives n upvotes, that could trigger the merge?

It's difficult because so many blogs and other news outlets in the tech media world travel in flocks. After one does some original reporting, the rest jump on the story and (usually) cite the original source. However, the original source for the story doesn't always win on HN in terms of hosting the main conversation. Sometimes this is because of the timing of the submission, and sometimes one of the members of the 'flock' is higher profile and gains more knee-jerk upvotes (TechCrunch comes to mind).


Another possibility: when submitting a link, simply have an automated check against previously submitted links, and give an automatic upvote or two to the previously linked submission. In this way, if a bunch of people resubmit something (say, an old article is making the rounds on the net yet again), it will have a chance to be brought back to the front page - with all the previous discussion - so that HN can stay a place where people can discuss the hot topics of the day, even if the "hot topics" happen to be four years old.

This aims to solve the "double discussion" problem, but it won't fix the "I don't want to see it again" problem.


So there are two points here: different articles on one topic may contain different perspectives, so they all should be preserved when merging; on the other hand, it would be better if there was a single place where people would discuss a topic.

Maybe a good solution would be to let people vote for merging several articles on the same topic, and once there's enough votes, merge them in one, say by concatenating both links and discussion trees?

If that's too heavy, maybe just extend each article with links to other discussion pages (say, adding beneath the text "This topic is also discussed here (link)").


    maybe just extend each article with links to other
    discussion pages (say, adding beneath the text
    "This topic is also discussed here (link)")
That's what happens with the "by hand" cross-referencing I already do, and which gets so many down-votes.


Merging discussions is IMO a better way, but it's also trickier.

(I, for one, mostly upvote your cross-links - thanks for keeping the graph connected!)


Other tech solution: you could cluster posts by content, e.g. extract keywords/key phrases and group submissions that have some threshold of overlapping terms. Of course it's not trivial, but it's an interesting project.
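A rough sketch of that clustering, using Jaccard similarity (shared keywords divided by all keywords) on titles only, with a greedy "join the first cluster whose leader is similar enough" rule. The stopword list and the 0.3 threshold are arbitrary assumptions; a real version would look at the article text and use better keyword extraction.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "with", "by"}


def keywords(text):
    """Lowercased words, minus stopwords and very short tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS and len(w) > 2}


def jaccard(a, b):
    """Shared keywords divided by all keywords (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(titles, threshold=0.3):
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader keywords, member titles)
    for title in titles:
        kw = keywords(title)
        for leader_kw, members in clusters:
            if jaccard(kw, leader_kw) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((kw, [title]))
    return [members for _, members in clusters]


titles = [
    "Airbnb host apartment trashed by guest",
    "Airbnb responds to host whose apartment was trashed",
    "Haskell 7.0 released",
]
for group in cluster(titles):
    print(group)
```

The two Airbnb titles share 4 of 8 distinct keywords (similarity 0.5) and end up together; the Haskell title starts its own cluster.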


It can be a tricky problem because you have a 1 to 1 correspondence of hacker news posts to web pages, so you can't combine articles from multiple sources into a single discussion.

It's a design choice that has pros and cons. One related problem that I find more relevant than multiple posts from multiple sources is the lifetime of posts on Hacker News. Once something fades from the front page, it falls into irrelevance aside from serving as a reference for Google searches. That is again a design choice, and I'm not sure the future goals of the site include supporting long-term discussion.


If this is really such a big problem, wouldn't it be easy enough to simply check the link that someone is attempting to submit against all previously submitted links?

This wouldn't necessarily stop all repeat stories, but it seems much more feasible than hoping everyone does a search, and would at least stop people from uploading identical links.

You might also consider a time bar of some sort? Say, if the same link was submitted more than xxx days ago, it can be resubmitted, since there is a good chance there is something new to add to its discussion.
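The time bar could be as simple as remembering when each link was last accepted. This sketch assumes a 30-day window and exact URL matching; `ResubmitWindow` is a hypothetical name.

```python
from datetime import datetime, timedelta


class ResubmitWindow:
    """Reject an identical link until window_days have passed since it was last accepted."""

    def __init__(self, window_days=30):
        self.window = timedelta(days=window_days)
        self.last_accepted = {}  # url -> datetime of the last accepted submission

    def try_submit(self, url, now):
        prev = self.last_accepted.get(url)
        if prev is not None and now - prev < self.window:
            return False  # too recent: point the submitter at the existing discussion
        self.last_accepted[url] = now
        return True


bar = ResubmitWindow(window_days=30)
t0 = datetime(2011, 7, 29)
print(bar.try_submit("http://example.com/a", t0))                       # True
print(bar.try_submit("http://example.com/a", t0 + timedelta(days=1)))   # False
print(bar.try_submit("http://example.com/a", t0 + timedelta(days=31)))  # True
```

Only accepted submissions reset the clock, so repeated rejected attempts don't keep pushing the window forward.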


We should be able to link stories together. The most popular/recent story should then be shown instead of all of the others... (But you should still be able to visit the others if you wish to.)


If you mean the AirBnb news that's been all over HN for the last day, leave it as is; it shows the importance of the topic. It's big news, so there should be many views/takes on it; that's logical.


Turn HN into Twitter (sans 140-char limit). Posts are "unstructured", but do retain @reply information. Users @a and @b post a link about Airbnb; user @c wants to reply and says "@a @b what #Airbnb really needs to do is bla bla bla.."

Then, the front page becomes a list of "trending topics". Topics are determined by some kind of statistical analysis of the overall conversation.

The /new page is a list of new posts that are not @replies, or something similar.

Order/score can similarly be determined, such as by reply counts.

More computationally intensive, but more "true".


Would be nice to have a system like stackoverflow... where you type the topic and it gives you related things already submitted, so you can see that you are late and don't post it.

Just a random thought !


As a partial solution, how about adding a polite note on the "submit" page, asking people nicely to check the past few days' worth of front pages before submitting, to see whether there has already been a discussion on the same subject.

I've just been to the "new" page, where I flagged a few versions of the "Internet Explorer Users Are Stupid" story from yesterday.

And it wasn't a very good story the first time (though it's exactly the sort of thing which tends to get a lot of upvotes, for entirely the wrong reasons).


I agree that dilution of new page isn't a big problem and the crowd aspect should ensure interesting links go up.

The problem to solve is split discussions so I think it would be great if duplicate links would be automatically merged and the discussion page unified.

The more challenging problem to solve is how discussions about stories that are almost identical or link to identical stories could be merged. Could there be an option to view discussions independently or show a page that auto-merges similar discussions?


Most stories are not just a single news item that is relevant immediately after some event and then less relevant over time.

Many stories evolve and new information is added. The Airbnb stories lately are a perfect example.

If you're busy enough to occasionally miss a day (or even a half day) of HN, you might not realize that something has already been posted.

I really think this is a non issue. Occasionally when there are dupes, I ignore them if I found them boring the last time, or read/comment again if they were fun last time.


What about listing the story with the most upvotes and merging all the other stories about the same thing under it? I think it would work well, the way Google does it with discussions: if Google finds a page that seems to be a discussion, you can click the plus icon and all the other related results are listed.

What do you think about that? I think it would keep other topics from being pushed to the second page.


One idea: instead of making the formula for article placement based on the points and decay based on time of creation, don't use time of creation in the formula at all, and instead make just the points themselves decay. This way old things can come to the top again if they start receiving upvotes again.

This idea alone wouldn't solve the problem, but if combined with some sort of "merge" mechanism then it would be sweet.
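One way to read "make the points themselves decay": each upvote starts out worth one point and loses half its value every `half_life_hours`, and the ranking ignores submission time entirely. The 12-hour half-life and hour-based timestamps are assumptions for illustration, not HN's actual formula.

```python
def decayed_score(vote_times, now, half_life_hours=12.0):
    """Each upvote is worth 1 point when cast and halves every half_life_hours after."""
    return sum(0.5 ** ((now - t) / half_life_hours) for t in vote_times if t <= now)


def rank(stories, now, half_life_hours=12.0):
    """Order stories by decayed points alone; submission time is not used."""
    return sorted(stories,
                  key=lambda s: decayed_score(s["votes"], now, half_life_hours),
                  reverse=True)


now = 100.0  # hours since some epoch
stories = [
    # An old story: ten early votes, then two fresh ones after a resubmission.
    {"id": "old", "votes": [0.0] * 10 + [98.0, 99.0]},
    {"id": "recent", "votes": [90.0]},
]
print([s["id"] for s in rank(stories, now)])  # ['old', 'recent']
```

The ten early votes are worth almost nothing by hour 100, but the two fresh ones lift the old story back above the newer one; combined with a merge, a resubmission would revive the original thread instead of starting a new one.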


How about a single title but the ability to have multiple links to different sources? Almost like tags but only it's links.

As others have mentioned the same story is often available from multiple sources so merge all sources under one super title.

It's also good since you can add an update without yet again submitting a link just for the update, and for people who haven't seen either the original(s) or the updates, it's now all in one place.


There should just be threads with multiple links. The "original source" is what you get when you click it. The secondary source is where things like TechCrunch's opinion pieces go.

Instead of having some classifier do the grouping, have the high-ranking community members handle this merging. The discussions in these big threads would be about the topic in general, not just TechCrunch's version or VentureBeat's version.


The solution should be to create more awesome startups, which will get more press coverage, thus having more unique stories on hn.


HN submissions are URL based, but display of submissions are based on a combination of factors including up-votes, flags and time since submitted. As the name "Hacker News" (or "Startup News") implies, the goal is collecting "news" articles rather than "classics" or even "topics" and of course, this design splits discussions amongst related submissions.

For the moment, let's just assume a "merge" feature exists. How would a merged thread be handled for the sake of display? How would up-votes be handled? How would flags be handled? --We've got tough problems right there, but we've skipped over the most blatant problems...

Who decides a merge?

Can merging be abused?

What granularity of merging is desired? (Do you want everything regarding the recent problem of the AirBnB customer to be in a single submission, or do you want absolutely everything regarding AirBnB in total to be in a single submission?)

Who decides the granularity?

The existing duplicate checking is based entirely on the given URL of a submission, so it is easily abused and known to be flawed, but it's still far better than nothing. Identifying duplicates, and, more relevant to this discussion, similarities, would require content analysis of the submitted URLs. It is feasible, but it is not easy. Nonetheless, a solid content analysis algorithm would take care of the "who" and "abuse" problems, as well as allow some degree of configuration of the granularity.

Using your example ( http://news.ycombinator.com/item?id=2822080 ):

Assuming the seven submissions of the "G-Man" video are hosted at different URLs (duplicate videos on youtube), really serious content analysis would require downloading and analyzing all seven video files. Oh wait, the asshats at google (and every other video site) refuse to simply give you a link to the video file(s), so you have to do your own parsing, processing and often de-flash-ing of their pages to figure out the file download URL. And as soon as you have it working, they will change how they present video to break your code (see http://savevideo.me and similar browser addons for reference).

The other problem with content analysis on HN is that the server would melt into a pile of slag. Running, as HN does, on a single FreeBSD box (AFAIK), it would never be able to handle the load of content analysis.

Colin, your manual cross-referencing is helpful, and I do appreciate it, but I believe it is a waste of your skill and time. Yes, I remember you have some code to partly automate it, but the reaction the last time you ran it was not the most positive. We can't treat HN as our private playpen; for users it's a utility, but for PG/YC it is a way to source hackers to fund and a way to promote news stories about YC-funded companies. --I don't mean it in a bad way; it's just the well-known facts/benefits of HN. HN serves different purposes for different people. I've always admired how PG and RTM consistently try to sell shovels in a gold rush (viaweb, ...), and HN is simply their newest type of shovel. ;)

Sadly, the phrase, "All press is good press," should now come to mind. In other words, consolidating submitted stories into merged submissions is actually disadvantageous to PG, YC, and the YC funded founders. You are asking them to get less valuable press. Everyone who understands how AirBnB works has been expecting a catastrophe like this to happen eventually. BUT AirBnB making it onto the front page of the Financial Times is extremely good for their business, even with "bad" press like this.

Though many find "endless repetition" of similar submissions annoying, the people in control of HN/YC and the founders of YC funded companies understand it is VERY advantageous for them. Given you are asking to "harm" the powers that be here on HN, you can be reasonably well assured that we're stuck with endless repetition. This is most likely the reason why you (currently) have 218 "Do Nothing" votes, and worse, 27 "Go away. Just Go Away" votes.

It might be a great idea for HN users, but HN is the wrong place for this particular great idea.


It need not be a "merge" as such, as long as we are granted the ability to mark something as a dup (and mark what it's a dup of). Everything else can be done algorithmically; I outlined some ideas for this in an Ask HN last year: http://news.ycombinator.com/item?id=1975950
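If users can mark "X is a dup of Y", much of the algorithmic part is bookkeeping. A union-find structure is one standard way to do it: chains of dup marks collapse into one group with a single canonical story. This sketch assumes lower story ids were submitted earlier and keeps the earliest as canonical; `DupGroups` is a hypothetical name, and this is not necessarily what the linked Ask HN proposes.

```python
class DupGroups:
    """Union-find over story ids; each group's root is its canonical story."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the way directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def mark_dup(self, dup, original):
        a, b = self.find(dup), self.find(original)
        if a != b:
            # Assumption: lower ids were submitted earlier; keep the earliest as canonical.
            keep, drop = min(a, b), max(a, b)
            self.parent[drop] = keep

    def group(self, story_id):
        """All known stories in the same dup group, sorted by id."""
        root = self.find(story_id)
        return sorted(s for s in list(self.parent) if self.find(s) == root)


dups = DupGroups()
dups.mark_dup(5, 2)  # someone marks story 5 as a dup of story 2
dups.mark_dup(9, 5)  # and story 9 as a dup of story 5
print(dups.find(9), dups.group(2))  # 2 [2, 5, 9]
```

Because the marks are transitive, nobody has to know which story is the canonical one; marking against any member of a group is enough.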


I really don't mind the constant dribble of down-votes that I get for trying to prevent the splitting of discussions

I don't think you are "harming the community"; you're helping if anything. But after seeing yesterday's posts I do suspect you are more hurt by downvotes than you say you are, and that you might benefit from not being so.


Prompt submitters with articles using some of the same terms and ask them to be sure it's a new topic or a substantially different perspective on an old one.

Use a list of the five likeliest matches. Let sphinx or some other search engine do the work.

This won't work in every case, but even 50% would be a significant improvement.
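In place of Sphinx, here is a self-contained Python sketch that ranks recent stories by shared title terms, weighting rare terms more heavily (an IDF-style weight), and returns the five likeliest matches. The stopword list and the scoring formula are assumptions; a real search engine would do this better.

```python
import math
import re
from collections import Counter

STOP = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "with"}


def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP


def likely_matches(new_title, recent, k=5):
    """recent: list of (story_id, title). Return up to k ids, best match first."""
    new_terms = terms(new_title)
    # Number of recent titles containing each term: rarer terms count more.
    df = Counter(t for _, title in recent for t in terms(title))
    n = len(recent)
    scored = []
    for story_id, title in recent:
        shared = new_terms & terms(title)
        if shared:
            score = sum(math.log((1 + n) / (1 + df[t])) + 1 for t in shared)
            scored.append((score, story_id))
    scored.sort(key=lambda p: (-p[0], p[1]))  # best score first, then lower id
    return [story_id for _, story_id in scored[:k]]


recent = [
    (1, "Airbnb host apartment trashed"),
    (2, "Haskell 7 released"),
    (3, "Airbnb responds to host horror story"),
]
print(likely_matches("Another Airbnb host horror story", recent))  # [3, 1]
```

Story 3 shares four terms, two of them rare, so it ranks above story 1, which shares only the common "airbnb" and "host".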


Some intelligent comparison algorithm should be developed to tag them as possible duplicates of stories that have been published in the last N minutes/hours/days.

It shouldn't be very tough to come up with a beta version of the algorithm. They're asked in most technical interviews and everyone answers them :D


quick note - upon submitting the link, there could be a lookup function on the keywords of the title or linked story title, which shows similar submissions...replicating the Google instant search functionality.

yes, there would be obvious issues with keyword overlap and would not apply for customized titles.


Merge the article for the purposes of comments, but allow references to each unique article cited. Perhaps an indent under the main story with related stories and just one comment section.

Either that or leave it alone. Most of the Airbnb stories are semi-unique takes on the same story.


I'd endorse a script that auto-submits "Richard Feynman and the Connection Machine" every 2 weeks. It'd have a deeper psychological impact than just putting "Please search before posting" on the submissions page, and it's an article every newbie should read anyway.


An official mechanism to link related/duplicate submissions would be great. They could be displayed at the top. Listing submission dates there would also be helpful.

I do not like the idea of a merge because often comments are VERY article-specific, NOT simply topic-specific!


There's a first come first serve mentality with duplicates, so why not transfer the points from it to the first one?

Other ideas: Reference the duplicates, allow people to submit the link (makes moderation easier).

Look to Whirlpool, they lead in Moderation :D


You mean like what happens in the regular "mainstream" media? You didn't notice the endless stories on Casey Anthony or the Debt Crisis? This is the normal pattern with ALL news sources. Why should HN be any different?


Well, perhaps Google-search-style suggestions of existing posts would help. I don't like forcing people to search; besides, people might just learn to hit the search and skip buttons from muscle memory.


Do the same thing that is done on stackoverflow.com, except that it needs to be combined with fetching the title/basic info from the article and trying to match it against recent articles from the past couple of days.


A recent situation that, um, puzzled me in the past few months was a story being posted here and getting no traction until someone else posted it 24-48 hours later. These things just happen, it seems.


I like the merge idea, but it would be good to preserve all links at the top. So more like grouping than merging. That way, if several stories offer different perspectives, they are all still viewable.


I've always wanted a way to associate stories, even if there is just a list under the headline (when viewing comments) listing possibly related stories (based on user feedback).


Maybe, when you post something, it could say (Did you know about X or Y posts) a little bit like stackoverflow. So, in a way, it's an automatic search while posting.


Clearly: provide a merge mechanism, for example by allowing users to vote for a merge. If a certain threshold is met, the two submissions are merged.


1) Change the name of the site to redd1t or d1gg.
2) Do something like Wikipedia has, where people can vote on the integrity/quality of a source/story.


One thing I'd like to promote is refraining from reposting too many stories to other aggregator-type sites such as MetaFilter.


While I'd like a mechanical solution, I think the best results are probably going to come from human intervention/interaction.


You shouldn't do anything because sometimes you may get an update on the story and the discussion could be totally different.


Merging is the only solution that is feasible and doesn't lose content.

It just requires moderators or people over certain karma thresholds.


How about a tagging function. Users can vote if a tag applies or not. That way you get a self-generating ontology.


Provide a "merge" mechanism. Keep track of merges. When merges exceed a threshold, HN merges them for all.


Keep distinct news items, but merge them into a parent item that links both, much like the way this poll story works.

The parent item gets the combined vote total of child items, and any summary screen that shows the parent item would then remove all children of that parent item, showing the parent instead.
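A sketch of that parent/child idea, one level deep: each child's points roll up into its parent, and only top-level items appear on the summary screen. The `Item` fields and `summary_screen` are hypothetical names, and children whose parent is missing are simply dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    item_id: int
    title: str
    points: int = 0
    parent_id: Optional[int] = None  # set on a story grouped under a parent


def summary_screen(items):
    """Show only top-level items; a parent's score is its own points plus its children's."""
    totals = {it.item_id: it.points for it in items if it.parent_id is None}
    for it in items:
        if it.parent_id is not None and it.parent_id in totals:
            totals[it.parent_id] += it.points
    titles = {it.item_id: it.title for it in items}
    ranked = sorted(totals, key=lambda i: (-totals[i], i))
    return [(titles[i], totals[i]) for i in ranked]


items = [
    Item(10, "Airbnb: the whole saga"),
    Item(11, "Airbnb host's apartment trashed", points=40, parent_id=10),
    Item(12, "Airbnb responds", points=25, parent_id=10),
    Item(20, "Haskell 7 released", points=50),
]
print(summary_screen(items))
```

The two Airbnb children disappear from the list, and their parent ranks with their combined 65 points.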


Even when a big story breaks, there's rarely more than 3 or 4 links that pop to the front page throughout the day. They're from different sources, sometimes they provide different insights. Other times they don't, and they just go away.

If 4/30 links on the frontpage are about the same story, it's really not that big of a deal.


I'd be happy with a hide button


Automatic search while submitting and display it on the side? Like Stackoverflow?


How about a done-vote button?


just like bugzilla.

If you don't know what I mean, just go to any Bugzilla install and type something generic as a new bug title.

It's going to Ajax its way and show you matching bugs while you type.

It could be the same for stories.


Have HN inform the user that the link has been submitted before.


Merge would be nice. Voting on a merge would be nice too...


I think we need to repeat this poll, over and over.


Topic clustering? A la Techmeme, Google News?


Wow, most on HN are not action oriented?


Allow downvoting of submissions!


I vote we convert the repetition to recursion, which should be more acceptable to the Y combinator crowd.


Digg handled this very well. When you submit a new story it looks up current submitted stories to see anything matches and recommends those to be used instead.

I loved that because it leaves the decision to the user and gives them the option to still post a new one if it has material not covered in already submitted stories.


Provide a Google -1 button


Avert your eyes.


Maybe that was too blunt.

I've watched these kinds of complaints and calls for correction/improvement/action for some time.

I come to HN late to the party, so I don't miss the old days as much as the first residents. In fact I'm probably one of the people ruining the party, for some of the longer residents.

I come to HN every day, because HN is the most interesting site on the web. I suspect that the earlier residents still come to HN daily or with some regularity, for similar reasons.

While I understand the dismay that earlier residents might have in reaction to the changes, these changes are all but inevitable as a more diverse crowd comes in. This is compounded by HN's relatively poor interface, especially in relation to the current complaint.

I don't mean to put words in pg's mouth. I doubt that there's much real incentive for pg to address all the mechanical issues in HN. He has many other interests, and he probably does well to attend to those before this.

Given the realistic improbability of large interface changes, the only thing left is to restrict membership to invite only. Otherwise you'll experience a slow retreat (as you are) from posts that longer residents like, as HN's low but strong buzz spreads.

Since mechanical and membership (I'm guessing) changes are unlikely, the most practical thing for any individual is to not worry about it.



