100M Posts Analyzed: What You Need to Write the Best Headlines (buzzsumo.com)
218 points by vitabenes on April 1, 2021 | 110 comments


> On Facebook, there is 100% difference between the top 20 headline phrases in 2017 vs 2019/20... We can attribute this stark change to a few things; algorithmic maturity, audience preference and the publisher landscape.

Or that means you're measuring the completely wrong things about headline authoring, because the data have no stationarity at all.

They allude to this, but it would appear that the only thing that matters is Facebook's editorial, laundered through an algorithm. So maybe a more valuable article would be hacking into Facebook and just finding out what it is they idiosyncratically value in a headline.


Or, maybe even better, they need to keep coming up with new nonsense to put in headlines. After a year or two people know "one weird trick" articles are in fact spam, so it becomes necessary to produce new phrases to put in headlines to entice / trick unwary readers.


Facebook hates him! Here's how a hacker news commenter used one weird old trick to produce new phrases to put in headlines!


GPT-3 was banished for life from Facebook for raping a sweet, 7-year-old girl! Find out how it happened in this outrage-clickbait article to sell you some quack wellness supplements via a chumbox. 35,827 sold so far! Click now because supplies are almost gone!


[Adds "stationarity" to vocabulary]


That's what you have to do if you want to have good vocabularity


Indeed. The latest one I added from HN comments was "Tsundoku."


How to Talk About Books You Haven't Read (2009) by Pierre Bayard

Unfortunately, tsundokuists won't read it. Meta.


A tsunami of book hoardings piling-up that can play sudoku.


That is a marvelous word, thank you


i occasionarily create new words basing it on existing constructs.


How creativeneous!


Stationary + it.

PS: Earth - art = Eh.


I would guess the instructional surge is tied to advertising -- easier to convert clicks on ad instructions when people are conditioned to click on instructionals


It's not even a perfect analogy but I was instantly reminded of an episode of TNG where the Enterprise is booby trapped inside an ancient device that amplifies and redirects their momentum towards them as radiation (hurting the crew) and the ship cannot move. There is a way to quickly move the ship but it means handing control over to the computer and only has a 50% success rate. Failure means death. The crew refuses to hand over control to a computer* and, add a little Deus Ex Machina, and they determine that very slow thruster maneuvers could also free them. They do it old school, with skill and human (Picard's) intuition. Tada -

I honestly instantly re-experienced this episode when reading that title; there's nothing wrong with analyzing data, in fact, I probably would and the article isn't suggesting a machine write headlines, but it feels like writing for humans should feel human and not be machine optimized...

I realize I've just been rambling, probably connected with no one else on the forum but I've typed all this, so I'm submitting it.


I absolutely love that episode. Any TNG story that features Picard being excited about archaeology is usually a great one.

But I agree with your point. It’s not bad to write with computer consumption in mind, unless it loses the core focus, which is that a human needs to read and connect with the content, and that probably takes a human to do (at least at the moment).


> Any TNG story that features Picard being excited about archaeology is usually a great one.

I've never been able to articulate how disappointed I was that the writers made Picard retire to the family vineyard. Nothing matches the thrill of preventing a galactic war, but we got glimpses of other interests he might have pursued, and instead he just sits around growing grapes. Literally going back where he came from after an incomparable career.

The holodeck episodes also hinted at other of his interests. I recall a sailing ship and something about being a detective?

They did something very similar with Kirk and Riker although in Riker's case they gave a plot device for him making that choice.


The interesting thing is when machines will write something that feels more human than anything humanity could write. I believe that will happen sooner than people think.


> more human than anything humanity could write

Made me think of:

https://www.youtube.com/watch?v=OyO18QrJ3zw (White Zombie - More Human Than Human, referencing PKD's "Do Androids Dream of Electric Sheep?")


Human written headlines are already being optimized in roughly the same way that machines would go about it, and using roughly the same data, too. Marketers have just been training ourselves to think like a statistical model.


Isn't that just AGI? I'm not gonna hold my breath.


>Isn't that just AGI? I'm not gonna hold my breath.

Or it could just be a fluke random AI achievement. It will still be humans judging it as "more human than anything humanity could write" or not.


We've made huge strides with GPT-3, which has arguably already surpassed at least some percentage of the humans with its writing capabilities.


Which is maybe not too hard, since quite some percentage of humans still cannot write at all.


TBH I don't think humans are creative enough in bulk to require AGI to sound human.

Individuals checking individuals: yeah, you can detect non-intelligence. But if I told you half of all buzzfeed headlines are machine-written... would you believe it?


>But if I told you half of all buzzfeed headlines are machine-written... would you believe it?

Honestly, yeah. Buzzfeed headlines aren't exactly the most creative or hmmm...deep?...meaningful?

There's only so many ways you can write

'10 reasons why x does y and that's bad'

Or

'Omfg x just happened, better y right now.'


I had forgotten about that scene.

I wonder if the authors of The Expanse were consciously or unconsciously riffing on that. Writing truly original stuff is hard.


Episode name is STNG booby trap.


To continue your tangent:

> The crew refuses to hand over control to a computer

Star Trek universe lives in this strange place of being on the very cusp of technological singularity, edging on the threshold and yet not crossing it. Both in TOS and TNG era, their computers are so advanced they keep accidentally creating sentient AIs[0]. And yet, whenever that happens, the AIs are met with apprehension; Starfleet, in particular, would like to see them gone. In some cases, the protagonists stand against the zeitgeist and fight for the rights of digital sentience, but in others, they're desperate to put the cat back in the bag.

In-universe, I explain it to myself like this: the society of Star Trek is afraid. They do not understand their technology at all. All the scientific and engineering knowledge people in Star Trek demonstrate so frequently[1], it's all in context of what computers tell them. They almost never look at raw data, they almost never deal with raw reality. They grudgingly accept this state, but do not want to make that last step and actually let computers run things.

Out of universe, it's obviously because Star Trek was meant to be a humanist story - the adventure of humanity, exploring the universe and becoming a mature, respectable, good people. Anything that threatens it, anything that would immediately turn humans into NPCs, dethrone them from the role of protagonists, is shunned in the series. That's why both artificial intelligence and genetic augmentation receive such negative treatment, a glaring exception in the otherwise inclusive and forward-thinking show.

Circling back a bit - I think our real-world society is approaching the level of "too smart technology", and we're going to become afraid too. And unfortunately, developments like this article aren't the kind that would happen in a Federation research facility - they read as something straight from Ferenginar. Our civilization is not proto-Federation, it's more of a weird blend of Ferengi Alliance and Cardassian Union.

--

[0] - See e.g. "Emergence" (TNG: 7x23) and "Elementary, Dear Data" (TNG: 2x03) for Enterprise D's computer forking off sentient AIs, "Evolution" (TNG 3x01) and "Quality of Life" (TNG 6x09) for Federation robots accidentally becoming sentient, "The Offspring" (TNG 3x16) for how easy it is to fork Data, "The Ultimate Computer" (TOS 2x24) for what happened when they let a too-smart AI run the ship for once - and that's not counting all the cases where sentient life was created with involvement of alien entities or objects.

[1] - One thing I love about this show is that competence is considered table stakes. Everyone, whether they're Starfleet, a Federation civilian or a member of non-Federation species, is educated, curious, good at what they do, and expects the same qualities of everyone else. It's a breath of fresh air compared to the real world.


Thank you for this reply! Made my day. While I enjoy the newer Star Trek (Discovery/Picard) they have season long arcs and when they stumble on themes like the ones you identified, they’re a side plot to the season’s arc. The morality in and out of universe is never really considered or elaborated on. It’s all about the season’s arc and intense theme music. It may be blasphemous and they definitely have a much more comedic vibe but it’s why I prefer The Orville to the newer Treks.


It's not blasphemy for me :). I'm having real trouble considering Discovery and Picard (and the JJ movies) as canon - they're just too different from all the earlier installments up to and including ENT. Instead of an optimistic and cerebral experience, we're now getting generic violent action movies with nonsense plots.

In my head, the only true successors to Star Trek as it was up to 2005 are 1) The Orville, and 2) Star Trek: Lower Decks. Apparently it turns out you have to market something as satire to get a shot at exploring more thoughtful topics.


We used to write poetry and aspire to higher states of consciousness.

Anyway, nice in-depth article. The results will surprise you. Especially point 6.


There is almost definitely more poetry and literature in general being written today than in the past, you just have to search for it.


There may be more in absolute numbers but relative to spam it practically doesn't exist and when you search half your search results are spam too.


Find authority sites on contemporary poetry.


> Anyway, nice in-depth article. The results will surprise you. Especially point 6.

I wonder how this line would fare in their headline analysis.


> The results will surprise you. Especially point 6.

I see what you did there.


Marketing exists purely in thought space. Its currency is the imagination.

Like most art, we commercialize it until our categories are diffusive.

But it's still there.


i generally scoff at poetry from a perspective of blissful near-ignorance. Most forms at least, but i recently saw a piece and thought to myself

oh.


One thing to remember is that the most appealing headlines to the audience may have not changed, rather the frequency of titles containing the top clicked metric may have changed over time.

eg. More headlines may be using the number 10 than 4, so 10 is more likely to be the most trending headline.

Similarly, in the lottery, the frequency of winners who picked their own number is dependent on the frequency of people picking their own number.


The thing to know here is the Bayes factor. That’s the true positive rate divided by the false positive rate. In this context, it’s the percent of successful articles that have a property (like using the number 10) divided by the percent of unsuccessful articles that have that property. This removes any advantage a property gets from being more common.
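
A minimal sketch of that ratio, assuming invented counts (none of these numbers come from the article) and a hypothetical helper named `bayes_factor`:

```python
def bayes_factor(hits_with, hits_total, misses_with, misses_total):
    """Share of successful articles that have the property, divided by
    the share of unsuccessful articles that have the property."""
    true_positive_rate = hits_with / hits_total
    false_positive_rate = misses_with / misses_total
    return true_positive_rate / false_positive_rate

# Hypothetical: "10" appears in 300 of 1,000 successful headlines
# and in 3,000 of 10,000 unsuccessful ones. Common, but no signal.
common_but_neutral = bayes_factor(300, 1000, 3000, 10000)

# Hypothetical: "4" appears in 50 of 1,000 successful headlines
# but in only 100 of 10,000 unsuccessful ones. Rare, but informative.
rare_but_helpful = bayes_factor(50, 1000, 100, 10000)

print(common_but_neutral, rare_but_helpful)
```

With these made-up counts the common property scores a ratio of 1 (no evidence either way) while the rare one scores roughly 5, even though the common property shows up in six times as many successful headlines.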


Right.

The result for headlines of 65 chars - shared 50,000 more times than 60 chars or 70 chars - seems too incredible to occur at random and suggests instead that a popular news source has implemented a 65 chars policy.

[Edited to note: Yep. YouTube is dominant as the popular publisher in this review, and truncates headlines at 66 chars - that's what this article observes]


The charts use median engagement, which helps normalize against frequency.

That said, a boxplot with 25th and 75th percentiles would likely indicate there is a heavy skew, as tends to be the case with social media data.
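
A rough illustration of that skew, assuming invented share counts with one viral outlier (not data from the article):

```python
import statistics

# Hypothetical engagement counts for headlines of one length;
# the single viral post is invented to illustrate the skew.
shares = [3, 5, 8, 12, 15, 20, 30, 45, 80, 50000]

mean = statistics.mean(shares)
median = statistics.median(shares)

# Quartile cut points: the edges a boxplot would draw.
q1, q2, q3 = statistics.quantiles(shares, n=4)

print(f"mean={mean} median={median} q1={q1} q3={q3}")
```

The median barely notices the outlier while the mean is dominated by it, and the gap from the median up to the 75th percentile is much wider than the gap down to the 25th, which is the kind of lopsided box the comment expects.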


Disappointed this headline wasn't crafted to perfectly adhere to their own definition of an ideal headline (11 words and 65 characters):

  >>> headline = "100M Posts Analyzed: What You Need to Write the Best Headlines"
  >>> len(headline.split())
  11
  >>> len(headline)
  62
So close!


Well, they don’t really define a word. Is “100m” a word because the characters are contiguous? Or is it two words because we say “100 m” or “100 million” in our minds? Or is it zero words because it’s literally a number followed by a letter, which isn’t a word?
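
To make the ambiguity concrete, here is a sketch of three possible word definitions applied to the same headline. The regular expressions are my own assumptions, not anything the article specifies:

```python
import re

headline = "100M Posts Analyzed: What You Need to Write the Best Headlines"

# Definition 1: anything separated by whitespace is a word.
by_whitespace = headline.split()

# Definition 2: only purely alphabetic runs count, so "100M" is dropped.
alphabetic_only = re.findall(r"\b[A-Za-z]+\b", headline)

# Definition 3: split a number from its unit suffix, so "100M" counts twice.
number_and_unit = re.findall(r"\d+|[A-Za-z]+", headline)

print(len(by_whitespace), len(alphabetic_only), len(number_and_unit))
```

The same headline comes out as 11, 10, or 12 words depending on which definition you pick.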


Speaking of which, does anyone know what's the story behind those comments on YouTube that are 3-4 sentences made up of a bunch of words that don't match up and look completely random? I see them on almost every new youtube video, almost same frequency as the 'vom' comments.


My guess would be something to do with the algorithm. Spam random words to trigger greater visibility in search results, boost 'engagement' metric - today's bump comment.


been wondering also. perhaps ai comments to create verified acc or the newest meme


people should start writing replies to them in the same logic they use


if they're bots, it would only help them to increase the engagement metric.


That's true, but it would be fun to look at the comments and always be one of those in the top with a bunch of human written replies. Almost a dadaist poem.


s/best/highest engagement/

These are not the same. The fact that they're so commonly conflated is a major problem.


I’d take it one step further and say that following these tips makes what I consider to be the worst headlines. I’m surprised they left out the obnoxious “one weird trick” phrase ;)


I'm curious, what other metrics would you use to judge them? Isn't the whole point of a headline to get you to read the article?


You seem to be implying that "only things which we can measure should be used for decision-making" and I would caution against that limitation for the reason that this is exactly how perverse incentives are realized. We are seeing it now when we conflate "good" with "gets engagement" or "makes money".

See:

https://en.wikipedia.org/wiki/Campbell%27s_law

https://en.wikipedia.org/wiki/Goodhart%27s_law

https://en.wikipedia.org/wiki/Perverse_incentive#Cobra_effec...


Nobody in this thread is proposing a solution. I would much rather discuss those.


The point is getting someone to read the article doesn't make a headline "good", it just means someone read the article.

You could have a "good" headline that catches the attention of a large number of people who don't really care, or a "bad" headline which catches the attention of a small number of people to whom the article is very relevant and really care.

Which do you want to optimize for?


Easy. I want a headline that accurately reflects the content of the article. That is my, and perhaps a few others', personal definition of a "good" headline. That is not being measured.

Given that definition, I may not go read the article because it doesn't interest me. It is still a good headline. I didn't waste my time. On the other hand, if I am interested in the content, I would have read the article and would not be irritated that I had been misled about the content. The metric being used in the article here in no way leads to this definition of a "good" headline. More likely the opposite.


But you didn't answer my question. What other evaluation metrics are you proposing?

>The point is getting someone to read the article doesn't make a headline "good", it just means someone read the article.

It actually does. The headline did its job.

>Which do you want to optimize for?

That is a false choice. Why are these the only two options?


Persuading someone to read an article which is irrelevant to them is likely a bad thing.

Just because you don’t have a good metric for something doesn’t mean that what you can measure is better.

A metric can simply lead to bad results, and therefore be a bad metric.


>Persuading someone to read an article which is irrelevant to them is likely a bad thing.

You're conflating content targeting with headline writing. Those are two separate points.

>Just because you don’t have a good metric for something doesn’t mean that what you can measure is better.

Certainly, if a metric is 'bad' in that it is not producing results, nobody wants to waste their time and keep using it. However, the engagement metric is producing results for many folks. Do you disagree with that?

>A metric can simply lead to bad results, and therefore be a bad metric.

Anything "can" lead to anything. That doesn't really make for much of a discussion without data to examine.


> You're conflating content targeting with headline writing. Those are two separate points.

No.

>Just because you don’t have a good metric for something doesn’t mean that what you can measure is better. Certainly, if a metric is 'bad' in that it is not producing results, nobody wants to waste their time and keep using it. However, the engagement metric is producing results for many folks. Do you disagree with that?

This is meaningless to agree with or disagree with since the value of the results is what is in question.

>A metric can simply lead to bad results, and therefore be a bad metric.

> Anything "can" lead to anything. That doesn't really make for much of a discussion without data to examine.

So you agree that the metric could be bad.


>This is meaningless to agree with or disagree with since the value of the results is what is in question.

How are you judging the value of the results? I am not understanding your point here. Again, back to my original question, please propose alternate metrics, otherwise we're just arguing over minutiae that miss the meat of the discussion.


> I am not understanding your point here.

I know.

> Again, back to my original question, please propose alternate metrics

That’s not actually necessary in order to understand what I’m saying. In fact it would be a distraction.


I'd much rather steer the conversation towards solutions than engage over abstract "good" and "bad" terms which you don't seem to want to define. In any event, we have reached a point of disagreement, which is fine with me, so let's leave it at that. Have a nice day.


> I'd much rather steer the conversation towards solutions than engage over abstract "good" and "bad" terms which you don't seem to want to define.

That’s not really how it looked earlier in the thread.

You seemed to be strongly defending the idea that engagement is good, and not even accepting that there could be a problem.

Perhaps that’s a misreading of your intention.


I will make it simple: they are asserting 'popular=good'. They are not asserting if the headline is misleading, whether it's an accurate summary, etc. Just popular.

Well, Hitler was popular too.


But nobody said this was how you write the best articles.


But they did say it was about how to write the best headline.


If I see an article about “the best fishing rod”, I’ll assume it’s in the context of catching fish, not being a fish.


Yeah, but these days it will probably be about a guy named Rod who is the best at fishing...but not really.


Right, but headlines don't really have a "quality" to them outside of attracting readers. So "best" is in fact "highest engagement", when it comes to headlines.


I disagree. In my opinion, an ideal headline should be a condensation of the content into a few catchy words. If engagement is all that matters, is "READ THIS ARTICLE OR YOU WILL DIE!!!" a good headline?


An irrelevant headline will not drive engagement, so no.


Depends on the context.


The NY Post is good at writing catchy headlines that attract attention. Yet the Wall St Journal exists. Why?


And I still won’t buy a subscription.

I’m waiting for the first paper to reinvent itself.

Meaning the digital, and the print, is so good it will be something I need to subscribe to.

Maybe those days are gone? I was just thinking: would I pay for an HN subscription? A site I have viewed since day one. Right now, no.


NY Post usually has solid content that matches the assumption taken by the headline. Can't say that for most of the major "news" domains.


Well, content is targeted by audience type, and the headlines reflect that?


Exactly. “One weird trick for maximizing return on equity” doesn’t really work for the WSJ.

“Man bites dog” is appreciated by Post readers.


"Don't tell me about the press. I know exactly who reads the papers. The Daily Mirror is read by people who think they run the country; The Guardian is read by people who think they ought to run the country; The Times is read by the people who actually do run the country; the Daily Mail is read by the wives of the people who run the country; the Financial Times is read by people who own the country; the Morning Star is read by people who think the country ought to be run by another country, and the Daily Telegraph is read by people who think it is."

"Prime Minister, what about the people who read The Sun?"

"Sun readers don't care who runs the country, as long as she's got big tits."


The point of a headline is to get you to view ads. The article is incidental. It keeps you on the page so more ads can be shown.


The news websites that entered the top running for headlines on facebook are all really bad. Just a lot of low quality, shady headlines. Not bored panda bad but pretty close.


Mainstream and social media are at near parity in terms of necessary informativeness. The reader has to seek-out reliable, independent media in order to stay informed.

I wish news and mass-sharing were banned from social media, because it's too often low-signal noise like celebrity gossip, contrived/falsified outrage, or some new movie or product. Garbagé.


On Facebook at least it's easy to hide all posts from news and mass-sharing pages as they show up in your feed. So you'll see them once and then never again.


I thought it was undisputed that the best headline ever written was "Headless Body in Topless Bar"


My bad, it was me. I needed some head.


I'll go first

ShowHN: Mono(te): An offline first, Turing-complete, blindingly fast notes app written in 23 lines of (Rust) code. Oh, and it respects your privacy, and it's Open Source.


Took me an unhealthy amount of time to search for this: https://news.ycombinator.com/item?id=25053553


--- hugodutka 4 months ago [–] --- The perfect Hacker News title has finally been crafted.

I was thinking of that post or similar. Thanks for finding. I struggled w/ the decision to use Rust or Elixir.


I don't see anything here which convinces me that the engagement figures versus headline word count is not simply more or less a mirror of the word count frequency histogram of those headlines themselves.

The null hypothesis is that people share more ten-word headlines simply because there are more ten-word headlines. If you want to show they like to share them more, you need something else, like data about how many headlines people looked at versus shared.
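
A small sketch of the difference between the two readings, using invented (word count, shares) pairs rather than anything from the article. Per-headline shares only correct for how many headlines exist at each length; the view counts mentioned above would still be needed to measure actual preference:

```python
from collections import Counter

# Hypothetical (word_count, shares) pairs, one per headline.
posts = [(10, 40), (10, 60), (10, 50), (10, 50), (15, 120), (6, 30)]

total_shares = Counter()
post_counts = Counter()
for words, shares in posts:
    total_shares[words] += shares
    post_counts[words] += 1

# Normalize by how many headlines exist at each length.
shares_per_post = {w: total_shares[w] / post_counts[w] for w in post_counts}

most_shared_length = max(total_shares, key=total_shares.get)
best_per_headline = max(shares_per_post, key=shares_per_post.get)

print(most_shared_length, best_per_headline)
```

In this toy data ten-word headlines collect the most shares in total simply because there are four of them, while the lone fifteen-word headline does better per headline.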


Shock finding! Average headline length matches average sentence length!


> 4. The ideal headline length is 11 words and 65 characters, according to the most shared headlines on both Facebook & Twitter.

~~That length is just shorter than recommended length for a commit message (72 characters if I recall correctly).~~

Compare that to 50 characters for a commit message’s subject, and 72 characters for each line in the body.


Typically 50 chars for the subject, 72 chars per line for the body.


Oh sorry, you are correct.


The best headlines _on Facebook_. Given that FB is the world's most notoriously bad/spammy content recommendation system this is like an article about “What you need to cook the best food” that focuses on the McDonalds menu.

But hey, I clicked.


First, you need to know that (at least in SI) "m" is for "milli" - a thousandth part of something, and "M" is for "mega", which means "a million units". So, 100m is 0.1 headlines.


I would call ##m meaning "## million" common parlance. You seemed to understand the difference


And how can't you empathize with a fellow HNer's OCD? (:


Have you actually been diagnosed with this condition too? Or are you being facetious?


That was meant more as a semi-joke (on me). Never thought about seeking a professional help because it's not affecting my life too much. I hope other people in my life would agree.


Yuo cna undrestnad thsi, rgiht?

One thing is if I could understand it and another thing is if the author's writing skills are good enough to advise others about writing.

BTW, the submitter (or a moderator) seems to agree with me (check the current HN-entry title).


They probably didn't intend for the headline to read "One hundred mega posts," so if anything you're arguing in favor of how they wrote it.


I don't know how the authors missed this, but they are not factoring in the pandemic effect, which obviously dictated these movements.


Make the most awesome headlines with this one weird trick.

Oh wait nm, there's some art and some science to it by evaluating reader interest fashions.


discussion the last time they did this in 2017

https://news.ycombinator.com/item?id=14643488


Now do that for HN posts.


aka click bait


Well "clickbait" is hardly a property of the headline – it's a condition of the content failing to deliver on the headline's promises and implications.


The headline is the “bait” that is seeking the “click”. One can hardly fault the article which likely existed before the headline (for the clickbait; it can still be a failure as an article in general).



