Consider the following model scenario. You are a PM at a discussion board startup in Elbonia. There are too many discussions at any given time, so you personalize the list for each user, showing only discussions she is more likely to interact with (interaction is a crude indication of user interest, which is tough to measure accurately).
One day, your brilliant data scientist trained a model that predicts which of the two Elbonian parties a user most likely supports, as well as whether a comment or article discusses a political topic. Then a user researcher made a striking discovery: supporters of party A interact more strongly with posts about party B, and vice versa. A proposal is made to artificially reduce the prevalence of opposing-party posts in someone's feed.
Would you support this proposal as a PM? Why or why not?
That's beside the point, though. The point here is that Facebook executives were told by their own employees that the algorithms they designed were recommending more and more partisan content and de-prioritizing less partisan content because it wasn't as engaging. They were also told that this was potentially causing social issues. In response, Kaplan and other FB executives said that changing the algorithm would be too paternalistic (ignoring, apparently, that an algorithm that silently filters without user knowledge or consent is already fundamentally "paternalistic"). Given that Facebook's stated objective is to "bring the world closer together", choosing to support an algorithm that drives engagement while actually causing division seems a betrayal of its stated goals.
Same. I miss the days of the chronological feed. Facebook's algorithms seem to choose a handful of people and groups I'm connected to and constantly show me their content and nothing else. It's always illuminating when I look someone up after wondering what happened to them only to see that they've been keeping up with Facebook, but I just don't see any of their posts.
Yesterday, in fact, I saw a post from a family member that I really wanted to read; I started it but was interrupted. When I had a chance to focus again, I re-opened the FB app and the post was nowhere to be seen. I scrolled up, I scrolled down, it was gone. I had to search for my family member to find it again. Super frustrating, and it makes you wonder what FB decided you didn't need to see (which I guess is the point of this whole thread)...
I agree with this. I have a mildly addictive personality and found I had to block my newsfeed to keep myself (mostly) off facebook. I follow a couple of groups which are useful to me and basically nothing else.
I deleted all of my old posts to reduce the amount of content FB has to lure my friends into looking at ads. But because of the covid-19 pandemic I was using facebook again to keep in contact with people. Now that restrictions have eased in my country I can see people again, and have deleted my facebook posts once more.
No. Why should the only desirable metric be user engagement?
Is the goal of FB engagement/virality/time-on-site/revenue above all else? What does society have to gain, long term, by ranking a news feed by items most likely to provoke the strongest reaction? How does Facebook's long-term health look, 10 years from now, if it hastens the polarization and anti-intellectualism of society?
> Is the goal of FB engagement/virality/time-on-site/revenue above all else?
Strictly speaking, Facebook is a public company that exists only to serve its shareholders' interests. The goal of Facebook (as a public company) is to increase its stock price. That more often than not, if not always, means prioritizing revenue over all else.
That's the dilemma.
Then again, I believe Mark has control of the board, right? (And therefore couldn't be ousted for prioritizing ethical business practices over revenue - I could be wrong about this)
> Strictly speaking, Facebook is a public company that exists only to serve its shareholders' interests.
That's a very US-centric interpretation, which fits because Facebook is a US company.
But it's still a reductive view of the issue, considering how far and wide Facebook's reach extends outside the US.
In that context, it's not really that much of an unsolvable dilemma; it only appears as such when the notion of "shareholder gains above all else" is considered some kind of "holy grail thou shalt never challenge".
This is a false choice. The real problem stems from the fact that the model rewards engagement at the cost of everything else.
Just tweaking one knob doesn't solve the problem. A real solution would likely change the core business model, and so no single PM would have the authority to actually fix it.
Fake news and polarization are two sides of the same coin.
I'd just suggest the data scientist was optimizing the wrong metrics. People might behave that way, but having frequent political arguments is a reason people stop using Facebook entirely. It's definitely one of the more common reasons people unfollow friends.
Very high levels of engagement seem to be a negative indicator for social sites. You don't want your users staying up to 2AM having arguments on your platform.
This is why the liberal arts are important: you need someone in the room with enough knowledge of the world's history to look at this and suggest that, given the terrible history of pseudo-scientifically sorting people into political categories, maybe you should not pursue this tactic simply to make a buck off of it.
Agreed. Engineers have an ethical duty to the public. When working on software systems that touch on so many facets of people's lives, a thorough education in history, philosophy, and culture is necessary to make ethical engineering decisions. Or, failing that, the willingness to defer to those who do have that breadth of knowledge and expertise.
"The term is probably a shortening of “software engineer,” but its use betrays a secret: “Engineer” is an aspirational title in software development. Traditional engineers are regulated, certified, and subject to apprenticeship and continuing education. Engineering claims an explicit responsibility to public safety and reliability, even if it doesn’t always deliver.
The title “engineer” is cheapened by the tech industry."
"Engineers bear a burden to the public, and their specific expertise as designers and builders of bridges or buildings—or software—emanates from that responsibility. Only after answering this calling does an engineer build anything, whether bridges or buildings or software."
You don't need liberal arts majors in the boardroom, you need a military general in charge at the FTC and FCC.
Can we dispense with the idea that someone employed by Facebook, regardless of their number of history degrees, has any damn influence on the structural issue here, which is that Facebook is a private company whose purpose is to mindlessly make as much money for its owners as it can?
The solution here isn't grabbing Mark and sitting him down in counselling; it's to have the sovereign, which is the US government, exercise the authority it has apparently forgotten how to use and rein these companies in.
A lot of people wouldn’t know about the policy avenues that can be used to regulate these companies (of which FTC is not the only one), or how even advisory groups to the president could help.
You voluntarily put yourself in this position with no good way of fixing it. No one's forcing Facebook to do what they (and now you) do, eh?
My perception of reality is that you and your brilliant data scientist are (at best naive and unsuspecting) patronizing arrogant jerks who have no business making these decisions for your users.
You captured these peasants' minds, now you've got a tiger by the tail. The obvious thing to do is let go of the tiger and run like hell.
- User-configurable and interpretable: Enable tuning or re-ranking of results, ideally based on the ability to reweight model internals in a “fuzzy” way. As an example, see the last comment in my history about using convolutional filters on song spectrograms to distill hundreds of latent auditory features (e.g. Chinese, vocal triads, deep-housey). Imagine being able to directly recombine these features, generating a new set of recommendations dynamically (a rough sketch of this follows these bullets). Almost all recommendation engines fail in this regard—the model feeds the user exactly what the model (designer) wants, no more and no less.
- Encourage serendipity: i.e. purposefully select and recommend items that the model “thinks” are outside the user’s wheelhouse (wheelhouse = whichever naturally emerging cluster(s) in the data the user hangs out in, so pluck out examples from both nearby and distant clusters). This not only helps users break out of local minima, but is healthy for the data feedback loop.
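Here is a minimal sketch in Python of what I mean by both bullets. Everything in it is illustrative: the latent features, the KMeans clustering, and the 8-ranked-plus-2-serendipitous split are assumptions made for the example, not any real recommender's API.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_items, n_features = 500, 8
item_features = rng.normal(size=(n_items, n_features))  # stand-in latent features per item

# User-facing knobs: one weight per interpretable latent feature,
# adjustable at query time to re-rank on demand.
user_weights = np.ones(n_features)
user_weights[3] = 2.0    # "more of feature 3" (say, 'deep-housey')
user_weights[5] = -1.0   # "less of feature 5"

scores = item_features @ user_weights
ranked = np.argsort(scores)[::-1]

# Serendipity: cluster the items, find the user's "home" cluster from their
# history, then reserve a couple of slots for items from other clusters.
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit(item_features)
history = ranked[:20]  # stand-in for the user's past interactions
home = np.bincount(clusters.labels_[history]).argmax()
far_items = np.where(clusters.labels_ != home)[0]

recommendations = list(ranked[:8]) + list(rng.choice(far_items, size=2, replace=False))
print(recommendations)
```

The point is that the weights are exposed to the user and the re-rank is cheap, so the feed stops being exactly (and only) what the model designer wants.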
If you restrict yourself to 2 bad choices, then you can only make bad choices. It doesn't help to label one of them "artificial" and imply the other choice isn't artificial.
It is, in fact, not just crude but actually quite artificial to measure likelihood to interact as a single number, and personalize the list of discussions solely or primarily based on that single number.
Since your chosen crude and artificial indication turned out to be harmful, why double-down on it? Why not seek something better? Off the top of my head, potential avenues of exploration:
• different kinds of interaction are weighted differently; some could be weighted negatively (e.g. angry reacts; see the scoring sketch after this list)
• [More Like This] / [Fewer Like This] buttons that aren't hidden in the ⋮ menu
• instead of emoji reactions, reactions with explicit editorial meaning, e.g. [Agree] [Heartwarming] [Funny] [Adds to discussion] [Disagree] [Abusive] [Inaccurate] [Doesn't contribute] (this is actually pretty much what Ars Technica's comment system does, but it's an optional second step after up- or down-voting. What if one of these were the only way to up- or down-vote?)
• instead of trying to auto-detect party affiliation, use sentiment analysis to try to detect the tone and toxicity of the conversation. These could be used to adjust the weights on different kinds of interactions; maybe some people share divisive things privately but share pleasant things publicly. (This seems a little paternalistic, but no more so than "artificially" penalizing opposing party affiliation)
• certain kinds of shares could require or encourage editorializing reactions ([Funny] [Thoughtful] [Look at this idiot])
• Facebook conducted surveys that determined that Upworthy-style clickbait sucked, in spite of high engagement, right? Surveys like that could be a regular mechanism to determine weights on interaction types and content classifiers and sentiment analysis. This wouldn't be paternalistic, you wouldn't be deciding for people, they'd be deciding for themselves
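To make the weighting bullets concrete, here is a toy scoring sketch. The interaction names and weight values are placeholders invented for illustration, not anything any platform actually uses.

```python
# Signed per-interaction weights: "engagement" is no longer one undifferentiated number.
INTERACTION_WEIGHTS = {
    "like": 1.0,
    "comment": 1.5,
    "share": 2.0,
    "angry_react": -2.0,      # strong engagement, but weighted *against* the post
    "fewer_like_this": -5.0,  # explicit negative-feedback button
    "agree": 1.0,
    "inaccurate": -3.0,       # editorial reactions carry explicit meaning
}

def post_score(interaction_counts: dict[str, int]) -> float:
    """Combine interaction counts into a single ranking score using signed weights."""
    return sum(INTERACTION_WEIGHTS.get(kind, 0.0) * count
               for kind, count in interaction_counts.items())

# A divisive post with lots of angry engagement can now rank below a quieter,
# better-received one.
print(post_score({"comment": 40, "angry_react": 30, "inaccurate": 10}))  # -30.0
print(post_score({"like": 20, "agree": 15, "share": 3}))                 # 41.0
```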
I feel like this is a false presentation of the PM's choice. If I were the PM there, I would question the first assumption: that users want to see more of the stuff they interact with. That's an assumption; it's not founded in any user or social research (in the way you've presented it).
And even if it were supported by research, I would think about the long tail. What does this mean for my user engagement in the long run? This list might satisfy them now, but it necessarily leads to a narrowing of the content pool in the long run. I would ask my marketing sciences unit or my data science unit, whatever I have, to try to forecast or simulate what the dynamics of user engagement would be under intervention A and intervention B.
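Something like the following skeleton is the shape of what I'd ask for. The numbers are pure placeholders; the whole point of the exercise would be to replace them with estimates fitted from real user data.

```python
def simulate_engagement(weeks: int, start: float, weekly_change: float) -> list[float]:
    """Compound a weekly multiplicative change onto a starting engagement level."""
    levels = [start]
    for _ in range(weeks - 1):
        levels.append(levels[-1] * weekly_change)
    return levels

# Placeholder dynamics, to be replaced with fitted estimates:
# intervention A: optimise for short-term engagement (hot start, content pool narrows over time)
# intervention B: keep the feed diverse (slower start, steadier retention)
intervention_a = simulate_engagement(weeks=52, start=1.10, weekly_change=0.995)
intervention_b = simulate_engagement(weeks=52, start=1.00, weekly_change=1.002)

print(f"A after a year: {intervention_a[-1]:.2f}, B after a year: {intervention_b[-1]:.2f}")
```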
I feel this is one of the biggest problems of program management today. Too much reliance on short-term A/B testing, which, in most cases, can only solve very tactical problems, not strategic problems with the platform. Some of the best products out there rely much less on user testing, and much more on user research and strategic thinking about the primary drivers in people.
If you were to use this approach, you might see that the product you get by choosing to optimise for short-term engagement actually brings less user growth and less opportunity for diverse marketing, which, it is important to note, is one of the main purposes of reach-building marketing campaigns.
I would say the way this whole problem is phrased shows that the PM, or indeed the company, is only concerned with optimising the frequency of marketing campaigns, rather than the quality, reach and engagement with marketing campaigns.
Obviously, hindsight 20/20 and generals after battle and all that. I'm still pretty sure I would've thought more strategically than "how do I increase frequency of showing ads".
As a PM, I'd support it as an A/B test. Show some percentage of your users an increased level of posts from the opposite party, some others an increased level of posts from their own party, and leave the remaining 90% alone. After running that for a month or two, see which of those groups is doing better.
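Concretely, the bucketing can be as simple as a stable hash of the user id. The 5/5/90 split and the hashing scheme below are just illustrative choices for the sketch.

```python
import hashlib

def assign_arm(user_id: str, experiment: str = "party-mix-test") -> str:
    """Deterministically assign a user to an experiment arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable 0-99 bucket per user, per experiment
    if bucket < 5:
        return "more_opposing_party"
    if bucket < 10:
        return "more_own_party"
    return "control"  # the remaining 90% see the unchanged feed

print(assign_arm("user-12345"))
```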
They've clearly got something interesting and possibly important, but 'interaction strength' is not intrinsically good or bad. I would instead ask the researcher to pivot from a metric of "interaction strength" to something more closely aligned with the value the user derives from their use of your product. (Side note: hopefully, use of your product adds value for your users. If your users are better off the less they use your platform, that's a serious problem.)
Do people interacting with posts from the opposite party come away more empathetic and enlightened? If they are predominantly shown posts from their own party, does an echo chamber develop where they become increasingly radicalized? Does frequent exposure to viewpoints they disagree with make people depressed? They'll eventually become aware, outside the discussion board, of what the opposite party is doing; does early exposure to those posts make them more accepting, or does it make them angry and surprised? Perhaps people become fatigued after writing a couple of angry diatribes (or the original poster becomes depressed after reading that angry diatribe) and people quit your platform.
Unfortunately, checking interaction strength through comment word counts is easy, while sentiment analysis is really hard. Whether doing in-person psych evals or broadly analyzing the users' activity feed for life successes or for depression, you'll have tons of noise, because very little of those effects will come from your discussion board. Fortunately, your brilliant data scientist is brilliant, and after your A/B test, has tons of data to work with.
They did as you say (you are a PM, after all!), and the next week they rolled out the "likelihood of engagement" model. An independent analysis by another team member, familiar with the old model, confirmed that it was still mostly driven by politics (there is not much going on in Elbonia besides politics), but politics was neither the direct objective nor an explicit factor in the model.
The observed behavior is the same: using the new model, most people are still shown highly polarized posts, as indicated by subjective assessment of user research professionals.
We used newsgroups and message boards long before Facebook. They weren’t as toxic, I’m assuming due to active moderation. The automated or passive or slow moderation is perhaps the issue.
I think they weren't as toxic because content creators didn't realize divisive content drives much more engagement. It's not about moderation, it's a paradigm shift in the way content is created.
Regarding a predictive model and privacy/ethics/etc.: regardless of your objective function and explicit parameters, a model can only be judged on what it actually predicts, so answering the prior question is enough to answer this one.
This is because machine learning models are prone to learning quite different things than the objective function intended, so a different stated intent or model structure must be disregarded when analysing the results.
To whatever degree two models predict similarly, they must be regarded as similar, even if one gets there in a roundabout way.
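As a toy illustration of judging models by their predictions rather than their framing (both "models" below are made-up stand-ins, as is the data):

```python
import numpy as np

rng = np.random.default_rng(1)
posts = rng.normal(size=(1000, 4))  # made-up post features

def old_model(x):
    # explicitly leans on the "political" signal in column 0
    return (x[:, 0] + 0.1 * x[:, 1]) > 0

def new_model(x):
    # framed as "likelihood of engagement", but using largely the same signal
    return (0.9 * x[:, 0] + 0.2 * x[:, 2]) > 0

agreement = np.mean(old_model(posts) == new_model(posts))
print(f"prediction agreement: {agreement:.1%}")
# If agreement is high, the new model is effectively the old one in a roundabout way,
# whatever its objective function claims to optimise.
```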
Agreed; as a general rule I shy away from predicting things I wouldn't claim expertise in otherwise. This is why consulting with subject matter experts is important. Things as innocuous as traffic crashes and speeding tickets are a huge world unknown to the casual analyst (the field of "Traffic Records").
I would take a step back and question the criteria we are using to make decisions. “Engagement” in this context is euphemistic. This startup is talking about applying engineering to influence human behavior in order to make people use their product more, presumably because their monetization strategy sells that attention or the data generated by it.
If I were the PM I’d suggest a change in business model to something that aligns the best interests of users with the best interests of the company.
I’d stop measuring “engagement” or algorithmically favoring posts that people interact with more. I’d have a conversation with my users about what they want to get out of the platform that lasts longer than the split second decision to click one thing and not another. And I’d prepare to spend massive resources on moderation to ensure that my users aren’t being manipulated by others now that my company has stopped manipulating them.
I think the issues of showing content from one side of a political divide or the other is much less important than showing material from trustworthy sources. The deeper issue, which is a very hard problem to solve, is dealing with the fundamental asymmetries that come up in political discourse. In the US, if you were to block misinformation and propaganda you’d disproportionately be blocking right wing material. How do you convince users to value truth and integrity even if their political leaders don’t, and how do you as a platform value them even if that means some audiences will reject you?
I don’t know how to answer those questions but they do start to imply that maybe “news + commenting as a place to spend lots of time” isn’t the best place to expend energy if you’re trying to make things better?
I would think engagement would be a core metric you would be measured against in this example. And if that’s the case, this certainly isn’t a side effect.