Hacker News
APIs for content sites must be free (somebits.com)
319 points by NelsonMinar on June 14, 2023 | 293 comments



I've found myself resisting posting comments on Reddit recently - things like answers to simple questions people have about SQLite - partly because I don't want to cross the picket line but also because I realize that I was relying on a very light social contract that was in place there: I would share my knowledge for free, in exchange for which I knew that I was contributing to a larger dataset that I and others could get back out again.

If Reddit are cutting off free API access, that social contract no longer holds. Why should I work to benefit their service if they're hoarding the resulting data and not making it available to me or people like me in the future?

I thought I was vanishingly rare in caring about this kind of thing, but given the mass Reddit blackout over the changes to the API apparently I'm not!


Reddit not long ago announced a plan to delete all comments from a submission whenever an OP deleted their submission (ordinarily only the OP's post would be deleted).

There was quite a lot of pushback, since the consensus was that many comments aren't just for the benefit of the OP but for everyone (in fact, sometimes the OP doesn't even appreciate the comments they receive, which are nevertheless useful to others).

So Reddit walked it back but it does show many of the site's users are conscious of the broader utility of comments outside of just the smaller contextual interactions.


> Why should I work to benefit their service if they're hoarding the resulting data and not making it available to me or people like me in the future?

This I believe is generally called "digital sharecropping" https://www.roughtype.com/?p=634

> What’s being concentrated, in other words, is not content but the economic value of content. MySpace, Facebook, and many other businesses have realized that they can give away the tools of production but maintain ownership over the resulting products. One of the fundamental economic characteristics of Web 2.0 is the distribution of production into the hands of the many and the concentration of the economic rewards into the hands of the few. It’s a sharecropping system, but the sharecroppers are generally happy because their interest lies in self-expression or socializing, not in making money, and, besides, the economic value of each of their individual contributions is trivial. It’s only by aggregating those contributions on a massive scale – on a web scale – that the business becomes lucrative.



Thanks for articulating what I meant by the social contract, Simon. I feel the same way, which is why I wrote the post.

General reply to other comments: y'all are right, the post is short and doesn't flesh out all the arguments necessary. But I think I captured the spirit of something about how we, the creators on the Internet, feel about the bargain we strike when we invest in sites like Reddit or Twitter or indeed, Hacker News here.


Really? The average discussion on Reddit/StackOverflow/etc is full of wrong answers and it is hit or miss if the "accepted" answer is correct, never mind any of the other ones. These sites are a great model for convincing bullshit but if you wanted something worth training an A.I. you'd need to filter out 90% of the content.


My experience is that it is a good source for leads on things to do but not for convincing facts. And an LLM is good for that too. I still end up needing both.

For example, someone posted that their AKG headphones would only charge over USB-C if the other end of the cable was USB-A. I couldn't get mine to charge, and that was the answer. That answer existed only on Reddit until recently, when someone had the same problem on another forum.


Sadly very true. Most threads are very comparable to ChatGPT 3.5. I think one could mostly recreate Reddit with an LLM.


Reddit probably was a pretty big part of LLM training data. It has a vast trove of conversation on niche topics, and pairings of "what is said in response to input X". Probably the monetisation of the API is an attempt to capitalize on those users.


Fortunately /r/SubredditSimulator has been a thing for a while https://www.engadget.com/2019-06-05-subreddit-simulator-gpt-...


Yea, the future of public discussions online is interesting to consider. LLM bots are already a thing, comments posted here could be from a bot. We don't have a way to really confirm it, and I am not sure it actually matters (yet).


I'm making a concerted effort not to open reddit, and as a result am spending more time doing side project things.

One such project has been modding an old video game console. When I've run into issues, search results have repeatedly pointed me to Reddit posts. I've tried to avoid those in favour of alternative results, but I still end up needing more information, only to discover the links go to blacked-out subreddits. It has been brutal!


I'm in a similar boat. Thankfully you can often use an internet cache to view the contents (this doesn't help Reddit out, which would defeat the purpose of the protest).

I've done this for a few coding questions I needed answered fast. It does show how much will be lost if things don't improve, but like you I didn't want to fuel Reddit in any way (assuming the sub is even open anyway).


I feel very similarly. I have mixed feelings on my actions but I just finished running shreddit on all of my comments and deleting my accounts. I wrote content with the (possibly misguided) notion that what I contributed would continue to be openly accessible/searchable. I didn't want what I wrote to end up in a locked up walled garden like Meta's products. Imho Reddit will continue to put walls up around user contributed data. I decided to delete everything while I still could.


Can someone explain to me why Reddit can't have different price tiers for their API service? Why can't they charge people who want to scrape en masse (LLM trainers) 10-100x more than mobile app API users?


Absolutely nothing.

I don't understand why they don't just support OAuth and then the api usage would be by user, not by the app itself.

Otherwise apps like Apollo would technically be some of the biggest "users" of the API.

What's wild to me isn't just the amount, it's the lack of time. In theory Apollo might have enough users willing to pay enough for it to turn a profit (albeit with a fraction of the users). But because there's no time, and because people bought year-long subscriptions, he's stuck with subscribers paying less than their operating cost.
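The distinction drawn above is about what a request gets attributed to: the app's client ID, or the (app, user) pair behind an OAuth token. Below is a minimal sketch of the two accounting schemes; the client IDs, user names and call counts are made up for illustration, and this is not a description of how Reddit actually meters its API.

```python
from collections import Counter
from typing import Iterable, Tuple


def meter(calls: Iterable[Tuple[str, str]], per_user: bool) -> Counter:
    """Count API calls per app, or per (app, user) pair.

    Each call is a (client_id, user) tuple; all values here are hypothetical.
    """
    usage: Counter = Counter()
    for client_id, user in calls:
        # Per-app metering lumps every user of a client together under one key.
        key = (client_id, user) if per_user else client_id
        usage[key] += 1
    return usage


# Hypothetical traffic: 1,000 app users making one call each,
# plus a single bulk scraper making 500 calls.
calls = [("apollo", f"user{i}") for i in range(1000)]
calls += [("scraper", "bulk-bot")] * 500

per_app = meter(calls, per_user=False)
per_user = meter(calls, per_user=True)

print(per_app.most_common(1))   # [('apollo', 1000)]
print(per_user.most_common(1))  # [(('scraper', 'bulk-bot'), 500)]
```

Under per-app metering the client app looks like the heaviest consumer; per-user metering surfaces the single bulk scraper instead.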


I figured it's because they don't want 3rd party apps, since those don't run ads for Reddit. Reddit wants users within their (soon to be) walled garden, getting fed all of the monetization pressure systems.

My take: when Reddit said that Apollo cost them $20M a year, they weren't talking about running costs but about revenue potential, because Apollo doesn't feed ads to its users.


I don’t understand, if API apps not displaying ads is the problem, why Reddit doesn’t include ads in the feed and make displaying them part of the terms for access. They could even have a revenue sharing agreement with the app developers.


This implies that the Reddit administration thinks things through. They have a track record proving they indeed do not.


I wonder how we've ended up in a situation with so many petulant tyrants running amok over previously green fields of humanity?

Elon, Zuck, Bezos, Trump, Gates, Putin and now Reddit, all in the last 5-7 years have been doing things that are user or human hostile while holding some segment of our society hostage in the process.

Elon throwing a 42 billion dollar Twitter hissyfit; Bezos using Amazon's money and connections to lock sellers into their platform and then undercut them with cheap knockoffs, plus underpaying their front line employees as if they don't have enough money already; Trump turning seemingly normal Republicans into walking cesspools of hate; Gates with the Windows 10/11 privacy and user rights shitstorm; now Reddit redlining their greed and attacking the users and apps that actually make their site function while not offering any equivalent alternative; and Putin tanking the entire country's human reserves and financial strength to try to vaingloriously steal back a few hundred square miles of land.

All of these things happening in the same bundle of years makes me want to believe in Aliens or Armageddon or the Illuminati or something, because I can't bring myself to believe that the most powerful people in the world would be so selfishly stupid.

Selfish, yes, stupid, yes, but selfishly stupid? When they are surrounded by so many people whose life and lineage and future history depends on the glamorous and glowing successes of these people? And they all ignore all of them and do stupid things with no one checking their ego even the slightest?

How is that possible without an unseen third hand pulling the strings?

I don't know, and I don't think I believe there is something like that, but I suspect there is, and I can't shake that suspicion either.


AFAIK Gates has almost nothing to do with Windows these days, but yeah I see your point.


Hopefully someone has the link, but I heard that the lost revenue from ads is significantly lower than the $20 million bill.


> I don't understand why they don't just support OAuth

OAuth breaks people's brains, and I can only imagine the tech / political debt Reddit has.


Isn't this why people are upset? It's pretty transparently them giving the axe to third party apps when they explicitly said that wasn't what they were going to be doing. People don't like being lied to or treated like they're too stupid to know what you're actually doing.


That’s exactly how it works today: Apollo logs into Reddit as you via OAuth. When you add an account you follow a pretty standard OAuth flow (Apollo sends you to Reddit, you log in, Reddit asks you to authorize Apollo, then sends you back). But Reddit knows it’s you, and they know you’re using Apollo.

In fact, here's the post where Apollo switched to OAuth some years ago: https://old.reddit.com/r/apolloapp/comments/44rd9g/apollo_be...
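The first leg of that flow (the app sending you to Reddit) amounts to opening an authorization URL in the browser. A minimal sketch follows, assuming the endpoint and parameter names from Reddit's OAuth2 documentation (`client_id`, `response_type`, `state`, `redirect_uri`, `duration`, `scope`); the client ID and redirect URI are hypothetical placeholders, not Apollo's real values.

```python
import secrets
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as given in Reddit's OAuth2 docs (assumption: still current).
AUTHORIZE_URL = "https://www.reddit.com/api/v1/authorize"


def build_authorize_url(client_id: str, redirect_uri: str, scopes: List[str],
                        state: Optional[str] = None) -> Tuple[str, str]:
    """Return (url, state) for the authorization-code grant."""
    # A random, unguessable state lets the app reject forged callbacks.
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "response_type": "code",    # authorization-code grant
        "state": state,
        "redirect_uri": redirect_uri,
        "duration": "permanent",    # also request a refresh token
        "scope": " ".join(scopes),  # space-separated scope list
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


# Hypothetical app registration values.
url, state = build_authorize_url(
    "HYPOTHETICAL_CLIENT_ID", "myapp://oauth-callback", ["identity", "read"]
)
query = parse_qs(urlsplit(url).query)
print(query["scope"])  # ['identity read']
```

After you approve, Reddit redirects back to `redirect_uri` with a `code` and the same `state`; the app checks that `state` matches and then exchanges the code for an access token. Every later request carries that token, which is why Reddit knows both who you are and which app is acting for you.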


They don't want 3rd party mobile apps. That's it. It's entirely transparent, to the point that I really don't understand why Reddit doesn't just say it.


Because why would anyone pay that, when you can just do what search engines do and crawl the site for the data?


Same. I usually spend a few hours a week helping people with 3d printer issues, and since all this has blown up I have just ignored it all. It's not worth putting that knowledge into Reddit, and I'm toying with deleting what I have already contributed.


It would be much better if this knowledge was available elsewhere for other souls to peruse. Say, maybe, a blog?


I do have it cross-posted to my blog actually, and have my downloadable profiles there.


Same, I wish we could contribute content (posts, comments, votes, etc) in a redistributable way.

I'd like a P2P-backed service to exist, where people can own all the data. I guess abuse would be a challenge, but there are probably enough good ideas around BitTorrent and cryptocurrency to make sharing and trust/voting possible.

My dream is that people could have something like a Raspberry Pi at home helping host the internet in a sparse way. Would it be less efficient than a centralized service? Sure, but decentralization is inherently more expensive, yet better thanks to true shared ownership.
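One building block that BitTorrent-style systems rely on to make sparse, shared hosting trustworthy is content addressing: an item's ID is the hash of its bytes, so any peer (a home Raspberry Pi included) can serve it and the reader can check it wasn't altered. A minimal in-memory sketch, assuming SHA-256 over canonical JSON; the `PeerStore` class and the post fields are hypothetical, and networking, voting and abuse handling are left out entirely.

```python
import hashlib
import json
from typing import Dict


def canonical_bytes(post: dict) -> bytes:
    # Sorted keys and fixed separators so identical posts always hash identically.
    return json.dumps(post, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_id(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


class PeerStore:
    """A hypothetical node that stores posts keyed by their content hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, post: dict) -> str:
        blob = canonical_bytes(post)
        cid = content_id(blob)
        self._blobs[cid] = blob
        return cid

    def export(self, cid: str) -> bytes:
        # What this peer would send over the network when asked for `cid`.
        return self._blobs[cid]

    def receive(self, cid: str, blob: bytes) -> bool:
        # Accept data from an untrusted peer only if it hashes to the requested ID.
        if content_id(blob) != cid:
            return False
        self._blobs[cid] = blob
        return True

    def get(self, cid: str) -> dict:
        return json.loads(self._blobs[cid])


alice, bob = PeerStore(), PeerStore()
cid = alice.put({"author": "example", "body": "hello"})
print(bob.receive(cid, alice.export(cid)))   # True
print(bob.receive(cid, b'{"body":"spam"}'))  # False
```

Because the ID is derived from the content, it does not matter which peer a reader fetches from; a tampered copy simply fails the hash check.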


Inefficiency was always a red herring.

$ curl https://www.reddit.com > /dev/null

    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100  970k    0  970k    0     0   840k      0 --:--:--  0:00:01 --:--:--  840k
That's just the index, not images or anything pulled in by scripts. That page takes longer to download on a gigabit internet connection than the <5kB page it ought to be would take to upload over an asymmetric cable connection with a 10Mbps uplink.
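To put rough numbers on that comparison, the sketch below divides transfer size by rate, using the 970k size and 840k average speed from the curl run above (curl reports these in 1024-byte units) and a hypothetical 5 kB page sent over a 10 Mbit/s uplink. The lean-page estimate ignores latency, TCP slow start and server time, so treat it as back-of-the-envelope only.

```python
def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Idealized transfer time: size divided by throughput."""
    return size_bytes / rate_bytes_per_s


# Measured: curl reported 970k received at an average of 840k per second.
reddit_index = transfer_seconds(970 * 1024, 840 * 1024)

# Hypothetical lean page: 5 kB pushed over a 10 Mbit/s upstream link.
lean_page = transfer_seconds(5_000, 10_000_000 / 8)

print(f"reddit index: {reddit_index:.2f} s")       # reddit index: 1.15 s
print(f"lean page:    {lean_page * 1000:.0f} ms")  # lean page:    4 ms
```

Roughly 1.15 s versus 4 ms, a gap of more than two orders of magnitude: page weight, not the sender's bandwidth, dominates.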

The biggest problem is actually funding. The centralized site puts ads on it or tells VCs they could, and then has money to pay for developers or other things that increase the user base.

It's the same reason Linux is used by programmers and not graphic designers. If there is anything deficient about it in the programmer's workflow they fix it for everybody. If there is anything deficient about it in the graphic designer's workflow, they don't know how to fix it so they get a Mac and run Adobe. Social media sites work more like the latter than the former until someone finds a way around that.


I'm not too concerned about efficiency either. Growing up with slow and spotty rural internet taught me the hard way that data locality is key.

Also, with enough data, fetching only what you care about is probably faster.


The funny thing to me is this sort of expectation that a site/app that accepts user-contributed content must provide access to it through something other than the site/app it was submitted to.


I assume that most sites won't do that - so I don't invest much of my time and effort in most sites.

Reddit had 17 years of track record of providing access, which had eventually bought my trust. They're setting that trust on fire right now.


I stipulate that a 30 day notice for a major change to something that has been available for 17 years is pretty egregious. However, I'm still of the mind that when you build a company/product that 100% depends on a 3rd party existing or allowing one to play in their playground, these are the exact situations that can arise. You have to know that at some point, if the dependency is no longer available, neither is your product, regardless of the reason for the lack of availability. That's just how it goes. Yes, it sucks to have the product of your blood, sweat, and tears get invalidated, but that is where you chose to put that effort while knowing the caveats.


Sure, and that's why I haven't built a commercial product on top of Reddit.

(I actually did run a startup on top of Twitter's API between 2010-2013 - I'm VERY glad not to have to worry about that any more!)

I'm not talking about this in terms of being an app developer: I'm talking about it in terms of being a regular user posting content on the site.

If you rug-pull my expectations on access to that data in the future, don't be surprised if I find somewhere else to spend my time.


Sorry, with all of the recent Reddit 3rd party apps threads, I've crossed the streams with your case of just submitting content rather than being a 3rd party app


Ignore the app developers for a second.

What reddit is doing is horrible for the users. Reddit benefitted massively from 3rd party apps making it easier for users to contribute, comment, moderate, etc. Without those users' efforts the platform is worthless. That is the issue here.


So are you saying that because I submitted a lot of reddit comments through Apollo I should also be able to access them through Apollo? Because that's a statement I would agree with but I'm not sure it's what you meant.


On one hand I understand not wanting to contribute to a commercial content aggregator that you feel doesn't return enough value to its users, but it sounds like you still use it, i.e. the more restrictive API pricing does not discount its value enough for you to discontinue your use of the site.

As far as refusing to contribute, I don't see the practical difference between that and refusing to help someone in trouble out in the street. You get even less for your trouble in the latter case, but I'm sure you wouldn't demand some value back for answering a question in real life even to a stranger you'll never meet again. Sure, a middleman is profiting off of your answer here, but getting hung up on the share of value is how so much of the web shifted from free collaborative passion projects to for-profit Patreon paywalls and the like.


> things like answers to simple questions people have about SQLite

What about just to answer the guy who had the question?


This makes the idea that sites like Reddit only live and thrive because of user-generated content, and are nothing without it, reverberate all across the room.

Looking at what Reddit has done, it makes things obvious to me that the APIs that are free to use are only a gimmick and companies can pull the plug on you as a consumer at any moment. I mean, I can easily imagine that Reddit originally published its API to look "cool" but look at what money has done today.

Why could you not make an offer to buy out Apollo or other 3rd party clients before you closed off your API, Reddit? Fuck your leadership and the so-called idiots on your management team who pulled ideas straight from a dump yard.


I blocked reddit in my hosts file and have found my news through other means. Here and fark.com lol


Same on my local PiHole =D

It's worked the couple times I instinctively sought reddit advice.


Same for me on NextDNS


Not to worry, you can still search google/etc., which scrapes Reddit, or Reddit itself through its native search. No need to use an API explicitly to access content like that.


In my googling this week by far the vast majority of promising reddit links for my searches have resulted in the "r/[subreddit] will go dark..." message instead of the discussion. Usually only the first reddit result has the "Cached" link available. I suppose there's probably a way to force Google to use the cache for other links but knowing that there's a cached link available is already something few regular users are aware of.


A little inconvenient, but you could ask/answer it on Stack Overflow and then link it in the Reddit reply?


Stack Overflow is under a moderator strike at the moment too! The company has disabled the data dumps of the content after a recent layoff.

https://meta.stackexchange.com/questions/390106/moderation-s...


I predict that all sites like this which thrive on community-generated content will be getting rid of APIs and bulk exports. This is because AI companies have used those tools to build their systems on data that removes the need to visit the source website. This is an existential threat to sites like StackOverflow and a huge, missed opportunity for Reddit.


AI companies can just scrape the content, and since the sites don't have an exclusive license to it (it's user-generated, after all) there is little they can do against that.

Actually, I'm not even sure the AI companies aren't doing this already. If I were to start such a company, I'd invest in a generic crawler once, instead of building countless integrations for all the different, proprietary APIs or dump formats.


Please explain to me where Reddit is no longer free for Joe Consumer?

It's the subreddit mods who are making the data inaccessible to you, NOT Reddit.


The people working for free are the villains, not the multibillion dollar corporation.

Got it, thanks.


Are they a multibillion dollar corporation though? I know they have had crazy valuations, but we are moving past a time of value based on eyeballs to a time of value based on monetization. Where does the monetary value of Reddit really lie? It seems like most of their creative monetization strategies like gold, donations and custom avatars have failed. This leaves the more mundane avenues of advertising to users and selling the content that users create. Both of those are severely handicapped by a free/low cost API. If they cannot monetize a user or their data, then that user is bad for their bottom line.

reddit is doing this because their investors want them to be a multi-billion dollar company, not because they already are.


"It's the striking coal miners who are making heat inaccessible to you, NOT the mine owner"


Imagine comparing coal miners to the equivalent of internet janitors who volunteer to work for free as long as they get their share of petty power.

Tons of them are basically just protesting since they won't be able to exert said petty power as efficiently as they used to.

I can't imagine getting to a point where I'd defend Reddit powermods, even over spez himself. But even in your very shaky analogy, the mods are at worst the petty bourgeois, not some sort of proletariat of Reddit wtf.


How are they bougie? They have no ownership over the means of production or financial stake in the success of Reddit. If Spez IPOs Reddit to the moon, do the mods get a taste? At most (from a class perspective), they are the thin layer of floor management that the site relies on to function - those people are still prole labor.

Yes, yes, Reddit is nowhere near as bad as a coal mine, that's why it's an analogy.

> I can't imagine getting to a point where I'd defend Reddit powermods, even over spez himself.

Why not? Workers vs. Capital, if you can't side with the workers, then you are siding with the moneyed interests, and I can't figure why an average person would do such a thing.


I'm not siding with the admins, to be clear. I really really dislike how Reddit has been run. Always been a shithole, one way or the other.

But I also think that powermods are a net negative for the website and have turned moderation on Reddit into a joke.

At the end of the day I think users are the "proles" here. And most of them just want to use Reddit for things like tech support or purchasing advice or discussing video games.

That a few mods are able to take ownership of communities that aren't theirs (the community is the users, not the mods imo) and close them down is pretty meh.

I personally would never use Reddit once RiF stops working (and I mean it, I can't even get myself to use the old Reddit interface).

But this is a weird temper tantrum, imo, that is typical of Reddit: a very tiny minority of third party app users and even commercial third party app devs shutting down stuff that isn't theirs because they don't get it their way. Again, basically 90% of users just use the new Reddit app and website, sadly enough.


It will no longer be free to browse how you like. You will be forced to use Reddit's Apps, Reddit's website, etc. Along with the tracking and restrictions that they impose (Future removal of NSFW is an example of a restriction).


> I was relying on a very light social contract that was in place there

This was never a social contract so I actually wonder how you or the original blog poster ever got to that idea. In fact, it's explicitly not the social contract if you ever read the Terms of Service of sites like Twitter and/or Reddit.


You're thinking of a legal contract. A social contract is what two sides do in practice in order to not piss the other off.


I know exactly what a social contract is, and it's always superseded by a legal contract (which the ToS sort-of is, though it would have to go through the courts if challenged). Given that the ToS explicitly says that Reddit et al. can do whatever they want with your data, the idea that there's a social contract there is wishful thinking.


Uhh... I really don't think you do know what a social contract is. A social contract is a temporary state of affairs, where both sides are benefitting from being able to implicitly hold the other side to it; where either side can freely breach it, but where this dissolves the social contract that held until that moment; and where dissolving the social contract will (obviously) cause the other side to rescind all benefits they were granting the other, and instead revert to whatever it perceives as the game-theoretic global optimum — usually involving local indefinite defection, to "make an example of" the party that violated the social contract.

A ceasefire during a battle is an example of a social contract. Neither side shoots until the other does. Once someone shoots, the other side responds with everything they have, to make an example of the side that shot first.

Not eating your coworkers' food in the office fridge is another example of a social contract. Once someone violates it, people either stop putting food in the fridge, or start petty games of bringing random leftovers dosed with ex-lax to work and leaving them in the fridge.

A social contract isn't "superseded by a legal contract"; a social contract is relied upon in preference to a legal contract. The legal system only applies when a given social contract is broken and dissolved by one party breaching it, thus leaving no recourse other than the underlying legal reality.

The social contract for Reddit, that did exist until it was breached, was "Reddit lets people consume its data for free; and in exchange, users act to curate and maintain and grow the data, rather than actively polluting/destroying/lessening the data." Now that that social contract is dissolved, users are doing everything they can to make Reddit a shithole.


That's exactly what I meant by a social contract, thanks for clarifying it so well.


I have these thoughts about the homeless problem.


Can you elaborate?


> A ceasefire during a battle is an example of a social contract.

This is absolutely false. A ceasefire is not a social contract. It's just bizarre to even claim this, as ceasefires are most often enacted in the context of ceasefire.. uh, treaties.

> Not eating your coworkers' food in the office fridge is another example of a social contract.

This is a better example (though still a very poor one, as we have laws against theft), and the key here is that a social contract is implied or implicit. Why would free API access be an implication of using reddit's service? Almost no other social media website does that.


International law in general is more like a social contract than a legal contract since there is very limited recourse to enforce it against violators other than rounding up a bunch of other countries and getting them to agree to enact a punishment.


"Why would free API access be an implication of using reddit's service?"

Because they offered free API access for 17 years.

Same way Heroku breached a social contract when they got rid of their free tier: if you offer something like that for more than a decade, people will reasonably expect it to continue and will be angry with you if you pull the rug on them.


> Same way Heroku breached a social contract when they got rid of their free tier: if you offer something like that for more than a decade, people will reasonably expect it to continue and will be angry with you if you pull the rug on them.

That's not a social contract, it's the definition of entitlement. By no interpretation is it reasonable to expect anybody to offer a free service indefinitely. Thinking otherwise is just a form of social parasitism that strips the host of its own agency, because they were dumb enough to make a handshake agreement decades ago.

Like decrying your favorite band for "selling out" and evolving with the times. You'd rather they only make music a diminishing fanbase wants, until they starve?

Even child support agreements can be renegotiated every 3 years and last up to 21 years at most if your ex's attorney is particularly conniving. You likely renegotiate your own salary every 1-3 years as well. Reddit stuck to their end of the agreement for 17 years (already impressive for anything in tech!), it's now untenable, but fuck them for not going insolvent.

Only the generations raised with the internet seem to have the expectation that if you offer anything for free, you're expected to continue doing so forever even if it kills you.


You’re thinking about standard ceasefires. I’m not the parent poster, but the Christmas truce of 1914 was done IN SPITE of orders to keep fighting on both sides.

I’m guessing that was what they were referring to.


> as ceasefires are most often enacted in the context of ceasefire.. uh, treaties.

Also known as "armistice treaties." (Where "armistice" is preferred, because otherwise "ceasefire" would have two very different meanings.)

Yes, an armistice usually causes a ceasing... of fire... at the site of any battles that happen to be going on at the time. But you wouldn't usually call what happens on the front lines as a result "a ceasefire." You'd call it "the battle being quit by both sides." During an armistice, both sides usually pack up and go back to their respective bases to be held for redeployment.

"A ceasefire", when the term actually comes up in practice in relation to "phenomena you can experience during a battle", does not usually refer to the ceasing-of-fire that exists as part of the enactment of an armistice. Rather, a ceasefire is used mostly to refer to a temporary, tactical Nash-equilibrium state that lasts a few hours or days; is local to a particular battleground; and is enacted bottom-up by the boots on the ground just not shooting anybody for a while.

(A ceasefire is usually considered an example of a principal-agent problem — during a battle, the known Commander's Intent is to drive your opponent to surrender. A ceasefire does not accomplish this; in fact, it gives both sides time to replenish materiel, and thereby "gives ground" on a war of attrition. But a ceasefire is a state that's better for everyone who's actually there at the site of the battle — so it can be held to by social contract of the people who are actually there.)

Mind you, the phenomenon of "a ceasefire" doesn't exist much any more in its original usage (land wars), as land wars have evolved past the point where ceasefires are feasible, due to the introduction of over-the-top raids, guerilla warfare, precision drone strikes, and other approaches designed to punish stillness and allow top-down enforcement of bottom-level participation. Ceasefires still come up pretty often in naval combat, though! (And they can also still come up in land wars in edge-case situations when a particular land-war battle has the properties of a naval battle — e.g. if two strike forces of armored vehicles meet.)

---

I should mention, as well, that there can be ceasing-of-fire during an armistice, where both sides continue to just sit there guarding the lines they had established for the whole armistice period. This is the case mostly for wars of conquest being fought to claim scraps of land inch by inch. (There's a well-known one of those going on right now, in fact.)

During an armistice in this kind of war, the zone between the previous front lines of each side will stay watched, to make sure neither side "slips" the lines forward a bit in the other's absence. This creates a No Man's Land [or eventually a Demilitarized Zone], where both sides agree that no one from either side may enter as long as the armistice holds. (A good example of such a No Man's Land / DZ exists between North and South Korea, which have maintained such an armistice for decades now.)

The soldiers posted to guard a No Man's Land / DZ aren't usually the same soldiers used for combat — so for the soldiers who were on the front, they do still experience the armistice as "and then we were picked up and taken back home." The ceasing-of-fire is just one step in that higher-level "clocking out" process.

(The only time you'll have the same soldiers sitting there during an armistice that were originally delivered there to fight the battle, is if the armistice period isn't expected to last very long. And even then, only if both sides are willing to "tip their hand" to reveal their lack-of-confidence in how long the armistice will last.)


> I know exactly what a social contract is, and it's always superseded by a legal contract

Legally sure, it is also legal to be a jerk but in most circles/societies there is social contract to not be. Social contracts are often built on top of legal contracts where you live by the social contract and fall back on the legal protections of the legal contract where that fails. Most ToSs and contracts with social media companies are one sided though so for the average user there is not much to fall back on.

I think people use social contracts on top of legal ones because it is hard to get all of the details right in a legal document, while a social contract is normally more flexible and can change over time through iterative back-and-forth, and can therefore leave both sides better off than if they had just stuck with the legal framework.


> wishful thinking

It remains to be seen who is thinking wishfully. If Reddit dies because their users all go away, there really may have been a social contract which Reddit breached. Reddit has a ToS which may attempt to supersede such a social contract but it doesn't necessarily.


I get what simonw is saying though.

Reddit: We could sell API access to your answers, per our TOS

reddit poster: k

Reddit: We're gonna sell API access to your answers, per our TOS

reddit poster: then nevermind

Both have a superseding TOS, but the social contract is still appreciable.


The legal contract states that Reddit can do what it wants with the data, yes, but there is no consideration for production. That is where the social contract takes over.

The social contract states (allegedly, at least; we shall see) that producers agree to produce content in perpetuity so long as Reddit plays nice with the data. With that contract broken, production may cease.


There’s still a social contract that is supra the ToS: treat me and my contributions well and we will stick around. Stop, and we won’t.

Yes, Reddit can (try, at least) get away with anything they claim in the ToS — but community members have every right to step away and render Reddit valueless because they have been told that they are not valued.

Pretending that a legal contract supersedes a social contract is wishful thinking. They are orthogonal.


You are thinking of a social agreement often called a “handshake agreement” or “verbal agreement”. Yes, a verbal or social agreement at law is often superseded by a written agreement, mainly because there’s a specific common clause in written agreements that says exactly this.

The above legal concept is distinct from the term “social contract”, which is a sociological term and has a wider definition than a mere legal meaning. Most of our laws are intended to capture concepts that have wider social and cultural relevance to the society to which they apply.

“Marriage”, for example, is a term with both a distinct legal meaning in many legal jurisdictions as well as a wider sociological conception that can vary from culture to culture.


I think you're confusing the term "social contract" with "oral contract." An oral contract will usually be superseded by a written contract, but saying a written contract supersedes a social contract seems like a category error. One determines your legal standing, the other determines your social standing, neither really supersedes the other.


I like vehement arguments even when I disagree with them.

Based on this man's logic, if I were to run a web forum then I must provide an API or data dump to download the posts created by the users? How about a comment section on my blog? How about a community Discord server? Why not Facebook posts? Or Instagram stories?

I often get the feeling that in these circumstances people derive first principles from specific desired outcomes. Some people strongly desire that the content that has been aggregated on Reddit should be available to them. They then attempt to invent moral axioms that they believe will lead to that outcome. They also seem to reinvent history such that there was some "implicit" agreement between Reddit and the users of Reddit.

IMO, just because I wrote some shitposts on Reddit does not mean that I am entitled to access every single shitpost that has ever been written on Reddit.


> Based on this man's logic, if I were to run a web forum then I must provide an API or data dump to download the posts created by the users? How about a comment section on my blog? How about a community Discord server? Why not Facebook posts? Or Instagram stories?

Yes? I don't think it's a very radical, unreasonable or difficult thing to provide: this is available as a first class feature on most blogging/forum software (RSS feeds) and wasn't even remotely controversial 10–12 years ago.

> They also seem to reinvent history such that there was some "implicit" agreement between Reddit and the users of Reddit.

There was an _explicit_ agreement: the site's terms of use have allowed free API access with reasonable quotas for 15 years, and are being changed now with very short notice. Of course the terms of service can change, but users are also free to leave, complain, or demand that they don't change. There are also strong cultural factors involved. For example: just like in the US it's customary for tap water to be free at restaurants, and people would probably be angry if you decided to charge for it, on the open web it's customary for some reasonable level of API access to be free.

> IMO, just because I wrote some shitposts on Reddit does not mean that I am entitled to access every single shitpost that has ever been written on Reddit.

There's no such asymmetry in scale: _everyone_ wrote the content, _everyone_ is entitled to access it. If you're referring to incredibly resource-intensive mass access, downloading several terabytes for AI training, I can see why that should come at a price. But they're blocking reasonable use cases.


>Yes? I don't think it's a very radical, unreasonable or difficult thing to provide: this is available as a first class feature on most blogging/forum software (RSS feeds) and wasn't even remotely controversial 10–12 years ago.

I think demanding someone pay server bills so you can freely access data they have stored on their hardware (to make yourself money) is radical and unreasonable in every way. If it is an educational service or something, I can see your point and may agree with you. But that is not what is happening. It is people using reddit's resources to make money for themselves.

Reddit has the right to want fair compensation for use of its service in a commercial manner.

We have the right to stop using reddit if we don't like it.

We have the right to run a service that does give free api access if we see it as reasonable.

We do not have the right to force reddit to provide us free access to their resources. That is completely unreasonable and would be no different than me going to you and saying you must give me access to all your credit cards and let me use them because I feel like I should have it.


You've hidden a subtle semantic shift here. This discussion is centered around personal use, yet you're drawing a line outside commercial use as if they were the same thing.

For instance, AskHistorians posts were all intended educationally and many of them were produced via API access, some long before Reddit had an app. Is it not reasonable to expect those to still be accessible to others the same way, even with a nominal fee to cover API maintenance costs?


Speaking of semantic shifts, "reasonable to expect" has nothing to do with obligation or whether people deserve anything. It's "reasonable to expect" that it will rain tomorrow, based on what I've seen on the weather maps and in the forecast. The rain has no obligation to me to fall.


"Reasonableness" is a decent way to describe the norms of a social contract. They're not legally enforced (reddit clearly has no legal obligations here besides), and 'unethical' feels like the wrong adjective. It's also the word the grandparent comment used. What word would you choose instead?


>You've hidden a subtle semantic shift here. This discussion is centered around personal use, yet you're drawing a line outside commercial use as if they were the same thing.

Claiming this is about personal use is the semantic shift not what I said. You are trying to shift back to that because you know if you have a good faith discussion you have no legs to stand on. The reddit changes target commercial use plain and simple.

If you create an app that loads reddit content from its APIs for only you to use, these changes are going to have little or no effect on you. That app would be personal use, and if you incurred charges for your API use they would be very small (or else you are abusing or spamming the API, both of which are against its terms of use).

The most public part of this has probably been the Apollo dev vs reddit. Apollo is commercial. Using Apollo to browse reddit used the Apollo API key not your accounts API key. The crazy charges the Apollo dev listed are due to tens (hundreds) of thousands of reddit users using the Apollo API key to access reddit. That is not personal use. That is a commercial entity using its own API key to access reddit's API.


> If you create an app that loads reddit content from its APIs for only you to use, these changes are going to have little or no effect on you.

This isn't correct. The API restrictions are per client id, so your app must be limited to you specifically. Having different accounts that share a client id is the whole purpose of OAuth. It's entirely orthogonal to commercial use. The changes affect everything from moderation bots to non-commercial clients to archival services and everything else.


>This isn't correct. The API restrictions are per client id, so your app must be limited to you specifically.

It is exactly correct. I said you build an app for ONLY YOU TO USE. You can still use multiple reddit accounts that you control, but no one else is using your API key. Or put another way, you register an API key and then grant that API key access to all your reddit accounts (using OAuth most likely). No other person's reddit account would be using your API key.

Also that is not the whole purpose of OAuth. You have been able to attach multiple accounts to a single clientId for decades. There are still plenty of sites using SAML that have multiple user accounts tied to one clientId. In fact you can very simply do this today without OAuth. Make a table called clients that owns a table called users. Any user that logs in will have a record in the users table you look up using their userId and then look at the clientid or apikey attached to the parent client record. Then use the clientId/apikey to access any resources you need.
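
A minimal sketch of that clients/users layout using Python's built-in sqlite3. The table names `clients` and `users` follow the description above; the column names, the `api_key_for_user` helper, and the sample rows are hypothetical, and this is not meant to describe how Reddit actually stores OAuth clients.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per registered app (client), and every
# user row points at the client whose API key its requests use.
SCHEMA = """
CREATE TABLE clients (
    client_id TEXT PRIMARY KEY,
    api_key   TEXT NOT NULL
);
CREATE TABLE users (
    user_id   TEXT PRIMARY KEY,
    client_id TEXT NOT NULL REFERENCES clients(client_id)
);
"""


def api_key_for_user(conn: sqlite3.Connection, user_id: str) -> Optional[str]:
    """Find the user's record, then return the API key of its parent client."""
    row = conn.execute(
        """
        SELECT c.api_key
        FROM users AS u
        JOIN clients AS c ON c.client_id = u.client_id
        WHERE u.user_id = ?
        """,
        (user_id,),
    ).fetchone()
    return row[0] if row is not None else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO clients VALUES ('my-app', 'key-123')")
    conn.executemany("INSERT INTO users VALUES (?, 'my-app')", [("alice",), ("bob",)])
    # Two separate accounts resolve to the same client-level key.
    print(api_key_for_user(conn, "alice"), api_key_for_user(conn, "bob"))
```

Any number of user accounts can hang off one client row, which is the point being made: sharing a client id across accounts does not require OAuth.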

OAuth is a delegated authorization framework. Its purpose was to give users the ability to grant a system limited access to their data without giving that system their password. It allows you to separate authentication and authorization. Here is a good link to learn about OAuth: https://developer.okta.com/blog/2017/06/21/what-the-heck-is-...


This is a false dichotomy: no one is protesting for a free lunch. The vast majority of app developers and users are okay with incurring a fee (and many solutions have been proposed, for example requiring users to subscribe to reddit's premium tier to use third-party apps). Reddit management has shown in their conduct (refusal to listen to users and potential customers, applying changes with staggeringly short notice) that they are negotiating in bad faith.


So I would agree with you about reddit management, especially how they treated the Apollo dev, but this thread is on a piece about why APIs for content sites should be free. It is not about reddit management's conduct or lack of common sense. There is still a free tier for reddit's API. It is rate limited to 100 requests a minute per API key. So if you are using reddit for your own personal use, it is very rare that you are going to need more than 100 API requests a minute.
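
To make that quota concrete, here is a sketch of a per-key sliding-window limiter in Python. The 100-requests-per-minute figure is the one cited above; the sliding-window approach, the `PerKeyRateLimiter` name, and the injectable clock are assumptions for illustration, not Reddit's actual implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerKeyRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each API key."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        hits = self._hits[api_key]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

Because the budget is tracked per key, one person's client burning through its quota has no effect on anyone else's key.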


What if they don't want to be in that business?


> users are also free to leave, complain, or demand that they don't change

Yes, I have no issue with all of those. I'm poking fun at taking that desire for action from a single company in a single instance and extending it to a universal moral principle.

My own opinion is that Reddit users were getting something valuable for free and it is being taken away from them. In addition to the totally reasonable protests that they are making, some are taking it a bit far and supporting their tantrum by creating axioms of morality that I believe are a bit excessive.


Getting something valuable for free sounds weird in the physical world but it's not weird at all in the information world, where costs can diffuse so thinly that many users never even see them. I see no reason why a non-profit reddit couldn't be free if they didn't try to host video or images (links and comments only). Server costs could absolutely be covered by donations (like "reddit gold") and intra-site advertising (sidebar subreddit ads). Just don't hire many people, don't take VC money, and lean on your community for moderation (which is perfectly morally acceptable for a non-profit entity).

It's not freeloading, it's just a bunch of people dedicating so much time and attention to communicating with each other that the server costs are dwarfed by the value of the platform to its users.


> I see no reason why a non-profit reddit couldn't be free

"could be free" and "must be free" are such a moving of the goal posts that it is impossible to argue against. I mean, you didn't even go through "should be free" on your way.


I wasn't really trying to move somebody else's goalpost, but here's an attempt to bridge the gap anyway, at least as far as "should be free" ("must be free" follows if you're sufficiently authoritarian/utilitarian, otherwise it doesn't /shrug):

Network effects tend to pick extreme winners and losers. From most users' perspectives, there isn't much of a choice. I have an iPhone for a totally stupid reason: my family's chat is an iMessage group and they coordinate photo albums with Apple's shared albums thing. I use reddit for literally no reason at all except that other people use it.

I assert that it would be better for me and most to use an equivalent networking facility maintained by a non-profit entity, and that if there were some kind of cross-site replication such that you really could choose which option to use, a very large number of people would pick the option that didn't present inline or otherwise invasive advertisements, and (indirectly) the option that had free API access, via something like Apollo (software that's simply built better than its competitor on its OS).

So not only do I believe non-profit-reddit (hey, that's "NPR", complete with donation drives) could exist, I believe it would be better for its users and so it should exist. But I also believe it can't really exist as long as somebody is throwing a bunch of (indebted) money into winning the network effects war. So I believe that for-maximum-profit-reddit should not exist.

But I admit the belief feels empty since I'm not authoritarian enough to take it all the way to "reddit must not exist". So I'm just sad instead.


I know of one long-running message board site which is free of advertising and free for users. A number of users make regular donations and occasionally when funds are low they make an announcement and some other users make a one-off donation.

I am very doubtful their model would scale to Reddit's size, and certain they don't have any wish to.


I know it seems like it wouldn't scale merely because we haven't seen it happen, but I don't think it's economically impossible or even economically difficult. I don't think there's any reason to believe that your costs go up more quickly than your income from donations/ads. Instead, I think what happens is you just don't get many users in the first place because all their attention ends up acquired by people willing to operate non-sustainable businesses with the aim of extracting value in the future.

If by some crazy fluke there weren't any highly motivated money-brains in the social media space, I'm absolutely certain a sustainable non-profit open source solution would exist in place of reddit and facebook, and it would eventually accumulate nearly as many users as those sites have. Maybe not as quickly, and maybe while throwing up donation banners at the same rate as Wikipedia, but it would happen. But you need to get impossibly lucky for the entrepreneurs to ignore "the space" when you have 100k users, and by 1 million users the scent of "alpha" has drawn so many of the biggest, meanest, smartest of the bunch that you need to be an absolute juggernaut of momentum and name recognition (like craigslist and Wikipedia) to have any hope of competing.


The one I have in mind has been going for decades. I don't think they ever wanted to scale up. The place feels like an independent bar that the owner has no interest in turning into a chain, he just likes running that one bar.


Metafilter? Something Awful?


Neither of those.


Well said!


You have the option to syndicate information from your site at your leisure.

You are not obligated to provide it to whoever wants it, or to cover the cost of providing it or ensuring it is available.

This is a radical notion.

As for the "everyone wrote it, everyone has access to it" argument: that is daft. Access to the data is at the discretion of the service hosting it. Did they ever claim they would make this data available to everyone in any way they wanted it?

If people don't like the terms of a site, they shouldn't use it; don't think that just because you used a service for a period of time, you have some right to determine how that service is operated.


>if I were to run a web forum then I must provide an API or data dump to download the posts created by the users

It's implicitly provided by hosting the forum, blog, etc. at a publicly accessible URL. HTTP is an API, HTML is a data interchange format. There's this weird idea (which I've seen parroted both in this article, and in u/spez's discourse on the topic) that "scraping" is somehow evil and bad, when it's actually an intentional feature of the web. Scraping is quite literally how Google built its search empire. It seems to me, now that the major players are sufficiently centralized and entrenched, they want to turn scraping into a boogeyman so nobody can follow in their wake.
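
As a rough illustration of "HTML is a data interchange format", here is a sketch that pulls comment text out of a page with Python's standard `html.parser`. The `<p class="comment">` markup and the `scrape_comments` helper are hypothetical; a real scraper has to match whatever markup the target site actually serves, and breaks when that markup changes.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class CommentScraper(HTMLParser):
    """Collects the text of every <p class="comment"> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: List[str] = []
        self._current: Optional[List[str]] = None  # text chunks of the open comment

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "comment" in classes:
            self._current = []

    def handle_endtag(self, tag: str) -> None:
        if tag == "p" and self._current is not None:
            self.comments.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data: str) -> None:
        # Text inside nested inline tags (e.g. <em>) is collected too.
        if self._current is not None:
            self._current.append(data)


def scrape_comments(page: str) -> List[str]:
    parser = CommentScraper()
    parser.feed(page)
    parser.close()
    return parser.comments
```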

The advantage of providing a "real" API is to (a) limit scope narrower than an account/session, and (b) eliminate the overhead implicit in laying out the content for human eyeballs; reducing processing costs for both the server and client.

I find it somewhat hypocritical for Reddit et al. to provide the HTML website for free and charge for the API, when the API is explicitly designed to optimize their costs.


Ultimately this is about DRM again, or at least on the same spectrum of motives and behavior: The corporate middleman wants to be able to force restrictions on how consumers manage or view the content that is being produced by other users.

Either forcing you to run a proprietary closed-source client (like their official smartphone app) or else making the desktop rendering so gnarled that you pretty much have to view it exactly the way they intend, while having their code pierce your privacy.

It's not just about making sure ads are displayed anymore, I'm sure there's some executive who has asked about preventing people from copy-pasting text...


I don't think your arguments accurately cover the OP's argument at all. The key distinction is literally the first sentence: "Social media businesses should not charge* for APIs."

If you have a community [whatever] service, that would not be a business. If you have a blog, that could potentially be very business-adjacent, but I'd argue it doesn't meet the "social media" qualification, so that also doesn't apply.

If you run a business and much of your content is user-generated (because it's a social media site), the OP is arguing that API access should be free/at cost.


> OP is arguing that API access should be free/at cost.

Implicitly he is also arguing that it must exist at all.

But let's consider further your own commercial clause. Apple has a discussion board for their support community [1] - and now we are mandating that Apple must both have an API for users to access it and must only charge cost. Who gets to determine the cost? Is it hardware costs? Does it include R&D? Is there a fixed margin set by a regulator?

The grey area on this is as wide as an ocean. But I do think it is a little funny that people are arguing about this like it is equivalent to a universal human right.

1. https://discussions.apple.com/


A server-rendered HTML website is an API. If said HTML website is unauthenticated and freely accessible by any user-agent, then it's a de-facto public API, too. This is the case for the Apple discussion boards. You don't have to authenticate with their backend before scraping anything you'd like off of the site. You can build third-party tools to read or even interact with this site, by scraping the HTML.

This thing with Reddit is only the big deal that it is, because Reddit's backend blocks "app" user-agents from simply scraping pages from the non-authenticated Reddit HTML website. (If this wasn't true, they'd just do that, and none of this would be an issue.) But these third-party UAs are instead forced to go through the authenticated data API — where they can then be API-credit-limited and forced into paid data-API subscription plans.


I think a more reasonable argument could be “If a site provides an API for fetching user-generated content, it should be free”. (But just as a moral argument, not something to be enforced with regulations.) The issue is that if they provide a paid API for users’ content, they acknowledge the value in the content provided by their users (whom they already have to thank for all their ad revenue), so it seems unjust for them to suddenly start selling that content, especially without sharing the API revenue with the content creators or something.


> I often get the feeling that in these circumstances people derive first principles from specific desired outcomes.

But that's exactly how laws are made. E.g. why do we have consumer protection? Because it helps us! It's democracy in action.

It's not like nobody would ever build another forum if there were laws like the ones proposed by GP.

And if you're worried that every little forum would have to comply with complicated laws: we could make it so that the laws only apply to large players, e.g. >1M users.


> laws only apply to large players, e.g. >1M users.

What if I don't want to share my user list with the government?


You don't have to. You can self-report, and if you report incorrectly, it is a crime. As simple as that. This is also exactly how the definition of very large online platforms (sites with 45+ million monthly active users in the EU) in the EU's new Digital Services Act works [1]. Some companies (primarily porn sites) are believed to have misreported, and are being investigated for that [2]. If they are found guilty, they will have to adhere to the regulation for VLOPs, and pay a fine of up to 6% of their annual worldwide turnover... (Obviously not the full sum for a first-time offence.)

[1] https://en.wikipedia.org/wiki/Very_Large_Online_Platform
[2] https://www.politico.eu/article/online-porn-websites-europe-...


Couldn't the government claim you are lying about how many users you have, and then demand you show them how many you have? How would you prove how many users you have without handing over the list of users?


Theoretically, if your forum is like Reddit they can just crawl the site and look for usernames that appear on top of posts.

But I'm sure there are other ways.


On top of that, your shareholders will want to know the actual number of active users.


Then don’t run a content site?


What if I don't want the government to know I'm a user of Yahoo Answers?


Don’t connect to any ISP?


>It's not like nobody would ever build another forum if there were laws like the ones proposed by GP.

Won't they just build the forums in other jurisdictions then? How many European forums do you frequent? I barely visit any.


I’ve seen a lot of arguments that users are the folks who provide the value/ content of Reddit and so somehow Reddit must respond to their wishes.

And I like that idea, a great deal.

Although I have to note that users have given Reddit that content to use, for FREE, and continue to do so. They do this elsewhere too, including sites where they say don’t like the site administration, Twitter, etc…

So I’m not really sure how this plays out.


What I think he meant by "specific desired outcomes" is arbitrary rules that cater to special interest groups rather than universal moral principles. And yes, that _is_ democracy in action, where the only rule is that the majority wins, at the expense of the minority who just lost. Your ">1M users" rule is an example of such arbitrary rule that proves his point.


> They then attempt to invent moral axioms that they believe will lead to that outcome. They also seem to reinvent history such that there was some "implicit" agreement between Reddit and the users of Reddit.

I notice this also when people are talking about "the spirit of open source." No, open source licenses are licenses, not spirits, and you were contributing huge amounts of time and effort to fill a platform with content that you have absolutely no control over. People don't owe it to your OSS project not to fork it or not to become more successful than you at distributing it as a service, and Reddit can do whatever they want with your content that doesn't violate any obligations to you set out in their ToS.

Next time, do AGPL or proprietary, and don't give away any content that you value to someone else's platform for free. If you want them to be obligated to distribute it for free forever, you really should be paying them.


I think a lot of the people who contribute to reddit and other sites -- whose content collectively provides a great deal of value to those sites -- want fair access to their content.

I think that's a good first principle.

I also think "APIs for content sites must be free" is a pretty good attempt to derive a rule from the principle of fairness. So hardly an invented moral axiom, as you put it.

Also, claiming a reinvention of history is unfair, since, until now, everyone has been contributing to reddit in a context where API access was free. That's the history. It's not the history that's changed, it's Reddit's API access policy.

Anyway, it's 100% fine and good that people decide under what circumstances they ought to be willing to invest their time and effort in a social media site. Setting some common principles, like this one, is just a good way to communicate that to the purveyors of social media sites.


> if I were to run a web forum then I must provide an API

Realistically, you're using open source forum software like Discourse, which could itself provide an API at an amortized cost.

Expecting individuals to code it themselves (and follow standards, etc.) is a big ask. I don't think it's a hard sticking point, and the majority of the encouragement is for large providers.


I'm honestly shocked by how naive people are about open data/web.

If a platform owner together with a community of enthusiasts produces a valuable set of content/data and then makes it openly available to the world...what exactly do you expect will happen?

We've seen what happens during the last 15 years, an era of massive centralization. And now it's AI saying "thanks for everything!".

But it doesn't even need to be AI. Allow me to use the free and unlimited Reddit API which apparently allows you to make billions of calls at no charge, and then rehost all that content on an ad farm.

This is the real reason the original web3 (semantic web) was dead before it even started and instead of becoming more open, everyone became less open. Because giving away your main assets is suicidal.


but you can access every single shitpost from the website, so why not from API? it's lighter


Most of the time you don't just expose your internal API routes to the world. You need to write curated public-facing routes that don't include certain schema or records. That set of endpoints takes time to write and maintain separately.
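
A minimal sketch of what such a curated public route might do, assuming a hypothetical `Comment` record with an internal `score` field that should never leave the server. Only an explicit allow-list of fields is serialized, so a new internal column is not exposed by accident.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Comment:
    id: int
    author: str
    text: str
    score: int  # internal ranking data, deliberately not exposed


# Allow-list: only these fields are ever serialized for public routes.
PUBLIC_FIELDS = ("id", "author", "text")


def public_view(comment: Comment) -> Dict[str, Any]:
    """Project an internal record onto the public schema."""
    data = asdict(comment)
    return {key: data[key] for key in PUBLIC_FIELDS}
```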


Not really. Your HTML is, quite literally, a degenerate form of an API. The simplest way to offer a content API is to offer an alternative endpoint that serves the same stuff as your normal one, except without all the bullshit (er, beautifully designed, interactive frontend).
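
A sketch of that lower bound in Python: one data source, rendered either as a full HTML page or as bare JSON. The `handle` function, the `.json` suffix check, the paths, and the sample `POSTS` data are all hypothetical; the point is that the JSON view does a strict subset of the work the HTML view already does.

```python
import html
import json
from typing import Any, Dict, List, Tuple

# Hypothetical data; a real site would load this from its database.
POSTS: List[Dict[str, Any]] = [
    {"id": 1, "author": "alice", "body": "Hello <world>"},
    {"id": 2, "author": "bob", "body": "Second post"},
]


def render_html(posts: List[Dict[str, Any]]) -> str:
    """The human-facing page: the same data wrapped in markup and scripts."""
    items = "".join(
        f'<li class="post"><b>{html.escape(p["author"])}</b>: {html.escape(p["body"])}</li>'
        for p in posts
    )
    return (
        "<html><head><script src='app.js'></script></head>"
        f"<body><ul>{items}</ul></body></html>"
    )


def render_json(posts: List[Dict[str, Any]]) -> str:
    """The 'degenerate API': the same data with the presentation stripped off."""
    return json.dumps({"posts": posts})


def handle(path: str) -> Tuple[str, str]:
    """Return (content type, body); both branches read from the same source."""
    if path.endswith(".json"):
        return "application/json", render_json(POSTS)
    return "text/html", render_html(POSTS)
```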


Any issue with Reddit offering an API only to not maintain it, or uphold any sort of SLA uptime, or not worry about releasing breaking changes every week?

Having a public facing API is not trivial or cost free.


> Reddit offering an API only to not maintain it

If they were to offer an API that's just HTML of the website (old.reddit.com specifically, not the new one) but without the cruft that makes for 90% of the markup of a human-facing page, and which exists only to hang styles and scripts off... why wouldn't they maintain it? It's literally the same as what the browser gets, but without the bullshit.

> or uphold any sort of SLA uptime

Do they uphold any sort of SLA uptime for the webpage itself?

The simplest API would be just the meat of the website, so it couldn't possibly be less reliable than the site itself.

> not worry about releasing breaking changes every week?

Reddit is a stable site. Like most social media platforms, they don't release breaking changes often (they do screw with DOM element ids and CSS classes all the time, but that is to make life harder for ad blockers, which is another topic). Sure, some things may move around, be added or removed - but this is webshit we're talking about. You can't truly rely on any API to have a stable, or well-defined structure[0] - so people are already used to treating schemas as open-ended[1] and keeping up with their changes.

> Having a public facing API is not trivial or cost free.

Sure. But I'm trying to establish a lower bound here, and it's clear that this is much lower costs and effort than maintaining the human-facing website itself. And I mean, remember the whole "semantic HTML" and "microformats" trends of yore? Or how HTML5 came to be, with all those tags like <em> and <section> and <article>? The whole point of that was to make HTML work as both rendering markup and machine-readable API.

Consider also that the alternative isn't no public API - it's scraping. So if your public API is somehow more expensive to serve or maintain than either the website itself, or a decluttered version of it, then you're doing something wrong.

--

[0] - Don't get me started on the disaster that is Swagger/OpenAPI.

[1] - Something Clojure coding philosophy makes explicit: you pass around maps and arrays, you read and write the keys you know about, and stuff you don't recognize you ignore and pass without changing.
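
Footnote [1]'s convention, sketched in Python with plain dicts rather than Clojure maps. The `mark_read` function and the field names are hypothetical; the pattern is to touch only the keys you know about and pass everything else through unchanged, so fields the server adds later don't break the client.

```python
from typing import Any, Dict


def mark_read(post: Dict[str, Any]) -> Dict[str, Any]:
    """Update the keys this client knows about; pass every other key through unchanged."""
    updated = dict(post)  # shallow copy: unknown keys survive as-is
    updated["title"] = str(post.get("title", "")).strip()  # tolerate a missing field
    updated["read"] = True
    return updated
```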


Don't mind the other commenter, spirited discussion is always welcome! I was thinking about this, and even Hacker News is a good example of how an HTML view can differ from the data model. Hacker News doesn't show you the vote count of every comment despite having that data available. They chose not to even render it into the template, so no scraper will ever have access to that information.


> Your HTML is, quite literally, a degenerate form of an API.

i look forward to the folks making this argument telling their boss “it’s ok the HTML is an API that’s all we need to provide”

come on, nobody believes this when they’re not trying to win an argument on the internet


Look, if you have a webpage, you're already providing it. If your webpage is of any use to anybody, someone is likely consuming it by scraping. So if your dedicated API is harder and/or more expensive to provide and maintain than the degenerate API of your webpage's HTML, you're doing something wrong.

I'm not trying to win an argument. I'm trying to point to an obvious reference point for cost/effort behind an API that's handling the same data and interactions the webpage does.


These are already written and fully functional, and Reddit has essentially been a non-moving target in terms of features since ~2016, when they added first-party image hosting.


Reddit actually has added a couple of features since then (e.g. polls), but they just didn't update the API to deal with those, so the API still remained completely stable.


Because you can't include ads in an API response


You could, but with EU and other legislation you'd have to make it clear it's an ad, so in API-speak, there'd be an "is_sponsored=true".

That makes it too easy for API consumers to ignore. (Compared to human eyeballs browsing a webpage)

But you could make it part of the ToS: if you're using the API to present posts to users, you MUST show all the posts in the list, including the is_sponsored=true ones.
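A minimal sketch of both sides of that argument, assuming a hypothetical API schema where each post carries the `is_sponsored` flag mentioned above: a client following the proposed ToS shows every post and labels the ads, while a non-compliant one can drop them with a single filter.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    is_sponsored: bool = False  # flag name from the comment; hypothetical API schema


def render_feed(posts: list[Post]) -> list[str]:
    """What the proposed ToS would require: every post shown, in order, ads labelled."""
    return [f"[Sponsored] {p.title}" if p.is_sponsored else p.title for p in posts]


def strip_ads(posts: list[Post]) -> list[Post]:
    """What a non-compliant client could do instead: one line, and the ads are gone."""
    return [p for p in posts if not p.is_sponsored]
```

That one-line filter is exactly why enforcement would have to come from the ToS rather than from the data format.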


That's a good point, UK Government page on "Guidance: Hidden ads: Being clear with your audience":

https://www.gov.uk/government/publications/social-media-endo...


Even though reddit probably don't tag the ads on their website deliberately, it's still really easy to block them in a browser


Even so, how would you track impressions?


Why can’t you?

“Sent from my iPhone.”


Any time I see "Sent from my iPhone" I assume the sender is a fool


Are you missing the point that adverts can be inline with normal content and therefore can be included in an API result?


Yet.


Because others use automatic ways to retrieve the data and derive value from it without paying anything


If reddit did that from the start we wouldn’t be in this situation in the first place?


Some people perceive Reddit as a platform built with their tax money. The argument usually goes that Reddit is built with the free labor of participants, but I think they forget that they also receive an experience in return.


It would be pretty easy for someone to clone your app and just take out the ads.

If you're required to give access to all data with an API for free, it seems it would be hard to run any of these websites without having a massive loss.


> I often get the feeling that in these circumstances people derive first principles from specific desired outcomes.

That's a pretty normal way to derive principles, isn't it? You start with an observation, and then you recursively ask "why?" It's possible that your answers are wrong, or even that your observation itself is flawed, but the concept of deriving principles from outcomes isn't really flawed. Where else are they going to come from? Should people just pick principles at random and hope they're not terrible?


People don't seem very good at, or interested in, assessing how their newly derived principle will apply to situations other than the one they want it for.


I completely agree.

The only real reason to be upset about the Reddit API changes is that Reddit’s official clients suck. The formatting sucks, they stuff them way too full of ads. Apparently they are less accessible for blind users, and inefficient for moderation.

It’s weird to be upset about people like the Apollo dev (who could just require payment) or scrapers. They don’t really have any right to API access, and should have to pay for it.


HTTP GET is an API. All you have to do is not block well-behaved scrapers.
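"Well-behaved" is commonly taken to mean honouring robots.txt and its crawl delay. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the user-agent string below are made up for illustration.

```python
from urllib import robotparser

# Made-up robots.txt for illustration; a real scraper would fetch /robots.txt first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Crawl-delay: 2
"""

USER_AGENT = "example-archiver/0.1"  # hypothetical bot name


def load_policy(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text into a policy object the scraper consults before each GET."""
    policy = robotparser.RobotFileParser()
    policy.parse(robots_txt.splitlines())
    return policy


policy = load_policy(ROBOTS_TXT)
```

A polite scraper would then skip any URL for which `policy.can_fetch(USER_AGENT, url)` is false and wait `policy.crawl_delay(USER_AGENT)` seconds between requests.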


> I often get the feeling that in these circumstances people derive first principles from specific desired outcomes.

Once you learn about motivated reasoning, you start to see it everywhere.


Ironically you only start to see it where you want to see it. Motivated reasoning you agree with is just reasoning


Yes. Otherwise you are not entitled to use them to make money. It's not your content.


If you operate in GDPR land there is a requirement to provide all that data (by the user) by law already. It's not unheard of.


Now that is something I agree with (the principle not the particular law).

If we're opening it up to our own opinions - I believe I should have total control over all of the content I generate in all contexts. That would include removing the content I provably generated from any platform at any time. Note that HN doesn't even allow me to do that ... I can't delete posts after some time frame.

But my understanding of GDPR is that it doesn't go that far. It applies to PII (Personally Identifiable Information). That would include things like mailing address, email addresses, full names, phone number, government ids, etc. I may be wrong, but I do not believe it would apply to the content of shitposts on Reddit.

Where I disagree with the article's vehement moral stance is on my right to the publicly generated content of other people.

Also, if we want to get all technical, and I'm surprised no one has, there is some analogy to be made between the philosophy of free access to information provided by public libraries and this debate ...


The most accurate/charitable interpretation of the content, despite the abysmal title, would be:

If users provide content for free, then a platform must allow users to consume that content freely, as in "as they wish", not "for free".

The author supposes that this is a fundamental social contract between users and platforms that needs to exist for platforms to work well. The author does not cite any evidence or examples or really even provide any explanation as to why this must be the case, they simply state it as a fact. Personally I find that, while thought provoking, this thesis is not obvious and needs to be defended. And so I don't find much utility in this short essay other than to state the thesis itself.


> If users provide content for free, then a platform must allow users to consume that content freely, as in "as they wish", not "for free".

The platform is paying me with distribution and letting me take advantage of network effect. Anyone can setup a free blog and post all they want, and make everything available via RSS, but instead people choose to post on Reddit (and on HN!) because Reddit has built a platform that provides additional value vs posting on one's own blog.

Platforms need to make $ to stay alive, Reddit's problem is they took VC money for a product that, it turns out, doesn't generate VC returns.

Without the need to pay off VCs, Reddit could likely easily become profitable (as they reportedly were in 2019) and everyone involved could become comfortably-but-not-filthy-rich.


The implication here is that, "...or else users won't participate."

Which I agree, needs to be justified. You see this kind of talk a lot when power imbalances exist; the person with no power tries to speak on behalf of everyone involved who they think believes as they do, in order to try and stand against the existing power structure.

Sometimes it's highly effective (actually appointed or even de-facto leaders of movements/groups), other times it's kind of laughable and presumptuous.

Without justification, this is the latter. That said, I bet there probably is justification, if the author bothered to look for it or provide it!

I would suspect it's true that users will not make use of a site like Reddit nearly as much if the content isn't then made available for use "as they wish".


> as in "as they wish", not "for free".

What's the distinction between the two? If the site is monetized via ads then providing the content ad-free (via an API or otherwise) automatically makes it "for free".


The user does not provide content or participate on the platform for free. They get a free account in exchange for doing so, along with allowing their own data to be datamined.


It's a demand masquerading as an argument.


The title of this post is kind of clickbait-y, the author immediately puts an asterisk in the first sentence and notes that they think charging based on the cost of running the API is justified. The title should really be "APIs for content sites should cost a reasonable amount".


Yes actually I feel the last paragraph completely contradicts the title and tone of the post.


I would frame it as "messaging applications should interoperate".

People understand that a T-Mobile customer can send a text to an AT&T customer. They understand that a Samsung owner can call an iPhone owner.

Why do they accept that a Slack user can't send a message to a Discord user? Why do they accept that Google has made 11+ different messaging applications that aren't compatible with each other? Why do they accept that you can't call a Skype for business customer with Skype?

We've had 30 years of stagnation because messaging applications don't interoperate. That's why the story is always "Hey, remember back when ICQ used to work? You should try Skype, it really works..." Once Skype becomes successful at the two-sided market game the honeymoon is over and the vendor has no incentive to keep it working and they figure you'll keep trying to use it anyway.

If there were competition for both the server and the client there would be continuous pressure to keep the products working, and to improve them. So instead of Facebook Messenger being the same as AOL Instant Messenger except for Facebook, we'd have had 30 years of progress and would have apps that look like something out of The Jetsons.


There was that short period of time where most of the big players (Google, Facebook, etc.) had settled on XMPP and it was all interoperable, to some degree at least.

I even remember when Facebook first announced their support thinking "Oh sweet, now I can message all my Facebook friends from my current chat app." And when it launched, I could! But then I realized I didn't really want to.

And it wasn't long before support was dropped. I'm not sure anyone noticed.


Funnily enough, XMPP has hit it really big for law enforcement and military users. For instance, the commander of a squad fills out a form every day about what engagements they had with the enemy, and often that form gets submitted over XMPP and sent asynchronously up the chain of command.


I was surprised to see this mentioned.

I was working for a company over a decade ago that saw the military purchase a piece of software from us specifically for its messaging and XMPP based capabilities. We were requested to rewrite chunks of our application just to meet their needs. Makes me wonder if it is still in use somewhere.


While G and FB both used XMPP, I don't think they ever federated


> People understand that a T-Mobile customer can send a text to an AT&T customer. They understand that a Samsung owner can call an iPhone owner.

Meanwhile, every European techie under the sun is still breathlessly shouting "WHATSAPP! WHATSAPP!" in my ear. "Why would any unevolved caveman still use SMS?" etc.

Well, maybe because Meta sucks and I think protocols are better. I think SMS/RCS/etc. based texting will outlast WhatsApp, and anyone in 10-20 years is free to look up my email here and let me know if I'm right.


> People understand that a T-Mobile customer can send a text to an AT&T customer. They understand that a Samsung owner can call an iPhone owner.

That was a hard-won victory: https://en.m.wikipedia.org/wiki/Kingsbury_Commitment

A hundred years later, we've got the same story, except today's Big Tech has wised up.


The EU can do better than spamming us with cookie popups. They'd like to be relevant, why don't they try passing legislation for this?


They do try: https://news.ycombinator.com/item?id=36324118

But keep in mind that the EU legislature (and "government", such as it is) is as dysfunctional as the US one, just in different ways, so don't expect much enlightened lawmaking there either.


The use of the word "must" (in both the title and the body) is uninformative. What on Earth does it mean? Does it mean there's a law? A moral imperative? A strategic necessity? A technical requirement? Is the author claiming some form of jurisdiction?

"It must provide APIs" isn't an argument. It can't be taken literally because it's clearly false. And it can't be taken in any other form because the author doesn't tell us what those forms might be.

It continues, equally poorly stated: "Millions of users create the content expecting it will be widely available. Locking down an API breaks that social contract." Now that IS an argument, but it's not a good one for a surprisingly large number of reasons, the simplest of which is that websites like Reddit are widely available regardless of a free API.

Finally, we get to the only actual statement of merit: "The short sighted thing about these API fees is they will harm the company in the long term." Yes, that's quite likely true. Is that the whole argument? That this is a short sighted plan? Companies are free to make stupid plans. They do it all the time, and you are free to stop doing business with them. "Must" doesn't enter into it.


I agree. The way the article is written is basically just a guy throwing a tantrum because he wants something. That doesn't make it a law of the universe that he must be given what he wants.


I fail to follow your chain of reasoning. How can the claim both be "clearly false" and unclear in its statement?

The use of "should" as the very first line of the body text actually makes the argument extremely clear, which is that there is a non-legal, non-technical imperative for API access [leaving only a moral/social imperative, as you determine]. I think you are probably well above the very low reading-comprehension bar needed to understand what is being said, so I'm not sure what value you are deriving from pretending not to.


> there is a non-legal, non-technical imperative for API access [leaving only a moral/social imperative, as you determine].

There is no more an argument here than if I said there was a moral/social imperative for them to clean my house. An argument derives conclusions from premises.


"X must do Y," taken literally, would imply that X did Y. But here, X did not do Y. That's the "clearly false" part. And that's fine, because that's how language works. People speak hyperbolically or metaphorically all the time.

But the author must be trying to say something when they say "X must do Y," and it's not clear to me what. Are they just using "must" to mean "should?" Or do they mean something stronger? And what does "should" even mean? Do they mean should like "it would be a tactical mistake not to?" Evidently not because they start talking about social contracts and such, which are more about some sort of community unstated rules or traditions or something.

Put another way, if I say "X must do Y" to you, and you want to disagree, how would you do that? Would "X is not required to do Y" be a rebuttal? Or "Y would not be very nice?" I can't tell if either assertion would contradict the author's assertions. What are they trying to say?

Maybe it's just me being annoyingly pedantic (it's happened before!), but it's the title of the post and even after reading the whole thing I'm no closer to being able to summarize their argument. Is it just "I don't think Reddit should do this because it's a useful tool and it'd be mean," but with the word "must" thrown in randomly for oomph?


"the social contract"

There is no such thing.

Facebook, Twitter, Reddit, Instagram, whatever: they exist to make money, or to do whatever those who own or run them want.

It is your choice. You wish to spend your time adding value to a website someone else runs? Great. Does that give you rights? Nope.

Normally in extremely obfuscated ways, the contract you do enter into by using a site will tell you exactly that you have no rights at all and they own everything you do.

I agree that it can be unfair. But you actively decide to provide content. You can't make up the rules for the sites you do use. You can stop whenever you want. You can start your own site.

I did so a few years ago when I got sick of this shit. On the other hand, I have few visitors. I can howl in the wilderness. For me it is preferable.

If you do engage in one of the big social sites, you are given access to an enormous audience. You can post something that "goes viral". You get fame, site gets $$$$, maybe you get $$$. News can easily quote you from some of the big social sites.

How often do you see: "Thinkbeat said today on his blog thinkbeat.blog that ..."

Compared to: "So-and-so posted on Twitter..."


"If a company like Reddit or Twitter derives most of its value from content that users write for free then it must provide APIs for anyone to download and manipulate that content."

I disagree that Reddit or Twitter derive value from content. They derive value from eyeballs and showing ads to those eyeballs. If people use APIs as a way to feed their eyeballs without looking at the ads, then Reddit or Twitter gain nothing in exchange for their massive expense of the hardware and staff required to run these services.

The other option is that people would pay to use Reddit or Twitter. But they would lose 90% of their users if they took that route, because what they do does not provide enough value that a typical user would pay for it. For most people, Reddit and Twitter are time sinks: mindless entertainment to consume to pass the time. In a prior generation they would be watching TV game shows or soap operas. They wouldn't pay $10 a month to use it, or even $10/year. They would just jump to the next best free platform and continue there.


> The other option is that people would pay to use Reddit or Twitter. But they would lose 90% of their users if they took that route, because what they do does not provide enough value that a typical user would pay for it.

I see this repeated all the time but I've never seen any evidence cited to support it.

Also, 90% of the users leaving when the site switches to a paid model doesn't necessarily mean the quality of the site or their profitability will go down. I'd pay a pretty hefty monthly fee for most of the social media sites that I've now stopped using as a result of their heavy handed strategies to maximize ad revenue.

Also, since when did we start deciding how much value a thing could potentially provide to an individual? Isn't that what "the free market" is supposed to sort out?


It's stunning to me how little people like the author of this piece know about the economics of running large scale sites like reddit or twitter.

Nothing is free, including providing and maintaining APIs.

Someone's gotta pay for the ramen, especially if you can't run ads.


If you read the article and not just the headline, the author is in favor of charging a cost based fee for API access. A fee that would cover hosting and serving the API, but not generate profit for the parent company benefiting from the 3P users' content.


There would be no benefit to providing it, then.


No first-order benefit, perhaps.

It shouldn't be difficult to look to second- and third-order benefits, though, such as increasing your user base and the amount of generated content.


That makes sense.


I have a theory that governments and corporations across the world are pushing hard for free APIs. If they cost anything, even very little, then it makes astroturfing too expensive.


Read the footnote.


I think the entirety of their argument is:

> social media sites don’t produce content. They merely host it. Millions of users create the content expecting it will be widely available.

?

But this feels pretty simplistic. Not convinced that people do (or should) upload content expecting they or anyone else can access it however they want.

If you create the content, then it's obviously yours forever. If you also decide to post it to a platform, why is it the platform's social responsibility to allow access to that content for free?


This cuts both ways -- if I'm not going to allow access to that content for free, why would you give me your content?

The ToS might allow for it, but people are routinely dismayed when companies stick to the letter of what their ToS allows them to do, especially when they start doing things they never did before.


> if I'm not going to allow access to that content for free, why would you give me your content?

A bunch of reasons potentially, wider reach probably being the main one, but maybe also stuff like ways for your audience to interact with you/the content/each other.

> The ToS might allow for it

"It's in the ToS so it's fine" is a bad argument, but I'm not saying that. I guess there's two things:

a) empirically, do users actually expect that content be available "for free" (in general? just to themselves?) from a platform

b) is there a "good reason" that this should be a social expectation?

a)'s a tricky question with multiple parts but I'm not especially convinced. I imagine a lot of users aren't even really thinking about that kind of thing

b) I could imagine being convinced about, but "they didn't make the content" doesn't feel sufficient.


Such arguments are always just different forms of "everything on the internet must be free, always", which is a valid ideological position to hold, sure, but one should also realize that it is not going to work in the real world.


Good luck lobbying YouTube to make their API free. I'd love to have a version of the YouTube app that didn't cost me any money and also didn't have ads.


I use ReVanced and it does exactly what you ask for, including even more features like SponsorBlock. I can't imagine using the regular YouTube app anymore.


Funnily enough there is also Reddit ReVanced, but with enough effort Reddit or YouTube could stop them working, so their existence doesn't change the argument about free/open APIs.


Author is not even arguing that, though I agree with you. Horrible title.


> I'd love to have a version of the YouTube app that didn't cost me any money and also didn't have ads.

If you don't want to pay with money or your attention, how are they going to pay for all that traffic and engineering?


I think OP was using that ironically as an example of how ridiculous the expectations of the free-API folks are.


If you're using an open source phone OS (not iOS) Newpipe does exactly that.


I just consider anything I post on the Internet, like this comment, to be a donation to whoever hosts it... They can do anything they want with it, and take all the responsibility for things like hosting or safekeeping...


Why not a publicly owned platform instead of forcing private businesses to operate for the good of the general public?

Why not stop building public pools and just force people to make their private pools available for anyone to swim?

Is it because we have seemingly lost faith in the very notion of a republic?


Why not just make API accessible to paid "Reddit Gold" members?!

Then there'd actually be a reason to get gilded up, and 3rd-Party Apps would still work for those paying members.


>Then there'd actually be a reason to get gilded up ...

I have to imagine reddit has already thought of this and come up with some justification for why it won't work, because the proposed strategy is just so incredibly lazy. Ultimately I suspect the data somehow suggests (a) the network provider has to provide free access in order to sustain the criticality of the social network, and (b) charging for something that other people are getting for free (a la Twitter's $8 checkmarks, reddit gold, etc.) is just not a very viable product strategy.

So Reddit has decided to stick 3rd party devs with the thorny problem of direct-monetization of users, wherein their cost to the developer is usage-billed while the revenue to the developer is presumably fixed; meanwhile Reddit can cash the checks from their mobile ads.


Content sites must be free to build their products as they see fit. No asterisks required!

Confusing Free as in libre and Free as in gratis causes endless turmoil.


The trouble with a free API is that someone can write a $1 client that hides all the ads (the number one requested feature), and now your website has no income.

An ad-supported website has to get paid. On the web there is at least some stuff they can do to detect ad blockers, or show ads in sneaky ways that change over time and are hard to block.


That's a good argument against ads, because ads introduce a fundamental misalignment of incentives between the platform and the user. There are many ways reddit could build a collaborative business from their current situation. Advertising is very clearly not going to work, so they just need to look outside that box.


> There are many ways reddit could build a collaborative business from their current situation

Like charging for API access? Yeah, that went well...


The Spotify solution (premium users can use any third-party client) would have made lots of money with less uproar. Another bonus: it solves the supposed issues with allowing NSFW content through APIs. Google, for one, takes a credit card as proof of being 18+ in the EU (you have a choice between that and scanning an ID card).

Or if Reddit doesn't want to handle billing the users directly for whatever reason, require sign-in and charge the devs per user. Even some sort of grace period and reasonable per-request prices would have made a big difference; the Apollo dev's hard blocker seemed to be a conflict with existing billing cycles.


The context is a for-profit company. If a third-party using the API is doing so as a substitute for someone using the ad-supported site, the API will be priced to equal the revenue lost from using the site. If the people switching from UI to API are more valuable than usual, the API will cost more than usual.

If the API is being used to support something new -- i.e., not a replacement for a user on the site, but as a new revenue source -- it will be priced to maximize overall API revenue, or overall revenue as an indirect result of growth supported by the API. (Maybe it's one massive customer using it or 100,000 freemium developers.)

The idea that a company "must" do something doesn't make any sense. And the assertion that the company is harming itself is ultimately something the Board, acting on behalf of shareholders, is responsible for.

(FWIW, I agree Reddit is shooting itself in the foot.)


Won't machine learning largely solve this problem? Fine, don't provide an API, but I can extract a useful JSON document from your HTML representation.


If you're referring to reddit you can just add .json to the URL. eg: https://www.reddit.com/r/blog.json or https://www.reddit.com/r/teslamotors/comments/149ad64/teslas...


As far as I'm aware, Reddit still allows you to append .json to any of their pages and you get the results as a nicely formatted json document.

No LLM required.
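A minimal sketch of that trick, assuming the endpoint behaves as described above (append ".json" to the page path; whether it survives the API change is exactly the open question). The "Listing" shape used by `post_titles` is an assumption about the response format, and both helpers are hypothetical; no network request is made here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(page_url: str) -> str:
    """Turn a Reddit page URL into its .json twin, keeping any query string.

    Assumes appending ".json" to the path returns the same page as JSON;
    a trailing slash is dropped first.
    """
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def post_titles(listing: dict) -> list[str]:
    """Pull titles out of a listing-shaped document.

    Assumed shape: {"kind": "Listing", "data": {"children": [{"kind": "t3", "data": {"title": ...}}]}}
    """
    return [child["data"]["title"] for child in listing["data"]["children"]]
```

For example, `to_json_url("https://www.reddit.com/r/blog")` gives `https://www.reddit.com/r/blog.json`, the same form as the link in the comment above.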


The question is: will that still be available after the API is paywalled?


I imagine there's ways to curtail that (like detecting non-human users).


In my opinion, the author doesn't really think things through, it's just dropping some confident one-dimensional opinions without looking at multiple perspectives.

"The key thing here is social media sites don’t produce content. They merely host it. Millions of users create the content expecting it will be widely available. Locking down an API breaks that social contract."

"Merely host it" is framed here like it's no big deal. It's a huge fucking deal. Giant social networks are incredibly complicated to run, and it's not only technical aspects that make it so. You can hate these companies all you want, but let's not be ridiculous in suggesting that they offer little to no value.

Second, the idea that content producers produce all the real value on the network is incomplete. First because of the reason mentioned above: the network itself (the plumbing and operation) has tremendous value and there would be no content without it. Second, because freely provided content in itself offers no financial value, in fact it starts out as a cost (hosting and moderation). Only when monetized does this content provide financial value. I know that users don't give a shit about that or are hostile to monetization but that doesn't change the fact that a large social network needs to cover costs.

Third, the idea that any user of a social network has the right to download/access the entirety of all content ever produced on it, and this being some right or social contract...is made up bullshit. Most people have no idea what an API even is, have no weird need to do data digging, it's not a common need or expectation at all.

Fourth, the AI problem is conveniently dodged. At Stackoverflow, moderators angrily demand their data dumps back, completely ignoring the reason why they were shut down in the first place: AI. Yet nobody engages with the point that the entirety of Stackoverflow might cease to exist and become obsolete because of it. That includes all the content those moderators put so much work into curating.

In both cases, you're on a sinking ship and protest to actively sink it faster.


All of your labor should be free also.

What do you mean you want a six figure salary? How incredibly greedy of you.

It's fair to pay you only what you need to survive.


> All of your labor should be free also.

Yes, it is, and that's ultimately the contention here. Reddit has taken many millions of man hours of free labor. Now it wants to charge for the fruits of that free labor.

Legally, they are within their rights to do so. Socially, however, it's thought to be uncouth to exploit workers like that.


> Reddit has taken many millions of man hours of free labor.

Reddit has accepted many millions of man-hours of labor. They didn't kick down anybody's door and take it.


> Reddit has accepted many millions of man-hours of labor.

Yes, "accept or receive (someone or something)" is indeed the literal dictionary definition of "take". Thanks for subbing in an alternate word that means the exact same thing. I am sure we have all derived great value from your efforts. Or at least a hearty laugh at the pointlessness of it.


Shitposting on reddit isn't "labour".


[ LAUGHS IN LEXISNEXIS ]

Everyone seems to be fixated on the idea that the API should be free and no-one seems to be mentioning that there are some very successful business models built around charging people for access to content which the parent company didn't generate. LexisNexis, for legal searches, Elsevier for scientific papers, Facebook, for random graph searches (and occasional political data scraping), Twitter, etc.

It feels scummy to me too to take something free and charge people for it, but it's also not this unique thing people seem to be screaming about. Just go somewhere else if the API is that important to you. (Like hackernews). :)


LexisNexis does not just "charge people access to content which the parent company didn't generate". I could access most legislation through the government legislation website that shows the current version of every piece of legislation. I could piece together legislative history by looking through all the various amendments over the years compared to the originally enacted Acts. OR, I could look at LexisNexis, who have done all that work already and also have notes for pretty much every provision linking all that data together with academic and judicial commentary, major cases on those provisions, etc.

Similarly, I can look up unreported senior courts cases on the 'Judicial Decisions Online' section of the courts of New Zealand website, for free. But what about the notes written by experienced barristers in the law reports? What about the database cataloguing every judgment that has referred to every other judgment, giving an indication of whether the judgment is still good law?

It seems very popular these days to crap on businesses like LexisNexis and it does sometimes feel like they're taking the mickey given their prices and the terrible web interface to their databases. The fact you can't even 'open in new tab' properly is infuriating. But they're not just selling access to something that ought to be free, as many people seem to like to claim. They provide a lot more than just the raw judgments and statutes.


+1 I tweeted exactly this same sentiment just yesterday:

> This Reddit / Stack Overflow API drama feels weird to me as a user. It's USER submitted data. R/SO didn't create it.

> So what if ChatGPT scraped it? They're benefiting the same users who contributed the content on R/SO, and then some.

> It's not like ChatGPT is cloning them.

> It would be like if Printing Houses tried to block authors from publishing their content as an eBook or Audiobook.

> Users contributing to your ecosystem doesn't give you perpetual dominion over that content


If you don't provide an API for your content, but you're happy to render some HTML containing the content when I browse to your website, then if I want the content, I will get the content. No API is necessary. I will happily waste your resources scraping HTML if it's necessary to get the content I want, so it's in your best interest to provide it via an API, which is ultimately just a more efficient interface to the same data you're already providing me for free.


In the era before JSON and pervasive Javascript, HTML was the API. The Web is an API. The people saying "API" are really just saying "the same data that's in the HTML, but stripped of extra junk". HTML was never supposed to be about presentation, it was data meant for a variety of user agents. Has everyone forgotten? Are they too young to remember? I feel like I'm taking crazy pills!


Is there a reason web scrapers aren't used for public sites with content? It seems like this would avoid the pitfalls of APIs and changing terms and prices?


They were, in a much less constrained way, until companies running platforms started suing people for scraping and investing heavily in making it technically difficult. The difference is advertising. The vast majority of people do not want to be advertised to, given the choice, nor do they want to have their contract with a platform downgraded (by having to use the UI designed by the company). This situation is some of the users taking a stand about it.


They'll get blocked and/or sued. And break every time the site changes.


My take is slightly more nuanced - if you're paying a company to access their content, it shouldn't matter how you're accessing that content.

More concretely, as a (former) subscriber to Reddit Premium, I should continue to get access to Reddit via their APIs for free. But that was never an option, since Reddit only wants to charge the application creators and not the actual consumers.


This is a completely flawed argument. You didn't buy Reddit premium to get access to the content, you bought Reddit premium to not see ads and get free coins, which you did get.

Your argument is the same as saying that if I buy a can of Coke I should be allowed to use the CEO's company car for life.


Not seeing ads is always free: just don't use the site. Reddit Premium = content without ads.

It would be easy for them to give Premium users API access; third-party apps, including ones with an accessibility focus, would then still get developed.

Reddit doesn't owe us anything. But given how easy this would be, it's unsurprising that the current obtuse approach makes some people not want to use the site.


A different version of this argument is: why should it matter if I'm querying the content using the API directly versus by using the HTML web interface? If it's the case that different consumers of the API might put different load on the system than the web page that the provider can't/doesn't want to support, why isn't rate limiting the solution?

This also gives providers a way to fairly monetize their API: we're giving you some basic level of access that matches what you get through the website for free, but if your access pattern puts a load on our system that we can't tolerate, then we'll charge you (by providing a paid tier).
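A minimal sketch of that free-tier/paid-tier scheme, assuming a token-bucket limiter keyed by API key. The class, the tier names, and the limits in `TIER_LIMITS` (100 requests/minute free, 10,000 paid) are hypothetical illustrations, not any site's actual implementation. The key could be a user's token or an app's client ID; that choice decides whether the limit applies per user or per app.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical tier limits in requests per minute; the figures are illustrative only.
TIER_LIMITS = {"free": 100, "paid": 10_000}


class TokenBucketLimiter:
    """Per-key token bucket: each key may burst up to its tier's per-minute
    limit, and its bucket refills continuously at that same rate."""

    def __init__(self) -> None:
        # key -> (tokens remaining, timestamp of the last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str, tier: str = "free", now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        limit = TIER_LIMITS[tier]
        # A key seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(key, (float(limit), now))
        # Refill in proportion to elapsed time, capped at the tier's limit.
        tokens = min(float(limit), tokens + (now - last) * limit / 60.0)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

Short bursts up to the limit pass, and sustained traffic is capped at the tier's per-minute rate; anything heavier than the free tier tolerates is what the paid tier would be for.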


The HTML interface has ads baked in, which pay for the request. API does not.


This post makes no argument. It is just a pile of assertions. Simply stating that there is a "social contract" doesn't make it so. What evidence does the author have that "millions of users create the content expecting it will be widely available"? Even if that's true, what evidence is there that they do so with the expectation that it will be widely available through an API or through third party apps? And even if they do expect that, by what logic can you conclude that there is now a social contract that that expectation will be fulfilled?

I don't think any of that is true. I don't think people are thinking about third-party app users or about the wide availability of content when they post things to Reddit or other social media websites at all. I think they're thinking on a much more mundane level.

The whole concept of a "social contract" is fraught with difficulty too. It's a relatively niche idea to try to explain why people follow laws, or how laws can be justified. The idea is that we obey the law for the same reason we obey a contract: exchange of consideration for mutual benefit. But that's not actually a very good explanation, for a long list of reasons, and it certainly can't justify laws of universal application given that, e.g., children can't be bound to contracts they've signed and we apply laws to everyone even if they don't want us to. The concept certainly doesn't make sense in the context of a web API for a social media website. Even if everyone "expects" that their "content" will be widely available, it does not follow that there is any obligation, legal or otherwise, on Reddit to make it widely available. There just is no logical connection there unless you build one. This post does not. It simply asserts that APIs for content sites must be free. Why? Because!


Does not make sense to me. Providing APIs costs money, and others can make money from them without paying anything.


It’s an interesting argument, but I don’t think users provide their content for altruistic purposes. Some do it to benefit from a sense of participating in community, some to build reputation and some to prove their expertise. In all cases, they get something from the site that’s hosting their content.

So why should the site that’s hosting content not be allowed to have a business model that pays the bills?

Given a choice between the following, which would you rather?

* insidious adverts that compromise the impartiality of the platform (because thou shalt not offend the advertiser)

* charging for API access, which affects a tiny proportion of users, and would not in any way prevent you from accessing your own content (subject access and data portability are guaranteed by data protection law)


Make a completely free API but change data structure every week or so.

App makers will get angry and stop at some point


Isn't that effectively what they are doing? The HTML-based API is apparently going to remain free, but makes no guarantees about structure consistency. If you want an API with certain structure stability guarantees then you have to pay up.


> Their main intent is to destroy third party apps that no longer aid the company’s business goals. But they’re also trying to make a few million bucks a year licensing access to data, particularly on the back of AI training. It’s wrong.

This seems backwards to me. It's obviously a move to try (and likely fail, assuming screen scraping works well enough) to get in on the AI money train by charging for Reddit's data being used to train LLMs. I don't think Apollo and other 3P clients are the target of this API change, even if the results of it are anti-competitive and should be stopped regardless of intent.


Personally I always considered the social contract to be that the price of admission for a free social network / online community is that you need to put up with ads and attempts to monetize the data, provided the latter isn’t an unreasonable breach of privacy. A free API never crossed my mind.

It’s why I have to admit that I am not sympathetic to third-party apps like Apollo. They implicitly violated the social contract I recognize by providing users with an experience that circumvents Reddit’s attempts to make money. Reddit responded by finding a way to put a price on the ad-free experience.


Meh, the argument is quick, sloppy and caveated. Clickbait title. Move along.


I haven't been following this much, having quit Reddit many years ago, but I bet that it's a step on the path towards containerized headless browsers being used to provide unauthorized APIs.


Why must APIs for content sites be free? Are you paying the cost of upkeep for that social media site? No? Then you should shut up. Every social media site is a for-profit business. You, the user, cannot dictate how they run their businesses. If you don't like the way things are going, feel free to pack your stuff up (proverbially speaking) and find yourself a new place. As simple as that.


This post just asserts the existence of a right that comes out of nowhere. The rights you have on a website are established by agreement between you and the site's creator. You are free to not post on the site if you disagree with the terms of that agreement. If you choose to post, you have apparently come to the conclusion that the exchange is mutually beneficial.

If you think that APIs for content sites must be free, I think you need to offer more to justify it than this does.


clickbait aside, this seems like a myopic take on social media IMHO.

i would rather contrast it with the notions of web3 and people wanting ownership of their content, or at least wanting to use the platforms without ads or data harvesting.

while the real argument in this post seems to be that the api should be priced "fairly", we need to recognize the limited revenue streams the site has been employing till now, including the meaningless awards.

the underlying costs might be apples and oranges, but the ads till now have not been super-intrusive à la youtube with its up to 5-6 consecutive ads, or, more relatably, the 5 banner ads or autoplay videos of many publications.

moreover, given the technically inclined audience here, we can all agree that they have not been fighting back against adblockers on their web ui the way twitch or news websites do.

i have no websites with major traffic and have never been blessed with the "HN effect" on my web infrastructure, but those who have can attest to the unaccounted costs of running a public site for free. so i find it ironic that the most backlash has come from this crowd, who also tend to be happy to pay for chatgpt plus for personal use at a monthly rate costlier* than prime and netflix combined.


Related note in the sidebar: NY State is considering a bill to require social media companies have open API access[0]. I have to wonder if this is something either PIRG or FUTO Tech/Louis Rossman are pushing for - in which case, good on them.

[0] https://www.nysenate.gov/legislation/bills/2023/s6686


I think the most important thing is the copyright of the content. If the content belongs to the users, the platform (site) should not charge for access, or at least it needs to pay the users as well. If the content belongs to the platform, it can certainly do anything it wants, because when you contribute content to the platform, you agree to that.


I like how he touts Stack Overflow data dumps as an example, even though they also totally sold out and stopped doing the data dumps. They are no better.


Do you have more info on a policy change at Stack Overflow? The latest dump on their own post is from March 2023. In the past dumps have mostly been monthly but it's not entirely consistent. https://meta.stackexchange.com/questions/224873/all-stack-ex...


SO dumps are a great example: it's common to find regurgitated SO data in Google search, ranking above the original content.


It's been kind of interesting to see this play out in real time: it's clearly a board level reaction at these companies to OpenAI essentially scraping these companies, running a model over their content, and then essentially disintermediating them.

And when you ask ChatGPT you're (marginally) less likely to be chided by someone telling you you're an idiot for asking the question in the first place.


>The short sighted thing about these API fees is they will harm the company in the long term.

Black-belt in running a public company right there.


Rather than APIs for everything I'd prefer that we figure out (technically, socially, politically, economically) how to separate data and applications on the "web platform" in the same way that we do on a local OS. Keeping files as distinct from the applications that use them has been a pretty effective design principle.


API for the content I write? No thanks. It takes a few seconds for a scammer to copy the content and apply their *** referral links to my content.

Let's remember that OpenAI made part of its profit thanks to data acquired by crawling websites (not directly by OpenAI, but by some other parties).


What’s interesting is that when you charge for API calls, suddenly you have an incentive to make each call as inefficient as possible. Everyone has always complained about how terrible Reddit’s API is; now they have no incentive to improve it, since each call generates profit.


This assumes that the API is the reward provided by the website to "buy" the user-generated content, as if the website did not provide value otherwise.

Nobody posts on Reddit just to access the content via an API; it's not the purpose of using Reddit, it's a nice-to-have.


1. It is impossible to use Reddit without an API (a webpage is just another API).

2. No API is ever truly free. There are always tradeoffs that come with a cost.

3. Not all APIs have the same cost.

The question is, at what point does an API cost too much?

There are many indications that changes at Reddit will mean that some APIs that were previously considered to be acceptably priced will no longer be acceptably priced. The cost of using the API will exceed the value derived from using the API.

Other APIs are stated to remain unchanged. In theory, users of the APIs in flux could migrate to these price-stable APIs. But that does not mean that all people find those APIs acceptably priced, even where they previously found the APIs now in flux to be acceptably priced.

An API is not why someone uses Reddit, but it is a necessary precondition for using Reddit. If one finds the cost of a Reddit API too high, one will not use that API. If one finds the cost of all Reddit APIs too high, one will not use Reddit. To retain users, an API must be "free enough".


Given that your very first point is arguing semantics, I think it shows how weak any argument that you can bring forward is, so I won't even read the rest.


1. Defining previously nebulous terms up front, to make clear how they will be used in what follows, is not arguing semantics. To frame it as a semantic argument is erroneous.

2. Comments are always written by the author, for the author. If you do not want to read it, great. It wasn't ever for you.


100 requests per minute will be free. Can someone explain to me why this isn't enough?


100 requests per minute per what?

Like if there's a third party app built against the API, is it 100 rpm for each user of the app, or across the entire app's usage?


I think it is time for some of those apps to invest in caching. Also, APIs are not only about third-party applications. I can think of a dozen ideas for products built around the Reddit API without going over this limit.
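A minimal sketch of the kind of caching meant here: a time-to-live cache in front of the API, so repeated requests for the same resource within `ttl` seconds cost one upstream call instead of many. `fetch` stands in for whatever function actually performs the API call; it, the class name, and the 60-second default are hypothetical. The trade-off is that served data can be up to `ttl` seconds stale, and this simple version never evicts old entries.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Serves cached results for `ttl` seconds so that repeated requests
    for the same resource don't consume rate-limited API calls."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 60.0) -> None:
        self._fetch = fetch  # hypothetical function that performs the real API call
        self._ttl = ttl
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (fetched_at, value)
        self.upstream_calls = 0  # how many times the real API was hit

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: serve from the cache
        value = self._fetch(key)  # missing or expired: go upstream
        self.upstream_calls += 1
        self._store[key] = (now, value)
        return value
```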


> I think it is time for some of those apps to invest into caching.

Even ignoring staleness issues, storing a bunch of reddit's content for them is likely going to get expensive fast as well

> Also APIs are not only about third-party applications.

Yeah but aren't third-party apps mostly what people are upset about?


Putting a price on an API that is just there to parse website content is not a great idea to begin with. It's an open door for scraping companies and uncontrolled traffic. At least it's possible to put a rate limit on an API.


The article disagrees with the headline. The article says

>I don’t mind a site charging a nominal fee for API access. Either to cover the cost of API service itself, or more importantly to encourage API developers to be efficient when making API requests.


"Must" [as "be necessary"] and "should" are ultimately meaningless when you're dealing with corporate.

Corporate only understands "allowed", "required", and "forbidden". Handle it accordingly.


>simple static dump is the bare minimum to fulfill the social contract

I don't see how that follows? Would be awesome, but I don't see how posting on a platform creates such an obligation to render such a service (which costs money)


It's what we aim to do at https://serpapi.com, making an API for everything that doesn't have an API.


But if the product is mostly monetized by ads, how can we expect that they let people use the API for free to create an alternative front end that gets rid of the ads? Reddit isn't Wikipedia.


I may agree with the idea, but it is pure fiction. Instead of begging companies to be nice, we should figure out how to change our entire software paradigm so it becomes impossible.


Hey Nelson, you haven't really been blogging on that domain since 2001 have you?

If so, that is pretty amazing, and I'll be adding it to my list of retro sites still in existence.


I'd go a step further: APIs shouldn't exist.

I know, it's a hilarious sentiment in these times. But I really feel that most of the problems today around walled gardens and surveillance capitalism stem from not using the web directly as it was originally envisioned.

If we had properly formatted HTML, little/no Javascript, a standard auth mechanism that isn't terrible (HTTP auth or a cookie like the good ol' days), automatic sitemaps or semantic representations without manual intervention, best practices around site updates and maintaining previous versions, etc etc, then why would we need an API?

Since that stuff will never happen, I'd vote for third parties to provide open source APIs for existing sites. Companies could even hire those third parties to maintain APIs for them.

Disclaimer: I'm paid to write APIs and I'm just so tired.


Eh, a JSON blob is still much more lightweight than HTML/XML. This is important in less developed areas of the world, where internet speed may not be as fast.


Is screen scraping still doable/feasible/economical in 2023? If not legally, then technically? What about Cloudflare protections, bot blocking and whatnot?


There's some argument to be made because of Section 230. They are just distributors of user generated content and gatekeeping may go against that.


It doesn't even have to be API... FB/instagram quite often prohibit you from viewing something if you don't have account... :/


But a website is already parsable? When I needed some content from a website without an api I just faked the user agent and parsed it with soup.
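A minimal sketch of that approach. The commenter used BeautifulSoup ("soup"); this version swaps in the standard library's `html.parser` so it has no dependencies. The User-Agent string, the `comment` class name, and the helper names are hypothetical; real pages need their own selectors, and the parser breaks whenever the site's markup changes.

```python
import urllib.request
from html.parser import HTMLParser
from typing import List

# Hypothetical browser-like User-Agent string.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"

# Void elements never get a closing tag, so they must not count toward nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url: str) -> str:
    """Download a page while presenting a browser-like User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class ClassTextExtractor(HTMLParser):
    """Collects the whitespace-normalized text of every element whose
    class attribute contains `target_class`."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.results: List[str] = []
        self._depth = 0  # nesting depth inside a matching element (0 = outside)
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested element inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1  # entering a matching element
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # just closed the matching element
            self.results.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
```

Usage would be along the lines of `p = ClassTextExtractor("comment"); p.feed(fetch_html(url)); print(p.results)`.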


These sites also produce a useful ranking which is separate from the content. They deserve to be compensated for that if you want it.


But if the internet is only Reddit and OnlyFans and so on, I think we're entering the Middle Ages. I miss complex forums with passionate people. Reddit ends up as rants and garbage from newbies, so I don't use it much. Maybe I'll be downvoted, but years ago I learned things and also met people on forums. Now it's Reddit, full of newbies.


Very large websites should be seized by the government as they become a public good, and be subject to control which stems from society at large. The only question is which government should seize a website which the entire globe contributed to?

Only in a world like that can a reasonable policy like this be enacted. The second thing that should be enacted: these seized websites may only censor content that is proven in court to be unlawful.


Content sites living off ads revenue with APIs should require API users to display ads. End of story.


>APIs for content sites must be free

so..... whose tax money are we gonna use to pay content sites owners for this?


Some people like the idea of freedom until they encounter the consequences of freedom.


Freedom is also the choice not to use a platform and the choice to criticize it when they do consumer unfriendly things.


Yep. Though they're already going right back to the platform and crying about the ordeal on said platform.


This seems to be a lost author looking at socialist media platforms, not capitalist social media. Things that actually have a cost should not necessarily be provided for free, even if one does not understand that cost (as the author of this story does not). Providing an API is an add-on and not actually required of any social media platform, but it does cost money to store the content, as well as to build the API and maintain it (including keeping it secure and scalable). This author seems to be thinking of some kind of socialist media platform, not the capitalist social media platforms we have now. Even so, I would definitely welcome the idea of socialist media platforms, although I am not even sure how one could be built and/or managed. This sounds about the same as "McDonald's should be free", and I don't really see a big difference other than the actual consumption of the good/service.


Uh no.


Ok then LLMs must be free.


The third party apps were used to create much of the content which they now want to sell. They're trying to get the milk after shooting the cow.


The problem here is capitalism. You can fight it with high-minded ideals like free API access but that’s really just polishing the brass on the Titanic.

UGC sites need to be collectively owned by those who create the value or, in the very least, held in a nonprofit like Wikipedia is.

Without such guardrails there will be constant pressure to extract more and more value from the users who create the value.


How about this scenario: Reddit goes ballistic and locks everything behind a 100% paywall, under the (flawed) assumption that they have a large enough critical mass of content for people to throw a $5/mo bone their way just to access existing content.


It is true that Reddit has problems with the way it operates. But the absurd argument that its API "must be free" is not convincing. Such nonsensical claims are likely only made by socialists.

