
> It wouldn't even be a significant cost and ads would recoup it anyway

I 100% don't get it when websites with lots of content and good SEO just delete everything. Fatwallet (fuck you Rakuten). Yahoo Answers (hey, I didn't say good content).

Even on my own Drupal blogs that I didn't want to maintain/update anymore, I did a giant curl job, recursively regex'd out the login/comment submission fields and dumped them on s3. Voila. And I'm someone that would take 9 attempts to do FizzBuzz.
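For anyone wanting to repeat this, here is a minimal Python sketch of the "regex out the login/comment fields" step. It assumes the pages were already mirrored as plain HTML files and that every login and comment box sits inside a `<form>` element. That is typical for Drupal's login block and comment form, but check it against your own theme. `strip_forms`, `FORM_RE` and the sample page are hypothetical. Regexes are a blunt tool for HTML, which is tolerable for a one-off cleanup of markup you generated yourself.

```python
import re

# Matches one whole <form ...>...</form> element, across newlines, in any case.
# Non-greedy (.*?) so two forms on a page are removed separately, keeping the
# content between them. \b stops it matching tags like <formatted>.
FORM_RE = re.compile(r"<form\b[^>]*>.*?</form\s*>", re.IGNORECASE | re.DOTALL)


def strip_forms(html: str) -> str:
    """Remove every <form> element (login boxes, comment boxes) from a page."""
    return FORM_RE.sub("", html)


# Hypothetical mirrored page with a comment form and a login form.
page = """<html><body>
<article>Old post text</article>
<form id="comment-form" action="/comment/reply" method="post">
  <textarea name="comment"></textarea><input type="submit">
</form>
<FORM action="/user/login"><input name="name"></FORM>
</body></html>"""

print(strip_forms(page))
```

Run it over each mirrored `.html` file before uploading to S3; a real HTML parser would be more robust if the markup is irregular.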

A bookmark you had to it 20 years ago still works and I still get some Adsense cheque residuals from it for zero effort.



I did some work for a design agency that got bought by Twitter.... we basically converted all their different sites to full static and put them on ice. It would have been expensive to keep hosting how they had been, but with everything flattened to static it's super cheap to keep up (although no publishing new stuff without some by hand work or restoring the old servers).
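Flattening a dynamic site to static mostly comes down to following every internal link from the home page and saving what comes back. Below is a minimal standard-library Python sketch of just the link-following step; fetching pages and writing them to disk are left out. It assumes navigation uses plain `<a href>` links in server-rendered HTML (links built by JavaScript would be missed). `LinkCollector`, `same_site_links` and the `example.com` URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html: str, page_url: str) -> set:
    """Return absolute, fragment-free http(s) links on page_url's own host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in parser.hrefs:
        # Resolve relative links, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return links


sample = ('<a href="/2009/05/post">post</a> <a href="https://other.example/x">ext</a> '
          '<a href="#top">top</a> <a href="mailto:me@example.com">mail</a>')
print(sorted(same_site_links(sample, "https://blog.example.com/")))
```

Feeding each newly found URL back into a queue, with a `seen` set to avoid refetching, gives the recursive crawl; `wget --mirror` does the same job off the shelf.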

All the good articles and design samples stay online, and maybe the original authors can get a little credit or name recognition from any search traffic that finds it.

The one "liability" to doing this would be for visitors to not realize the site isn't being updated anymore. If vice is still publishing content outside their website they'd have to make it clear that the site is archived and new content is elsewhere, or people might get the wrong idea and assume nothing new is coming.


Could the internet archive offer this as a paid service?

For a reasonable fee, we'll archive your site, and give you back a copy of the assets you can turnkey host on the original domain for cheap with a static hosting solution (s3, cloudflare, etc).

Everybody wins


I've been wondering this too. I've even been considering authoring a web standard to allow hosts to specify how their pages can be archived in a standard way (e.g. which scripts to include, etc.) and then pitch the IA to offer a "pay $X to archive this data forever" deal to the universe.

I'm really curious what the cost per byte would be to make it worthwhile to offer a "host this byte forever, for one up-front fee" service.


> I'm really curious what the cost per byte would be to make it worthwhile to offer a "host this byte forever, for one up-front fee" service.

https://help.archive.org/help/archive-org-information/

> *What are your fees?*

> At this time we have no fees for uploading and preserving materials. We estimate that permanent storage costs us approximately $2.00US per gigabyte. While there are no fees we always appreciate donations to offset these costs.
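Taking that figure at face value gives a rough answer to the "cost per byte" question above. This is a small Python sketch with two assumptions: "gigabyte" is taken to mean 10^9 bytes (the FAQ does not say), and $2.00/GB is the Archive's own cost estimate, not a price it charges. `one_time_archive_cost` and the 50 GB site are hypothetical.

```python
# Archive.org's published estimate: about $2.00 (US) per gigabyte stored permanently.
COST_PER_GB_USD = 2.00


def one_time_archive_cost(size_bytes: int, bytes_per_gb: int = 10**9) -> float:
    """Estimated up-front 'store forever' cost for size_bytes, at the quoted rate."""
    return size_bytes / bytes_per_gb * COST_PER_GB_USD


per_byte = COST_PER_GB_USD / 10**9  # dollars per byte, assuming 1 GB = 10**9 bytes
print(per_byte)                                # 2e-09
print(one_time_archive_cost(50 * 10**9))       # 100.0 for a hypothetical 50 GB site
```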

There's some discussion about this idea on this thread, including comments by markjgraham, who manages the Wayback Machine, and thoughts from John Carmack:

https://news.ycombinator.com/item?id=29639222

https://twitter.com/ID_AA_Carmack/status/1473327982605385735

https://threadreaderapp.com/thread/1473327982605385735.html


Amazing references! Thank you!


No no, the exact opposite. If this bundle of content is so valuable, then someone can make a business out of buying it. Vice could go to WeBuyOldIntellectualAssets.com and get a flat price for it all, and that company would host it or do whatever with it.

The same thing happens with brands - someone bought the Montgomery Ward brand at a bankruptcy auction or something - and with store inventory: once the store goes bankrupt, they just sell the entire store contents, right down to the fixtures, to a liquidator who brings in the "Going out of business! Everything must go!" signs.


This is how Saks Fifth Avenue is actually, when you peel back all of the onion, the honest-to-God 17th century Hudson's Bay Company


Found the young'un :)

That's too attractive to malware peddlers. It's not particularly widespread currently, mainly because most of the content has been centralized into the same big silos... But what you're envisioning here is just going to get abused by abusers


I think the issue with this solution is that the seller loses control over their branding (namely what ads to show) if they do this.


Not just brands, but software too. The primary software I support at my day job was acquired by a company that, based on their other assets, can be described as where software goes to die.

We’re migrating away but they’ll squeeze out what they can from those that don’t/can’t/won’t.

But they still have to provide continuous support, some amount of updates to keep customers functioning, and maybe even get some new customers as a “value” option (that’s barely functional).

Happens to forums all the time (fuck you Internet Brands and Vertical Scope).

100000x easier to do all this with static web content.


They seem to offer something like that:

https://www.archive-it.org/

However, the footer of that website says 2014 and the about page is broken, so not sure if it's still supported.

Also, Cloudflare has a partnership with the Internet Archive and offers something similar, but I think it's only meant for temporary outages and only archives the most popular pages on your site.


The site's last act was to archive itself


Could they (or a for-profit company) bid on it? Do liquidators in the US have to consider any offer, even if it was unsolicited? Does it vary from state to state?



That works too, but there is something to be said about a turn key solution friendly to corporations who are willing to just throw some money to make a problem go away. Plus archive will get a bit of extra money for the Wayback machine!

Just pay a donation, redirect your DNS to the Wayback Machine and bingo.


they are already on it. they do really important work. worth helping them out with funding.

https://twitter.com/Chronotope/status/1760755908219466017


Note that Archive Team and the Internet Archive are separate, unaffiliated entities, though they do often work together.

Archive Team is a loosely organised group of individual volunteers that share a common interest in Internet preservation, and develop tools and share notes to serve that goal. They're basically one of your old-school Mediawiki communities, with very little budget:

https://wiki.archiveteam.org/

Internet Archive is a full-blown multimillion dollar `501(c)(3)` nonprofit, which functions as more of a general-purpose library. They maintain physical offices and datacentres in multiple countries, host many petabytes of data, do activism, run conferences, and when they develop custom tools it tends to be somewhat more advanced than the Archive Team's decentralized web scrapers, like custom book scanning hardware:

https://archive.org/details/eliza-digitizing-book_202107

A lot of the information in the Wayback Machine, which is run by the Internet Archive, was saved and contributed by Archive Team. For example, as of writing this comment, that is true of the latest snapshot of `https://www.vice.com/en`. You can see this with the "About this capture" button on a Wayback Machine capture.

Both groups have ways to receive monetary donations.

For Archive Team though, I wonder if it would be more useful to donate compute by running their Warrior archiving VM/container, or contributing code to their GitHub:

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

https://wiki.archiveteam.org/index.php/Dev/Source_Code


That's the archive team, not the internet archive.


yes, but everything is going on the internet archive. https://twitter.com/Chronotope/status/1760764792887746724


I think the issue is that for the IA it isn't lucrative enough to be worth their time. Someone already did it for them for free, even if it wasn't 100% as good as they could have done it.


> we basically converted all their different sites to full static and put them on ice. It would have been expensive to keep hosting how they had been, but with everything flattened to static it's super cheap to keep up

I got absolutely tarred and feathered at an agency for suggesting this strategy and I just want to say thank you for validating what I wanted to do.


Maybe they weren't particularly proud of what they accomplished over the years and wanted it forgotten? :-)


Out of curiosity, what was the argument against it?


I had a proof of concept for us to migrate sites to a static / low / no cost hosting option.

They were unwilling to spend time "training" the team to learn react (in 2020).

They were unwilling to let their senior FE dev spend any time with me to correct the CSS issues I was struggling with.

They used this deception and dishonesty to say "it didn't work" and wasn't worth any more time. I built a prototype in a week. It's not like I spent months on it without any ROI. They just wouldn't look at it because that would mean acknowledging that I and/or my ideas had value.

The closest thing I got to an answer is that the CMS they preferred, which was chosen 10 years ago by people no longer working there... was the only way they could support client sites. Because that's what they've been using. Turnover means it's so hard for us to support anything new because we have no time...

Basically they Brawndo'd me.

It was hard to stomach getting fired by the incompetent people driving the business into the ground when I was literally pleading with them to implement money saving measures.

The horse sometimes would rather kill itself than drink the clean water you've found... that's just life. It's hard to accept.

https://i.redd.it/h2lmsdwqtci81.jpg


The old content can become a liability. Maybe the brand has shifted, or norms have shifted and the old content is off-key.


It was not an issue of refining the content on one site, you make a good point though.

It was more of an issue that they had a bunch of clients on an old CMS system, and they did not want to make any changes the way that the sites were built or hosted.

I can make arguments for and against either side of this idea, it all depends how you want to run your business.


You still have to maintain the domain, and stay on the hook if the site gets overtaken/hacked/defaced.


I know that web security is a hard problem but I can't wrap my head around the fact that static content has the same issue as a Wordpress blog.


That’s correct.

The only skip-a-heartbeat moment was when AWS sent me an email saying that I have “one or more S3 buckets that allow read or write access from any user on the Internet”.

But none of my containers had write access. All of them had public read, but yeah, it’s a website and they know this: their own route53 DNS points to the containers.

They just sent the same generic mass email to everyone with any public container.


If you don’t have some form of automated throttling, couldn’t that still become costly if a popular webpage started pulling resources from that bucket?

If so, their warning could have been phrased better, but isn’t incorrect.


You can put the bucket behind Cloudfront and only give access to the bucket to Cloudfront.
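A sketch of the bucket side of that setup, built as a Python dict and serialized to JSON. It follows the shape of AWS's documented bucket policy for CloudFront origin access control (OAC): only the `cloudfront.amazonaws.com` service principal may call `s3:GetObject`, and only on behalf of one distribution, so the policy has no public `"*"` principal. The bucket name, account ID and distribution ID are placeholders, and the distribution itself must be configured with an OAC for this to work.

```python
import json


def cloudfront_only_policy(bucket: str, account_id: str, distribution_id: str) -> str:
    """S3 bucket policy that lets one CloudFront distribution (via OAC) read
    objects. There is no '*' principal, so anonymous direct reads are not granted."""
    distribution_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restrict the grant to requests coming from this one distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder values, not real resources.
print(cloudfront_only_policy("old-blog-archive", "111122223333", "EDFDVBD6EXAMPLE"))
```

The resulting string can be applied in the console, with the AWS CLI, or with boto3's `put_bucket_policy(Bucket=..., Policy=...)`; any existing public-read statement would also have to go for the bucket to stop counting as public.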


But then, would the OP get an email saying they have “one or more S3 buckets that allow read or write access from any user on the Internet”?


The engineering team (that they fired) would have needed to explain that difference in so many details to stakeholders who just wanted to move on.


I'm sure there's liability, copyright, and trademark concerns as well.


If they would want to monetise their content in the future, they must keep it private before it's gobbled up by search engines and AI and becomes public domain.


Search engines have already gobbled it up the moment it was published. And it is probably in the training dataset of many AIs.


hmm, yeah all sites in the future that find themselves unprofitable take content offline to maintain license to AI potential.


That liability seems really easily solved with a banner.


yeah, newspapers do this all the time. Could also regex the dates to show up really large.
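Both suggestions are a few lines of post-processing on the static files. Here is a minimal Python sketch of the banner half, assuming each page has a normal `<body>` tag. The banner text, inline style, `BANNER` and `add_archive_banner` are all hypothetical placeholders.

```python
import re

# Hypothetical notice; wording and styling are placeholders.
BANNER = (
    '<div style="background:#ffd;padding:1em;text-align:center;font-size:1.2em">'
    "This site is an archive and is no longer updated."
    "</div>"
)

# Opening <body> tag, with or without attributes, in any case.
BODY_OPEN_RE = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def add_archive_banner(html: str, banner: str = BANNER) -> str:
    """Insert the banner right after the first opening <body> tag.
    Pages with no <body> tag get the banner prepended instead."""
    # A function replacement avoids backslash escapes being interpreted in banner.
    result, count = BODY_OPEN_RE.subn(lambda m: m.group(0) + banner, html, count=1)
    return result if count else banner + html


print(add_archive_banner('<html><body class="post"><p>Old news</p></body></html>'))
```

Running this over every file once, before upload, means no server-side code is needed to show the notice.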


This is an interesting way of saying that contemporary web “best practices” are a giant waste of money.


> The one "liability" to doing this would be for visitors to not realize the site isn't being updated anymore.

Nothing a simple notification bar can't solve


I don't understand why publishers take down Kindle books when the paper book goes out of print. It happened to one of my favorite scifi novels.

It takes zero effort to keep the book available (I know, I self-published a silly little one), and zero effort to include it in your accounting as long as there's a data feed and a computer.


It probably has to do with publishing rights. Authors may not want to allow digital publication without an actual print run, and once the initial print run ends they lose the digital rights unless they do another print run. Or the digital publication rights may only be negotiated for a fixed period of time, or else require an ongoing fixed payment to retain, so it costs the publisher money to continue offering a digital version that they may not recoup without sufficient sales.


Halting digital sales might be necessary to declare a write-off and recoup a tax benefit that year. That was happening to some streaming shows, anyway.

Which sci-fi novel?


The Golden Age trilogy by John C. Wright.

I read one of his later works and it was horrid, but Golden Age was incredible. The first several chapters were hard to get into, I quit several times and the people I loaned it to never got past that. But after that it takes off like a rocket. On a reread I found that the difficulty at first was just from so much being unfamiliar.

It's in a distant future with superintelligent AI, immortality, physical abundance and pervasive virtual reality. And in that setting he finds a deeply human tale of epic heroism.


Usually it is because the rights have reverted to the author and the publisher no longer has rights to publish the book.

Meanwhile, the author doesn't own the rights to the cover art/design so they can't just put it up either.

I'm not sure how else you'd want to handle it.


> I'm not sure how else you'd want to handle it.

By having a minimum amount of foresight and putting into the initial contract an agreement which lets the parties maintain the online availability in a mutually agreeable way.


Why do you think it's a problem of foresight and not simple motivation? Perhaps none of the parties cares about the availability of a book that doesn't make enough money to stay in print. Perhaps having the book become unavailable for some time is perceived as a benefit to the rights holder, allowing them to do a re-launch.


The question wasn’t what is the problem, but “how else you'd want to handle it”.


My point is that it's quite possible that both parties are perfectly happy with the current solution of having the ebook disappear from stores.

It's just us as fans who are not happy.


> It takes zero effort to keep the book available

It isn’t about the effort. This is the same thing people say about out of print video games, too.

It isn’t about difficulty, it is about incentives. They want you to buy new books and video games, and if you are reading/playing old free ones, that is reduced demand for what they are selling. Why would they want to help you satisfy your need for free?


I've taken a non-fiction tech book offline. It was just really dated and I didn't really want it out there any longer as a result.


> "I did a giant curl job, recursively regex'd out the login/comment submission fields and dumped them on s3"

That is not zero effort. That is at least 1.2 efforts.


1.2 efforts which grants you residual revenue for life. Doing it is a no brainer, but some people lack one.


It can be also that some decision makers feel it is too much hassle. Or they don’t even know it is an option.

For you and me doing this would be probably an afternoon’s work? (Maybe a bit more, maybe a bit less)

For someone less technologically inclined it could be seen as a big project. They need to find someone capable of doing it, they need to ask for a quote, they need to supervise the project otherwise the contractor doing it might just do a half assed job or none at all.

Not to mention they can only think about doing this if they have an inkling that it is possible. They might be operating in a mindspace where “maintaining the servers” is a large monthly expense. For example, years ago they might have been sold a CMS with all the bells and whistles for some $bigbucks recurring cost. If they are savvy business people they might have done their research to figure out if “this” can be done cheaper or not. But they might not realise that it is possible to change the requirements so as to achieve a massive reduction in cost. This is especially true if they treat the cost of servers as a kind of black box.

And very often the people who are providing them with IT services are not incentivised to tell them about this option. Will they tell the business owner that, oh by the by, for half the monthly recurring cost they are paid, the business could find someone who puts the site on ice and runs it for the next decade on the other half? Of course not! Doing so would undercut their income stream. That would be crazy. In fact they might spread all kinds of FUD and sabotage attempts at scraping the site.


I firmly believe that if they think their only option is to shut down, they aren't fit to do their job. If they don't understand how the internet works they should let someone who does do the work.

If you are a business built on top of the web you need someone who's tech savvy in-house.

Maybe they deserve their fate. But it's sad that the next generation will miss this insightful content because they gave up.


The ChatGPTs of this world will solve that. I like to believe that I know a couple of things about a couple of things regarding technology and sometimes I ask ChatGPT or Gemini "how can I do so and so, list/name five pieces of software or technical solutions to do so-and-so".

I use it/them as a search engine on steroids. Maybe it is time more people also do so.


Upvote from me, very true:

> Not to mention they can only think about doing this if they have an inkling that it is possible.


If you're getting residual revenue on a website, at some point someone's going to figure they could get a bit more residual revenue by adding some ad scripts, and pretty soon you've got an entire stack for serving ads that needs maintenance and ROI.


I just let Google do that. Sure it's not max possible value, but return on effort is pretty good. Advertisers can buy through google and target your site. If you get enough traffic, you can probably cut a better deal (ok, that costs negotiating time, but if the ROI is there...)


Revenue for life also means that you're filing taxes and potentially other paperwork about it for life.


Those Yahoo Groups could be a trove of niche, otherwise uncollected information, especially with regard to vintage or specialty electronics. Removing them was a huge loss.


It's all on groups.io


Group owners had to import; it was not all preserved.

There was some effort done with public groups by the ArchiveTeam, but I know they didn't get everything as there was no good way to find all groups.


> did a giant curl job, recursively regex'd out the login/comment submission fields and dumped them on s3

it's because even this cost outweighs how much they care about the content, which is 0... the people who make decisions like this aren't scrimping pennies or interested in preserving effort... they're looking for the simplest way to get millions of dollars into their pockets

doing the bare minimum to maintain a library of content indefinitely isn't it

these are the kind of people who would happily set fire to a library if they could get away with insurance fraud


Or even Digg... Lots of comments and discussions lost.


This is why I post on slashdot. They've passed the test of time (but not UIs, fuck beta). Looks like their first posts in '97 start here: https://slashdot.org/?page=8582 dunno what their december 31st, 1969 posts are after that (errors? intentional de-ranking?)

Newspapers have a bad history of "experimenting" with enabling online comments and then deciding the experiment failed and delete them all. You're a newspaper, you're not supposed to delete history when you don't like it!!!

And then they complain when people use social media as their newspaper.


It's very depressing that all the comment sections from the late 2000s to mid 2010s are nowhere to be found. Also a lot of LiveJournal-type sites. Comment sections seem to be omitted from Internet Archive snapshots, but I find them in many ways more worthy of archival than the published articles that make the cut.


That type of data would be so interesting for things like historical sentiment analysis


> That type of data would be so interesting for things like historical sentiment analysis

Except that internet commenters are very weird, and like the least representative sample ever. Not quite as weird as Wikipedia editors, but still really weird.


They aren't representative of the general public. This can still make it very interesting though. Do trends show up earlier among commenters? If so, has the time it takes for those trends to flow to the mainstream changed over time? Has the likelihood with which online commenter trends flow to the mainstream changed? Is the influence more pronounced for specific subjects?


Comments from people actually close to areas mentioned in articles detract from the effectiveness of /THE MESSAGE/ and had to be removed.


It is very depressing, but on the other hand you'd have millions of comments written in another era (pre-culture wars, when the Internet was more, let's say, "tolerant") that can be now traced back to the authors to cancel them. With infinite memory you need protection, otherwise it's un-erasable damnation.


The last time I looked at Slashdot comments (2021, give or take) they were low-effort trolls, racist/sexist, or just gibberish. Has moderation improved there or is it still a cesspool?


I scraped a sample of their posts ten years ago and ran a regression on user activity by ID. They have zero significant growth in user base other than mobile users posting more as AC. There was a small core of older active 5-6 digit UIDs doing the bulk of the posting and that was shrinking toward zero around now. Slashdot will die in the near future even if Netcraft can't confirm.


As one of those 6-digit UID posters, /. has been dead for quite some time. Discussions barely break 50 comments or so nowadays, for the most part. The firehose sucks. The 'editors' constantly post dupes and the left hand seems to have no clue what the right hand is doing.


> December 31st, 1969 posts

Unix time is the number of seconds since 1/1/1970 UTC. Subtract a few hours for time zone post-processing and you get 12/31/1969 as a date. It indicates a time of zero, a null, or a missing value being formatted as a date.


Pedantically, Unix time measures the number of non-leap seconds that have elapsed since 00:00:00 UTC on 1 January 1970.

This is Dec 31 1969 at 7:00 PM in Eastern Standard Time.
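This is easy to reproduce. Below is a short Python sketch that models Eastern Standard Time as a fixed UTC-5 offset. That assumption is correct for January 1970, which falls outside daylight saving time; `zoneinfo.ZoneInfo("America/New_York")` would handle DST in general.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for US Eastern Standard Time (no DST handling).
EST = timezone(timedelta(hours=-5), "EST")

epoch_utc = datetime.fromtimestamp(0, tz=timezone.utc)  # timestamp 0, i.e. "null"
epoch_est = datetime.fromtimestamp(0, tz=EST)           # same instant, shown in EST

print(epoch_utc.isoformat())            # 1970-01-01T00:00:00+00:00
print(epoch_est.isoformat())            # 1969-12-31T19:00:00-05:00
print(epoch_est.strftime("%B %d, %Y"))  # December 31, 1969
```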


It doesn't spark joy for them anymore.

I wish I was joking, but reducing the amount of stuff that is owned/managed for the sake of it is a common philosophy. Another way to put it: they're focusing.

We think it would have been extremely low effort to keep a static site running, they probably thought not having to think about it at all was worth the loss.


I imagine they still have to organize paying royalties and the like


It takes continuous effort/money to keep the metaphorical company lights on. There are things you need to file periodically and legalities you need to comply with (including when they change in the future), just to exist.

So if the website shutdown coincides with the shutdown of a company or a division within the company, that might be why. And since a website will usually not shut down if it's turning a considerable profit to begin with, just deleting everything can often look like the best option.


> Yahoo Answers

OMG I haven't thought about the site in a number of years.

Used to be such a common thing to see in google search results.


This stuff should go to the national archives imo. Same with reddit and hacker news.


Ever more bummed about Fatwallet as Slickdeals continues to worsen with endless "sponsored deals" and censorship to force people to use their cashback and price tracking products.


One of the relatively rare times being Canadian pays off for online resources - RedFlagDeals.com is still doing well in their forums.


I wonder what SuckIsStaples is up to?


but how is babby formed?


> And I'm someone that would take 9 attempts to do FizzBuzz

Hehe


Holy shit I totally forgot yahoo answers was a thing. That was indeed a shame, even though most of it was complete garbage.


Honest, technical, humorous; Good comment!



