Just want to note that Google Chrome headless recently landed support for correctly rendering text files. Prior to this, in headless mode the file was not rendered and, IIRC, was simply downloaded, unlike what happens in regular headful Chrome.
As a workaround before this fix, I created a secure document viewer that used pandoc and LaTeX to lay out the text file from a template. We intercepted the navigation / download request for a text file and passed it to the secure viewer to be rendered using LaTeX and pandoc -- most complex pipeline ever for displaying ASCII!! hahaha :)
Interested readers can find the LaTeX template here^0 and the overall project this was part of here^1 :) haha :)
They're both great. Get Lamp also has an Infocom-specific cut and BBS really captures a lot of interviews with often pre-Web creators and other information that would otherwise be pretty obscure at this point. BBSs were sort of a niche at the time and, while there are a couple memoir-type things (and the textfiles archive of course) there aren't a lot of authoritative sources for the history especially if you don't have a lot of context to make sense of what is out there.
This guy is just a bad person. It's a classic case of someone going so deep into understanding something that it detaches them from reality and their humanity. You understand the general structure of the Web (hotlinking) and how to "mv" which took you months or years of concentrated study to learn and set up, and you forget that under normal circumstances you wouldn't give in to your sadistic urge to show children a picture of a man's asshole.
And then all the other psychos, like you, come to the retelling of the story to fantasize about doing it too.
Do you have anything to substantiate this or are you just going to remain ignorant? Hotlinking was creating a massive bandwidth problem, this is a perfectly valid solution - you don't control content on other people's site and should have absolutely no expectation of such.
Do we agree that showing a 14 year old a picture of a man's gaping asshole is bad? If this image had "nearly a hundred thousand viewings" and 17% of MySpace's user base in 2009 was under 18[0], what's the math on that? These were people writing HTML for the first time and had the most basic understanding of it, they didn't know what IP or HTTP was, these were just kids trying to make their computer display a cool picture of a Grim Reaper to them and their friends. They probably didn't even understand the mechanism behind why pasting a link between some triangle brackets (which is already pretty difficult to figure out) into their page made it show the image or that bandwidth came from somewhere. And that's why he did it. They deserved it because they were posers, lesser intellects, that got on his internet after September 1993.
Why call me ignorant? I understand perfectly at the technical level what happened, but words like "hotlinking" and "bandwidth" and "solution" shouldn't come into play here at all. If your logic made you permanently imprint a shock image on thousands of impressionable minds you're not the hero. How much did it even cost to upload a 60kB image half a million times per month in 2007? $10?[1]
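For anyone who wants the arithmetic behind those two questions, here is a rough sketch using only the figures quoted above (about 100,000 viewings, 17% of users under 18, a 60kB image served half a million times a month). It assumes the viewers matched MySpace's overall age split, which may not hold, and `price_per_gb` is a hypothetical parameter, not a real 2007 hosting price.

```python
# Figures taken from the comment above; everything else is an assumption.
VIEWINGS = 100_000            # "nearly a hundred thousand viewings"
UNDER_18_PERCENT = 17         # share of MySpace users under 18 in 2009
IMAGE_SIZE_KB = 60            # size of the hotlinked image
REQUESTS_PER_MONTH = 500_000  # "half a million times per month"

# Assumes viewers had the same age distribution as the whole user base.
estimated_minor_views = VIEWINGS * UNDER_18_PERCENT // 100

# Monthly transfer in decimal gigabytes (1 GB = 1,000,000 kB).
monthly_gb = IMAGE_SIZE_KB * REQUESTS_PER_MONTH / 1_000_000


def monthly_cost(price_per_gb: float) -> float:
    """Monthly bandwidth cost for a hypothetical per-GB price in USD."""
    return monthly_gb * price_per_gb


print(estimated_minor_views, "estimated views by under-18s")
print(monthly_gb, "GB transferred per month")
```

On these numbers the "$10?" guess corresponds to a price of roughly $0.33 per GB; whether that was a realistic rate at the time is not something this sketch can tell you.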
>Do we agree that showing a 14 year old a picture of a man's gaping asshole is bad?
This is disingenuous. He didn't show them anything; they went looking and found something they didn't want. As they say, you can't go to shit's house and be surprised when shit is home.
>They probably didn't even understand the mechanism
Which is their problem, regrettable as it may be. Or more realistically, it's their parents' problem.
>words like "hotlinking" and "bandwidth" and "solution" shouldn't come into play here at all.
That was the problem that started it, so they're absolutely relevant. Pretending that this doesn't matter is willful ignorance. No one was made to view anything.
These files remind me of my personal nightmare of unawareness learning about the world through the early web. It was awash in these, many of which I read in many forms, including text and web conversions. It was so hard to know what was real and imagined. You could say that that's still true, but I'd argue we have more power than ever to know. The information content of our communication systems is just fantastically higher. From 8 bits to the moon.
Hah, found this for the content of armstech.txt [Jane's Fighting Ships 1990-1991 (Specifications of Warships)]:
```
From: Janes Copyright <copyright@janes.com>
To: "jason@textfiles.com" <jason@textfiles.com>
Sender: "Ward, David" <David.Ward@janes.com>
Date: Wed, 21 Jul 2010 03:38:46 -0600
Subject: Unauthorised hosting of Jane's Fighting Ships data
Dear Mr Scott,
I bring to your attention that your website (www.textfiles.com) is hosting
information that is the copyright of IHS Global Limited.
Whilst I understand that the textfiles.com site is not hosting this data
for any monetary gain you will, I am sure, understand that copyright exists
over this data and that IHS Global Limited has a strong interest in ensuring
that the data is available only through its own channels and through its own
brands.
The data in question is from the 1990-1991 edition of Jane's Fighting Ships
and can be found in the text file 'armstech.txt', which is located at the
following location on the textfiles.com site, http://www.textfiles.com/fun/
armstech.txt.
As stated earlier, the data held within this file is the copyright of IHS
Global Limited, owner of the Jane's Fighting Ships publication and brand,
and is available only to its subscribers; by hosting this data and making it
free to download you are in breach of international copyright laws.
I therefore ask you, as proprietor of textfiles.com, to remove this data
from the www.textfiles.com site and any associated mirror sites within the
next 7 days from the date of this email and confirm your action by reply.
Failure to take action may result in this matter being placed in the hands
of the IHS Global Limited legal team and further action being taken
against you.
```
Between bash.org and textfiles, a big chunk of my youth was archived through our logs and zines. Because of Jason, I'd hung onto everything - logs, pictures from meetups, web designs, source, and plenty of dumps, emails, news stories, and recordings that covered a bit of the history behind some events around the turn of the millennium with the intent of turning them over one day, but lost pretty much everything in an unfortunate beer accident in my 20s.
It didn't feel like counterculture, just that everyone else was wrong, you know?
Sorry it didn't work out, but thank you for trying. I often say that things are rare because life happened, not because of some incompetence or evil. We do our best to preserve and not everything makes it.
38 years old. I used to go on here as a teenager and even then I felt like Gandalf poring over a mysterious, powerful, and ancient text resource. I was enamored by the "hacking McDonalds" file and wished it weren't obsolete even at the time of reading.
This is beautiful - thank you for posting this. I was jettisoned back to my past immediately. I instantly recognize familiar filenames browsing the site.
Seriously this made my day. I had completely forgotten about this (it was a big part of my life...)
> TEXTFILES.COM has been online for nearly 25 years with no ads or clickthroughs. If you feel like donating to its roughly $1200 yearly upkeep: Paypal or Venmo.
Performance-wise, this doesn’t seem like a $100/month site, considering it’s only simple HTML and text files.
As people are wildly speculating in this thread (it is Hackernews, after all), I'll just say that yes, more than you can imagine, my provider has been instrumental in the site staying up through a number of challenges and controversies, and that number is my number to keep the site there with the features and arrangements I have. Regardless of what people think the Internet is like in 2024, there are vanishingly few places that will host a site like TEXTFILES.COM for very long.
Just as a note, you're only getting some of the data that way. The contents of textfiles.com are roughly 2 terabytes of *.textfiles.com endeavors and another 13 terabytes of mirrors and hosting I do.
Even on Digital Ocean, that would be $50 for the 500 GB of block storage, another $20 for keeping a backup, and ~$6 for the actual droplet for hosting. And then some extra bandwidth costs, if people try to bulk-download. Adds up.
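Adding up the figures in that estimate, as a quick sketch: the line items are the ones quoted above (the droplet price is approximate, and bandwidth overage is left out because no number was given), and the $1200 is the yearly upkeep stated on the site.

```python
# Monthly figures quoted in the comment above, in USD.
monthly_costs = {
    "block_storage_500gb": 50,
    "backup": 20,
    "droplet": 6,  # quoted as "~$6", so approximate
}

monthly_total = sum(monthly_costs.values())
yearly_total = monthly_total * 12

# Yearly upkeep stated on the site's donation note.
quoted_yearly_upkeep = 1200

print(f"${monthly_total}/month, ${yearly_total}/year")
print(f"${quoted_yearly_upkeep - yearly_total} below the quoted upkeep, before bandwidth")
```

That comes to $76 a month or $912 a year for 500 GB, so the gap to $1200 is small even before extra bandwidth or the larger data set mentioned elsewhere in the thread.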
That site hosts a lot of historical material that many "budget" providers would probably balk at and kick you off their network without much recourse. Having a good relationship with a provider that knows you and will go to bat for you can cost some money. That said, $100/month is not terribly expensive for dedicated hosting.
Given it's mostly text, it's curious there's no content-encoding applied to HTTP responses. Would reduce bandwidth by something like 70-90% in most cases.
Though it's hard to say what's the best configuration without understanding the hardware context.
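If you want to check the savings claim on a file of your own, a minimal sketch with Python's standard `gzip` module looks like this. The `gzip_savings` helper and the sample string are made up for illustration, and the sample is far more repetitive than real prose, so it overstates what a typical text file would save.

```python
import gzip


def gzip_savings(text: str) -> float:
    """Return the fraction of bytes saved by gzip-compressing the text."""
    raw = text.encode("utf-8")
    compressed = gzip.compress(raw)
    return 1 - len(compressed) / len(raw)


# Illustrative input only: a highly repetitive string compresses
# much better than an ordinary text file would.
sample = "The quick brown fox jumps over the lazy dog. " * 200

print(f"saved {gzip_savings(sample):.0%} of {len(sample)} bytes")
```

Running the same function over the contents of a real .txt file gives a more honest number. On a live server this would normally be enabled in the web server's compression settings rather than done by hand.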
Might be a point to self-host something like this. The value of the project is its extreme longevity, and the half-life of external hosting services is probably something like 5 years.
You imagine correctly. I cruise open directories regularly, usually looking for books to add to my growing collection (note: books I will actually read/use), generally taking only 2 or 3 books with me when I leave the site. At least once a month, I head to an OD I like to grab a new book or two only to find the host shut the gates because it got posted on r/opendirectories and a bunch of people did pointless site rips, as though they'll read 40k pdf files or whatever. I often wonder if they do it because data caches like that are small currency on other parts of the Internet, sort of the way you had to maintain an ul/dl ratio to stay on a well-maintained BBS back in the day. Who knows.
They're certainly not rehosting the libraries in any useful way, I can say that much.
I think a lot of it is just that many people have a pretty deeply ingrained data hoarding impulse where collecting files--any files--for themselves is the end goal.
I don't know if it's always hoarding. A lot of stuff on the internet simply disappears. Especially when it's hosted by a single person, or an org that might not be around in 3 years, or they get takedowns (whether legal or not).
If you find something you'd like to keep accessing, downloading it all in advance can be a smart move.
So download specific items as the parent says (and as you suggest). But reflexive "download it all just in case" probably isn't helpful especially if it's just for you.
I'm not disagreeing with downloading stuff that you want, even maybe site sub-sections. But a lot of the time it turns into just doing a mass download.
0: https://github.com/BrowserBox/BrowserBox/blob/e437155fb582cc...
1: https://github.com/BrowserBox/BrowserBox