Hacker News
Textfiles (textfiles.com)
148 points by the-mitr on Jan 21, 2024 | hide | past | favorite | 51 comments


Just want to note that Google Chrome headless recently landed support for correctly rendering text files. Prior to this, in headless mode the file was not rendered and, IIRC, was simply downloaded, unlike what happens in regular headful Chrome.

As a workaround before this fix, I created a secure document viewer that used pandoc and LaTeX to lay out the text file in correct form. We intercepted the navigation request / download request for a text file and passed it to the secure viewer to be rendered using LaTeX and pandoc -- most complex pipeline ever for displaying ASCII!! hahaha :)

Interested readers can find the latex template here^0 and the overall project this was part of, here^1 :) haha :)

0: https://github.com/BrowserBox/BrowserBox/blob/e437155fb582cc...

1: https://github.com/BrowserBox/BrowserBox
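The interception step described above can be sketched roughly like this. A minimal illustration only, not BrowserBox's actual code; the function name, MIME list, and extension list are all hypothetical:

```python
from typing import Optional

# MIME types and extensions that headless Chrome would historically
# download rather than render. Hypothetical sketch, not the actual
# BrowserBox implementation.
TEXT_LIKE_TYPES = {"text/plain", "text/markdown"}
TEXT_LIKE_EXTENSIONS = (".txt", ".text", ".md", ".nfo")

def should_use_secure_viewer(url: str, content_type: Optional[str]) -> bool:
    """Return True if the request looks like a plain text file that
    should be routed to the secure document viewer instead."""
    if content_type:
        # Strip parameters like "; charset=utf-8" before comparing.
        base_type = content_type.split(";")[0].strip().lower()
        if base_type in TEXT_LIKE_TYPES:
            return True
    # Fall back to the URL extension when no Content-Type is available.
    path = url.split("?")[0].lower()
    return path.endswith(TEXT_LIKE_EXTENSIONS)
```

Requests matched this way would then be handed off to the pandoc/LaTeX rendering step rather than downloaded.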


Great site. Jason Scott is a pretty influential chap.

My fave is the Goatse Files: https://news.ycombinator.com/item?id=22997724


He also made some great documentaries, as I'm sure a lot of you will know, amongst which:

    - Get Lamp: https://en.wikipedia.org/wiki/Get_Lamp
    - BBS: The Documentary: https://en.wikipedia.org/wiki/BBS:_The_Documentary


They're both great. Get Lamp also has an Infocom-specific cut, and BBS captures a lot of interviews with often pre-Web creators and other information that would otherwise be pretty obscure at this point. BBSs were sort of a niche at the time and, while there are a couple of memoir-type things (and the textfiles archive of course), there aren't a lot of authoritative sources for the history, especially if you don't have a lot of context to make sense of what is out there.


This guy is just a bad person. It's a classic case of someone going so deep into understanding something that it detaches them from reality and their humanity. You understand the general structure of the Web (hotlinking) and how to "mv" which took you months or years of concentrated study to learn and set up, and you forget that under normal circumstances you wouldn't give in to your sadistic urge to show children a picture of a man's asshole.

And then all the other psychos, like you, come to the retelling of the story to fantasize about doing it too.


>This guy is just a bad person.

Do you have anything to substantiate this or are you just going to remain ignorant? Hotlinking was creating a massive bandwidth problem, this is a perfectly valid solution - you don't control content on other people's site and should have absolutely no expectation of such.


Do we agree that showing a 14 year old a picture of a man's gaping asshole is bad? If this image had "nearly a hundred thousand viewings" and 17% of MySpace's user base in 2009 was under 18[0], what's the math on that? These were people writing HTML for the first time with only the most basic understanding of it; they didn't know what IP or HTTP was. These were just kids trying to make their computer display a cool picture of a Grim Reaper to them and their friends. They probably didn't even understand the mechanism behind why pasting a link between some angle brackets (which is already pretty difficult to figure out) into their page made it show the image, or that bandwidth came from somewhere. And that's why he did it. They deserved it because they were posers, lesser intellects, that got on his internet after September 1993.

Why call me ignorant? I understand perfectly at the technical level what happened, but words like "hotlinking" and "bandwidth" and "solution" shouldn't come into play here at all. If your logic made you permanently imprint a shock image on thousands of impressionable minds you're not the hero. How much did it even cost to upload a 60kB image half a million times per month in 2007? $10?[1]

[0] https://statista.com/statistics/266881/myspace-users-in-the-... but it's of course the parents' fault for not reading the RFCs and realizing that sadistic web hosts were a flaw in Tim Berners-Lee's design

[1] https://blog.codinghorror.com/the-economics-of-bandwidth/


Don't let the others keep you down. Keep fighting and I'm sure we can take this Jason Scott guy down. I believe in you, and there's many of us.


Thank you for your service, sir.


you sound like the guy who stole his coworker's milk from the fridge day after day and then complained when she swapped it with her own breast milk :)


>Do we agree that showing a 14 year old a picture of a man's gaping asshole is bad?

This is disingenuous. He didn't show them anything; they went looking and found something they didn't want. As they say, you can't go to shit's house and be surprised when shit is home.

>They probably didn't even understand the mechanism

Which is their problem, regretful as it may be. Or more realistically, it's their parents' problem.

>words like "hotlinking" and "bandwidth" and "solution" shouldn't come into play here at all.

That was the problem that started it, so they're absolutely relevant. Pretending that this doesn't matter is willful ignorance. No one was made to view anything.


lol


> And then all the other psychos, like you, come to the retelling of the story to fantasize about doing it too.

Well, isn't that special?

I guess you were one of the folks that got goatsed. Not my fault.


My pillow is soft and I sleep well.


These files remind me of my personal nightmare of unawareness while learning about the world through the early web. It was awash in these, many of which I read in many forms, including text and web conversions. It was so hard to know what was real and what was imagined. You could say that that's still true, but I'd argue we have more power than ever to know. The information content of our communication systems is just fantastically higher. From 8 bits to the moon.


Kinda makes me think of applications of Deep Learning research atm.


Hah, found this for the content of armstech.txt [Jane's Fighting Ships 1990-1991 (Specifications of Warships)]:

```

From: Janes Copyright <copyright@janes.com>

To: "jason@textfiles.com" <jason@textfiles.com>

Sender: "Ward, David" <David.Ward@janes.com>

Date: Wed, 21 Jul 2010 03:38:46 -0600

Subject: Unauthorised hosting of Jane's Fighting Ships data

Dear Mr Scott,

I bring to your attention that your website (www.textfiles.com) is hosting information that is the copyright of IHS Global Limited.

Whilst I understand that the textfiles.com site is not hosting this data for any monetary gain you will, I am sure, understand that copyright exists over this data and that IHS Global Limited has a strong interest in ensuring that the data is available only through its own channels and through its own brands.

The data in question is from the 1990-1991 edition of Jane's Fighting Ships and can be found in the text file 'armstech.txt', which is located at the following location on the textfiles.com site, http://www.textfiles.com/fun/armstech.txt.

As stated earlier, the data held within this file is the copyright of IHS Global Limited, owner of the Jane's Fighting Ships publication and brand, and is available only to its subscribers; by hosting this data and making it free to download you are in breach of international copyright laws.

I therefore ask you, as proprietor of textfiles.com, to remove this data from the www.textfiles.com site and any associated mirror sites within the next 7 days from the date of this email and confirm your action by reply. Failure to take action may result in this matter being placed in the hands of the IHS Global Limited legal team and further action being taken against you.

Yours sincerely, David Ward

David Ward

Head of Production Operations

IHS Jane's

IHS Global Limited, Sentinel House, 163 Brighton Road, Coulsdon, Surrey CR5 2YH, United Kingdom

Phone: +44 (0)20 8700 3874

Email: david.ward@ihsjanes.com<mailto:david.ward@ihsjanes.com>

Web: www.janes.com and www.ihs.com

```



I thought data itself couldn't be copyrighted. I wonder what the original file contained.


Many think otherwise and will pay a lot of money to make that true.


Looks like full text, not just data, so almost certainly covered by copyright?

On the other hand Janes Fighting Ships without the illustrations seems pretty boring. Here is what it should look like (the 1929 edition): https://archive.org/details/janes-fs-1929-30-images/page/447...


It probably depends what was in the file other than just specifications. In any case, probably not worth fighting about.


I really like this talk Jason Scott did at defcon about being sued for 2 billion dollars https://www.youtube.com/watch?v=74g7wSTYUso


Between bash.org and textfiles, a big chunk of my youth was archived through our logs and zines. Because of Jason, I'd hung onto everything - logs, pictures from meetups, web designs, source, and plenty of dumps, emails, news stories, and recordings that covered a bit of the history behind some events around the turn of the millennium with the intent of turning them over one day, but lost pretty much everything in an unfortunate beer accident in my 20s.

It didn't feel like counterculture, just that everyone else was wrong, you know?


Sorry it didn't work out, but thank you for trying. I often say that things are rare because life happened, not because of some incompetence or evil. We do our best to preserve and not everything makes it.


Ahaha, I discovered this website as a teen and actually learnt a lot about sex and sexuality from text files on it. Fun stuff!


There are certainly worse places on the internet


38 year old. I used to go on here as a teenager and even then I felt like Gandalf poring over a mysterious, powerful, and ancient text resource. I was enamored by the "hacking McDonalds" file and wished it weren't obsolete even at the time of reading.



This is beautiful - thank you for posting this. I was jettisoned back to my past immediately. I instantly recognize familiar filenames browsing the site.

Seriously this made my day. I had completely forgotten about this (it was a big part of my life...)


I never forgot, so you could remember.



You rang?


Search for Michio Kaku


> TEXTFILES.COM has been online for nearly 25 years with no ads or clickthroughs. If you feel like donating to its roughly $1200 yearly upkeep: Paypal or Venmo.

Performance-wise, this doesn’t seem like a $100/month site, considering it’s only simple HTML and text files.


As people are wildly speculating in this thread (it is Hackernews, after all), I'll just say that yes, more than you can imagine, my provider has been instrumental in the site staying up through a number of challenges and controversies, and that number is my number to keep the site there with the features and arrangements I have. Regardless of what people think the Internet is like in 2024, there are vanishingly few places that will host a site like TEXTFILES.COM for very long.


    $ rsync --dry-run -a --stats rsync://rsync.textfiles.com/textfiles ./textfiles.com
    …
    Number of files: 866,935 (reg: 826,831, dir: 40,083, link: 21)
    …
    Total file size: 319,889,987,473 bytes
    …


Just as a note, you're only getting some of the data that way. The contents of textfiles.com are roughly 2 terabytes of *.textfiles.com endeavors and another 13 terabytes of mirrors and hosting I do.


Ah ha, that explains it. Thanks.

Even on Digital Ocean, that would be $50 for the 500 GB of block storage, another $20 for keeping a backup, and ~$6 for the actual droplet for hosting. And then some extra bandwidth costs, if people try to bulk-download. Adds up.
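Summing those estimates (a back-of-the-envelope check only; bandwidth excluded, and the per-item prices are the commenter's, not official Digital Ocean rates):

```python
# Rough monthly hosting estimate from the figures above.
block_storage = 50   # 500 GB block storage, $/month
backup = 20          # backup of that volume
droplet = 6          # basic droplet for serving

base = block_storage + backup + droplet
print(f"${base}/month before bandwidth")  # $76/month before bandwidth
```

That leaves roughly $24/month of headroom against the quoted $100 figure before any bandwidth overage.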


That site hosts a lot of historical material that many "budget" providers would probably balk at and kick you off their network without much recourse. Having a good relationship with a provider that knows you and will go to bat for you can cost some money. That said, $100/month is not terribly expensive for dedicated hosting.


Couldn't you just host all of these files on Github under the pretense of readme files? Would they throw a fit at this?


It’s probably outgoing bandwidth costs.


Given it's mostly text, it's curious there's no content-encoding applied to HTTP responses. Would reduce bandwidth by something like 70-90% in most cases.

Though it's hard to say what the best configuration is without understanding the hardware context.
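For a rough sense of that claim, gzip (the same DEFLATE compression behind `Content-Encoding: gzip`) shrinks repetitive ASCII dramatically. A toy demonstration on an artificial sample; real textfiles will compress somewhat less well than this, but 70-90% is typical for plain text:

```python
import gzip

# Simulate a typical textfile: highly repetitive ASCII prose.
sample = ("Greetings from the underground BBS scene, circa 1989.\n" * 2000).encode("ascii")

compressed = gzip.compress(sample, compresslevel=6)
ratio = 1 - len(compressed) / len(sample)

print(f"original:   {len(sample)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"saved:      {ratio:.0%}")
```

Serving the gzipped bytes when the client sends `Accept-Encoding: gzip` is a one-line config change in most web servers.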


Above it’s noted as being north of 300 GB; most cheap cloud providers will give you quite a bit of bandwidth with a 300 GB server.


Might be a point to self-host something like this. The value of the project is its extreme longevity, and the half-life of external hosting services is probably something like 5 years.


It probably gets mirrored a lot too, given the target audience.


As I recall, Jason asks people not to do wholesale site copies in general, but I imagine a great number do.


You imagine correctly. I cruise open directories regularly, usually looking for books to add to my growing collection (note: books I will actually read/use), generally taking only 2 or 3 books with me when I leave the site. At least once a month, I head to an OD I like to grab a new book or two only to find the host shut the gates because it got posted on r/opendirectories and a bunch of people did pointless site rips, as though they'll read 40k pdf files or whatever. I often wonder if they do it because data caches like that are small currency on other parts of the Internet, sort of the way you had to maintain an ul/dl ratio to stay on a well-maintained BBS back in the day. Who knows.

They're certainly not rehosting the libraries in any useful way, I can say that much.


I think a lot of it is just that many people have a pretty deeply ingrained data hoarding impulse where collecting files--any files--for themselves is the end goal.


I don't know if it's always hoarding. A lot of stuff on the internet simply disappears. Especially when it's hosted by a single person, or an org that might not be around in 3 years, or they get takedowns (whether legal or not).

If you find something you'd like to keep accessing, downloading it all in advance can be a smart move.


So download specific items as the parent says (and as you suggest). But reflexive "download it all just in case" probably isn't helpful especially if it's just for you.

I'm not disagreeing with downloading stuff that you want, even maybe site sub-sections. But a lot of the time it turns into just doing a mass download.



