Ask HN: 4chan Archives Pre-2008?
67 points by TerryHasRisen on April 30, 2024 | 22 comments
Hey, I've recently been going down a rabbit hole of 4chan history. I am wondering if there are any archives of 4chan out there, regardless of image availability.

Thanks!



https://archive.org/details/4chan_threads_archive_10_billion

"This is a text-only compilation of 4chan threads, primarily from the years 2006-2008. The total unique threads in this collection is roughly 10 million."


> 9000


It seems like it wouldn't be hard to train a model on this era of 4chan and create a simulacrum. Posts would happen at a certain frequency with all the users being simulated. I wonder if you could even allow user posts and have the model reply in a realistic way (bullying and all)?


The remarkable thing is how effective bullying is at moderating idiots. Astroturfing state actors aside, the site maintains a high level of on-topic civility.

There's something inherently wrong with ordering posts by vote count, both because a confident/deceptive idiot can spread his lies like a virus and because it's too easy to game.


I'm reasonably sure this is the first time I've ever seen a claim associating civility with 4chan.


I suppose you need to be deep in the trenches to find anyone still willing to have discourse there. But as another commenter said, some of the niche boards, like subreddits, maintain some pretty high quality conversation.


On a small scale I've seen this work, and maybe early 4chan was like this.

Current-day, though? Nah, idiots attract more idiots and most of the site sucks save for a few niche boards/topics.


Slashdot solved this ages ago with its moderation system. Vote moderation has been flawed since inception, because people downvote when they disagree, not based on the quality of the post.


Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should


> It seems like it wouldn't be hard to train a model on this era of 4chan and create a simulacrum.

Out of curiosity, are there any models trained on HN posts?


You're using one right now


Shadowbanned riding like the wind.. Gives model-citizen a whole new ring..


The problem is that the model we’re using now is locked in the world’s most secure HSM and you can’t use several copies at once.


It works for X


I don't have the link handy but I do remember reading about an LLM project trained on 4chan.



[flagged]


my comment was pure substance

and I think LLMs are generally stupid & overrated tbh

but the 4chan idea is a fun gas


Soon to be a question asked about the entire internet pre-2019.


Since governments started to get more involved, more and more data has been lost forever. It’s good to see that at least projects like archive keep it. But I’m afraid that will be lost too.


I miss my geocities site. I found its path the other day, and sadly it isn't in the snapshots that I've seen. Ashes to Ashes, Dust to Dust.



sup/tg/ might go back that far, just for /tg/



