That's what you think, until eventually you have 5k+ browser tabs open at once and your browser has slowed to a crawl, so you need to dump them all into bookmarks or a session backup and start the cycle again.
Or until you lose them for some reason (browser crash, accidentally closing them all, ...) and realize you actually don't care that much and just move on with your life.
Firefox lets me keep tabs on standby: they don't take up memory and only load when I click on them. I also use vertical tabs via the Sidebery extension, so it's easy to manage a huge number of tabs.
I can confirm first-hand that once you hit about 2.8k tabs across two windows, typing in the address bar when opening a new tab gets unusably sluggish even on an AMD 3900X as it tries to search all the tabs for matches to switch to, even if the tabs themselves are unloaded.
Tridactyl also becomes unusable.
I finally cleared all the tabs out last week and started over, because it was clear my "reading list" was simply never getting read, ever. There were tabs from February still in there, never read.
I use Add to Reading List in Safari and, like you, for the most part don't look at it again... except when I'm on an airplane without wifi, because in the meantime it will have downloaded the page and synced with my phone. So I have a collection of offline reading material always there.
I'd argue it increases FOMO, but feeds on loss aversion (no need to trash a "valuable" link, no need to read anything, but a need to own the list forever).
Great writeup. I too have a long reading list - currently at 133.
I use my own little side project (Savory) to track the list. When I come across a page that I don't have the time or energy to finish right now, I save it and add the "reading" tag to it. When I have free time, I can open the reading tag and pick up something new to read.
The best part is that I usually add a couple more tags (e.g. "security" or "economics" etc.) when I save a link. This way, the reading list lets me filter by topic. It has been an unexpectedly effective hack for attacking the growing list: I can usually finish multiple articles in a single run, all on the same topic, because there is usually a link between them even when I saved them days or weeks apart.
Anyway I like how OP actually has a reading history. I really need to add something similar in Savory. Right now, when I finish reading something, I just remove the "reading" tag and I don't get a neat history.
I do the same thing but in one big text file. I store all my general notes there. If it's an article I want to come back to, I write "article", if it's a video, then "video", etc. I also write any other keywords and/or information that I might find useful when coming back to it. Then it's a text search away.
Here's something I saved the other day:
22-09-05
article Tactical Decision Game (TDG)
- https://www.shadowboxtraining.com/news/2022/04/29/film-at-eleven/
- title: Film at Eleven
- scenario of the month
- make decisions
- make notes about your thinking
- compare with experts
It's mumbly, low effort and holds all the information I need to both find it and then to see why I was initially interested.
I would love to see every social bookmarking service and RSS reader have an interoperable comment section.
Why should I manage my bookmarks outside of the browser if not for the benefit of receiving additional information about the articles?
Never since have I found such unique and interesting web apps/essays/videos. Was very cool how it took you to the actual site, but somehow felt a lot more curated than the link aggregators of the time. Every single click was so fun! One of my favorites was a sort of proto- /r/place, where you could place pixels as many times as you wanted on an infinite canvas. Sometimes you could return days or weeks later and still find your art.
Yes, del.icio.us, and something similar still exists at pinboard.in, though sadly without the full social sharing features that were available at del.icio.us.
I'm still waiting for a web extension that sends a copy of the webpage I'm looking at (for more than 1 minute) to an endpoint I specify, along with some metadata like the URL and user-agent. Obviously, it should block certain domains like banks or email.
I'd like something to build up a searchable index of everything I've read recently so I can easily find content again yet this is NOT something I want a 3rd party to do. I want to self-host something like a tiny Go or Rust server that only uses 10mb of ram to index all the pages into an embed rocks/level/badger/etc. database.
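The capture half of that could be surprisingly small. A sketch of a content script, purely as an illustration (the endpoint, port and blocklist are hypothetical; the extension would need host permissions, and the endpoint would need a certificate since browsers block plain-HTTP posts from HTTPS pages):

// after 1 minute on a page, ship a copy of it to a self-hosted endpoint
const BLOCKED = ['bank.example', 'mail.example']; // stand-ins for a real blocklist
if (!BLOCKED.some(d => location.hostname.endsWith(d))) {
  setTimeout(() => {
    fetch('https://localhost:8443/ingest', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        url: location.href,
        userAgent: navigator.userAgent,
        html: document.documentElement.outerHTML,
      }),
    }).catch(() => {}); // best effort: drop failures silently
  }, 60000);
}

The indexing server is the bigger piece, but the browser side really is about this much.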
If you're on a Mac, have a look at DevonThink. They offer a server version so you can access your stuff from anywhere, but you wouldn't necessarily need it if you're just using all your own devices. With the normal version, your databases are simply stored with whatever cloud provider you want to use, or I think you can do WebDAV if you want to use your own. I absolutely love it.
That app was the reason I bought my first Mac, after a lifetime of Windows. Must have been in 2007 or something. I had read Steven Berlin Johnson’s account of how he uses DevonThink to research his books. Nice to hear people still use it in times of Roam, Notion and Obsidian.
I think DevonThink complements Obsidian, Roam and the like perfectly, at least for me. One is the repository where I collect, search and annotate the thoughts of others; the other is where I collect and solidify my own thoughts. Such a great application.
Thanks for mentioning this author, I was not aware of him before and I love reading about other people's workflows, going to hunt out that account!
This is a neat writeup. It's fun to think about how to potentially automate this kind of tracking.
> I wish there was an easy way to filter by "independent websites"
This side comment from the post is intriguing. Other than manual curation, I wonder if there is a way to identify commercial vs independent domains? This would make a really good entry point for a specialty search engine for indie sites only.
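One naive heuristic, purely as a hypothetical sketch: indie sites tend to load scripts from few or no third-party hosts, while commercial sites pull in ad networks. Run in a page's console, something like

// crude signal: many third-party script hosts, or any known ad network
const adHosts = ['doubleclick.net', 'googlesyndication.com', 'taboola.com', 'outbrain.com'];
const hosts = new Set([...document.scripts]
  .map(s => { try { return new URL(s.src).hostname; } catch { return null; } })
  .filter(h => h && h !== location.hostname));
const looksCommercial = hosts.size > 10 ||
  [...hosts].some(h => adHosts.some(a => h === a || h.endsWith('.' + a)));
console.log([...hosts], looksCommercial);

It would misclassify plenty of sites, but it might be a usable first-pass filter before manual curation.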
> It's fun to think about how to potentially automate this kind of tracking
I download my entire history regularly and use https://yacy.net/ to index it all. It's essentially like a local search engine. It also works on the local file system and across machines.
I have a bookmarklet that saves the current page to a kind of weblog (tumblelog) where I can modify/tag/unlock entries for (public) visibility on a webpage. It's pretty easy to save the current page somewhere via a JS bookmarklet.
I print-to-PDF everything I've ever found interesting enough to linger on for longer than 2 minutes... and as a result, I've got 20+ years of Internet articles to go through and read offline, any time.
It's very interesting to see the change in quality of technical writing over the last two decades. There's a definite, observable increase in click-bait style writing.
I have read them all. And I often go looking for articles in the archive. It's really quite handy to have every single interesting thing I've ever read accessible this way. I suggest you try it out for a year and see for yourself!
My personal flow is that whatever looks like an interesting article but is longer than a 2-3 min read goes to Pocket. From there I try to read it and save any interesting bits in Joplin (it's pretty rare that I find a whole article so inspiring that I'd like to keep the full text).
The downside is that I probably only manage to treat around 30% of articles this way (the highlighting / clipping is a bit more time-intensive than I'd like and needs to be done at a PC, while for reading I prefer a Kobo e-reader).
The upside is that in Joplin I can keep track of any text at all, including the discussions on HN / elsewhere.
Do you print-to-PDF this interaction, or is it not an archive of interactions but more an archive of articles? I am assuming the latter, because the former would be too hard.
I don't understand how one would print to pdf an interaction?
If I read something, and it holds my interest for 2 minutes or so, I print the article to PDF for archival purposes. I've done this now for 20+ years and have a huge history of things that have interested me over the years.
It's extraordinary to see the change in the web, even in just the last 5 years, in terms of click-bait techniques and content.
A lot of what I read, particularly on sites like HN and Reddit, is the comments section, which is usually more valuable than the article itself and where I have the most outsized reactions/learning/engagement.
So I was thinking aloud/asking if preserving this type of content is a) worthwhile, b) straightforward
Nope, I just print-to-PDF onto my Desktop and then dump all my .pdf files into a "PDFArchive" folder once or twice a week.
I've got over 70,000 .pdf files in that archive now. It's very easy to query for data in this archive: ls is a surprisingly useful search tool when combined with a few |'s and grep's... and pdftotext is highly useful as well!
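For example, to find every archived article mentioning a term, something like this works (a sketch; adjust the path):

for f in PDFArchive/*.pdf; do
  pdftotext "$f" - 2>/dev/null | grep -qi 'search term' && echo "$f"
done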
One of these days I'll get around to making some word cloud tools or so, maybe scanning the archive to find my favourite 3rd party sites, etc.
I have started using Obsidian (at work).
And I copy/paste into it any web content, text or image I find useful from the intranet, emails, Meet.
I try my best to organise things. But for the most part, I use the search engine and the [autogenerated] links.
The only requirement when adding content is to figure out whether it should be added to an existing note or a new dedicated note should be created.
[btw, note nesting does exist in Obsidian]
With this simple workflow, you completely eliminate the notion of provenance of the knowledge.
The knowledge is here and up to your organisational habits.
After some time doing that, you end up with VERY dense notes (in terms of knowledge/line ratio) and very little useless (distracting) content.
I've been logging all my web activity since 2018. On the server side, I filter out ad spam and other extraneous URLs, and then run a cronjob that converts all new HTML documents it sees to PDFs with wkhtmltopdf. It's been a great tool for finding stuff in those moments where I go "hm, I remember seeing something about this months ago..."
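The conversion step can be as simple as this sketch (hypothetical layout, assuming captured pages land as .html files in one directory):

for f in "$HOME"/weblog/*.html; do
  out="${f%.html}.pdf"
  [ -f "$out" ] || wkhtmltopdf --quiet "$f" "$out"   # skip files already converted
done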
The PDFs are a couple of GB every year - I could limit that further by putting more aggressive filtering in place. There is also a rule that copies the contents of image and PDF URLs verbatim without converting them.
Over time I'm going to extend all of this to a more fully-fledged "sousveillance" dataset. I would love to add health and location data, but alas Apple is such a walled garden. But I did add a note-taking system, Zettelkasten-style.
The author doesn't appear to have documented the bookmarklet itself. If they are here, or anyone else knows, can you suggest what it might look like for a bookmarklet to collect the URL, page title, meta description and image, and then set window.location.href?
I use such a bookmarklet to paste things into my own pastebin-like site. It logs the selected text, page title, and URL. Clips are then grouped by URL and domain.
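A sketch of such a bookmarklet (example.com/save stands in for your own endpoint; collapse it onto one line for the actual bookmark):

javascript:(() => {
  const meta = n => (document.querySelector('meta[name="' + n + '"], meta[property="' + n + '"]') || {}).content || '';
  window.location.href = 'https://example.com/save?' + new URLSearchParams({
    url: location.href,
    title: document.title,
    description: meta('description') || meta('og:description'),
    image: meta('og:image'),
  });
})();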
I used to use a browser extension that would track every page I visited and index everything. This was a few years ago, and I can't remember what it was called or why I stopped using it. I think I was paying something like $5/mo. I'd like to find something like that again, it was really useful. I think it would be even more powerful with an AI agent that could organize all the information into categories, and answer questions like "What was the article I was reading last week about <x>?"
Is anyone building something like this? (It would be great if I could run something on my own server.)
Browsers track visits in their history; Firefox, for example, keeps it in a file called "places.sqlite", which you could copy and use as a base log.
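For example, with sqlite3 (copy the file first, since Firefox holds a lock on it while running; last_visit_date is in microseconds):

sqlite3 places-copy.sqlite "SELECT datetime(last_visit_date/1000000,'unixepoch'), url FROM moz_places ORDER BY last_visit_date DESC LIMIT 20;"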
If you don't mind a bit of tinkering and are a Linux user, you can start with a squid proxy on your server which will keep a log. That could be a Raspi at home. Your browser can connect to that server, either directly, or via SOCKS5 or even an ssh tunnel if the proxy is running on your (external) server.
These ssh tunnels are very flexible things. I use an ssh tunnel with SOCKS5 on top so my traffic cannot be traced by my provider. The tunnel is set up in a terminal window like
ssh -D 8080 -l user proxy.mydomain.net sleep $((45*60))
and the browser (Firefox for example) is configured to use the proxy at "localhost:8080" which routes its network connection through the tunnel exiting at my server. The "sleep ..." is any command you want to run instead of a real login session, i.e. you can just drop it if you want. One can add a proxy into this chain. Or use "ssh -v -D ..." to get a log where you can extract your destinations.
The AI part would not be trivial in general, especially because of "content creators" and "SEO optimizers"... and, last but not least, dynamic web pages.
That sounds like an interesting idea. I like the idea of capturing all the text content I view on the internet and building a full-text search index. Would also be fun to play with NLP and machine learning. I run Tailscale on all my devices, so it would still be easy to use it when I'm away from home. I already have a Linux server that I use for Home Assistant, so I could set it up on there.
I'm just not sure how this would work with SSL. I might have to install a self-signed CA certificate on my devices, but it wouldn't work if a site uses certificate pinning. Is there an easy way around that? Maybe run Chromium with some CLI flags, or use a fork?
Hmm, you can terminate the SSL connections on a proxy. Some colleagues at work (but at a different location) maintain such a setup (i.e. I never set it up myself), but the relevant search terms are "reverse proxy" and "SSL termination" on that proxy. Only the proxy needs a proper certificate then, and that's where Let's Encrypt helps.
It doesn't do question answering right now, but it saves, indexes & categorises articles for you. The reader mode extension this is tied to is open-source, but the backend code isn't yet.
Here's the private beta link if you're interested: library.lindylearn.io/signup
I used to try bookmarking things using the built-in browser bookmark manager, then later using Raindrop, and even copying links into Obsidian, but none of this was really all that effective. After watching the video I trialled DevonThink and was massively impressed. Now, every article I read that I find interesting I save as either a PDF or web archive so I can search for and find it later. I also do the same for useful Stack Overflow posts so I know I'll be able to find them if necessary. On top of this I bookmark all kinds of useful sites and categorise them in folders in their respective databases.
This allows me to keep Obsidian for just pure notes/writing. If I want to link between the two I can also use Hook to embed links between the two applications.
If I want to get proper reference formatting for something, I can open it from DevonThink in the browser and then save it to Zotero. Alternatively, some people save everything to Zotero instead of DevonThink and then index the folder using DevonThink so it is included in their search. Either approach works.
I highly recommend anyone with a Mac try out the free trial of DevonThink; I think it's something like 100 hours of usage. I'd hate to go back to living without it.
There is the server version, which runs in the browser, so in theory you could use that for any device and keep your databases on something like Dropbox, but I don't know for certain; you'd have to double-check with Devon Technologies.
I think an interesting angle would be a categorization by the author of what they found useful/fluff/low quality. It would be a good way to figure out where you're wasting time vs getting value (of course, sometimes the point is to waste time...).
Automatic resolution followup would be interesting. If you read an article about a new research result, you should get a followup on how it came out, months or years later. If you read about an arrest, you should eventually get the case disposition. If you read about a new product, a followup when there are substantial reviews after customers have experience with it.
Could we feed the author's reading list into an AI and guess his OS, his Amazon history, or his likely topics of conversation at a dinner party? I'm really curious whether you could mirror his decision making and tastes from what was in his reading list over some period of time.
The same is true for Brave bookmarks: absolutely terrible. Firefox was acceptable in that regard (although it didn't seem optimized for having a large number of bookmarks); however, it doesn't seem to have any documentation on the actual export format either.
I've always wondered if you can just export Firefox's history db... I'd then take a periodic dump of it and add it into another dedicated db for searching. God knows I spend way too much time looking for things I had once read...
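Something like this would probably do it (a sketch with hypothetical paths; PROFILE is the Firefox profile directory, and Firefox should be closed since it locks the file):

cp ~/.mozilla/firefox/PROFILE/places.sqlite /tmp/places-snap.sqlite
sqlite3 ~/history-archive.db <<'SQL'
ATTACH '/tmp/places-snap.sqlite' AS snap;
CREATE TABLE IF NOT EXISTS history (url TEXT PRIMARY KEY, title TEXT, last_visit INTEGER);
INSERT OR REPLACE INTO history SELECT url, title, last_visit_date FROM snap.moz_places;
SQL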
I keep an automated log of all the HTTP requests the computer makes, which naturally includes all the websites I visit.^1 I am not always using a browser to make HTTP requests, and for recreational web use I use a text-only one exclusively, so a "browser history" is not adequate.
In the loopback-bound forward proxy that handles all HTTP traffic from all applications, I add a line to include the request URL in an HTTP response header, called "url:" in this example. As such, it will appear in the log. For example, something like
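(reconstructing the line from the POST example below, which uses HAProxy syntax)

http-request add-header url %[url]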
What about POST data? That is captured in another HTTP response header, called "post-data:" in this example:
http-request add-header post-data %[req.body] if { method POST }
To look at the POST data I might do something like
grep post-data: 1.log|cut -d' ' -f3-|less
1. I also use a system for searching the www or specific www sites from the command line. The search results URLs for each query are stored in simple HTML format similar to the above. One query per file. What's non-obvious is that each file can contain search results from different sources, sort of like the "meta-search engine" idea but more flexible. The simple HTML format contains the information necessary to continue searches, at any time, thus allowing a more diverse and greater number of search results to be retrieved. (Sadly, www search engines have been effectively limiting the number of search result URLs we can retrieve with Javascript and cookies disabled.) The command line program reads the information to continue a search from the simple HTML comments.
I use a service that pulls selected articles from my Pocket account, then formats and prints them into a nice booklet that is sent to me once a month. I find this makes me more conscious when deciding whether to add an article to Pocket, as I now ask myself whether I really want to read it later in the printed booklet (vs. just adding it to "the list" to remove it from the tab bar).
Yes, but this one is completely avoidable with zero effort. It's a personal decision to say "no, I need my articles written on dead trees and delivered to my doorstep".
Not everything is enjoyable on a screen, even an e-ink one.
All the shops near me still have the "dead trees" stand full of glossy maculature, and while people occasionally buy them, most of it goes to the landfill anyway. THAT is wasteful.
Started doing this with Raindrop recently. My 180+ open tabs in Firefox started giving me anxiety. After purging some and adding some to Raindrop, I'm now down to 120.
I'm coming up on 3k articles read, probably most of them from HN. Jesus, I have too much! I use my own app, leftwrite.io, to keep track of everything I read and the notes I make. A retrospective might be fun, though it'll make it very clear how much of nothing I do.
Thanks for sharing the `window.location.href` approach. I had attempted to do something like this in the past and gave up when I hit the HTTP issue you are referring to, specifically on the websites (mostly news sites) that I predominantly spend time with.
Zotero simplified a lot of my note taking / "information retaining" from stuff I bookmark / read. Much better than Obsidian etc.
> Add thing to Zotero, 99% of the time the metadata comes with it and is searchable already.
> To mark it as read, make a note for it (usually consists of main idea, good / bad ideas in the article, few sentences).
> Zotero entries without note are assumed unread.
For my diary / misc notes I use VS Code with some markdown plugins; Foam has daily-note functionality, which is nice: add a new diary entry and some tags, ezpz.
The articles I've read are recorded by the fact that they contain a note in the Zotero catalog. Articles I want to read are implicitly recorded by the fact that they're in the catalog without a note attached to them. In Zotero when I use its notes functionality I mostly copy-paste quotes from the source material as a highlight reel for myself.
Here's a picture to show: https://i.imgur.com/HvIxdI0.png, everything with '1' under the first column is "read" or at least skimmed, and everything without it is unread but waiting to be read.
The diary contains more of my own thoughts about the material when they arise, and other stream-of-consciousness stuff. For organizing my diary I use just tags and a timestamp as the filename.
The main thing I'd say is that you have to test and fiddle around with this stuff; once you find something with next to zero overhead (like my flow is for me), it becomes quite rewarding in how it helps you gather and retain information. I could never get into the whole Roampilled hyperdimensional graph productivity BassBoost thing myself.
I don't have any tracking, and my history/cookies are cleaned regularly. But my reading history might be the most boring thing to analyze: everything about Lisp, plus some darknet activity. The former I keep learning and collecting anyway, and the latter I'd really prefer to be forgotten.
I made a Shortcut a while back that basically saves a webpage to PDF and adds tags. When I've read an article it gets a new tag, "already read", and that's it.
Now, the tags are an iOS/iPadOS/macOS thing, sure, but the PDFs I can take with me to any platform.
I have different reading lists on Hacker News, Twitter, Reddit and Medium, and because of this I never read anything that I don't read directly... To share between them, you need some convenient app for your phone and computer.
It might not be the perfect solution for your case but for me it works:
I'm just using my email inbox as my reading list. I always forgot the things I bookmarked on Twitter, so I built a side project that sends my Twitter bookmarks to my email inbox once a week so I can sit down and go through them.
I've been happily using Instapaper for this for many years. I add interesting articles to it from all over and then read and highlight in the app/web app right away or later. If I read an article directly, I still tend to add it to Instapaper to keep track of it, especially if I found it interesting.
Yes, I'm happy with the flow. In Firefox I have a button that adds the article with one click, and in Safari on iOS it's two clicks; share -> instapaper.
It's more like what-I-read-on-hacker-news type of lists.
If I did one there'd be one session with 30 wikipedia pages about some random Italian kings in the 1400s, endless amounts of random Russian weapon systems, and some clickbait I got baited into off Reddit/Twitter, because everyone loves a bit of outrage once in a while.
- Send to Feedbin (https://feedbin.com/blog/2019/08/20/save-webpages-to-read-la...)
- Never look at it again