Yup, it's not the greatest solution for teams as a central issue tracker.
The main reason I mentioned small teams is that you can opt out of committing ./ghist, keep it on your machine only, and use it to track your own work across multiple agent sessions. It then becomes more of a disposable tool for getting big chunks of work done with persistent task memory.
Going with sqlite might not have been the best decision either, as it's ultimately a binary file that can't be diffed. JSON might have been a better choice for this.
> Going with sqlite might not have been the best decision
Many have tried out this general idea. I myself evaluated git-bug for a few days in 2018 when it was a novel idea, but I ran into issues I tried to raise in my previous comment.
The data format you chose is not even the main issue here.
Binary data that keeps changing is almost always unfit for source control.
In your use case, you can solve that by committing sql dumps of the database in a text format.
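Python's built-in sqlite3 module can produce such a text dump directly via its iterdump() iterator (the filenames here are illustrative, not from the project):

```python
import sqlite3

# Dump the (hypothetical) ghist.db database to a plain-text SQL file
# that diffs cleanly and can be committed to source control.
con = sqlite3.connect("ghist.db")
with open("ghist.sql", "w") as f:
    for line in con.iterdump():
        f.write(line + "\n")
con.close()
```

Restoring is the reverse: feed ghist.sql to a fresh database with executescript(). The sqlite3 command-line shell's `.dump` command does the same thing outside Python.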
But how do they bypass the paywall? They can't just pretend to be Google by changing the user-agent, this wouldn't work all the time, as some websites also check IPs, and others don't even show the full content to Google.
They also cannot hijack data with a residential botnet or buy subscriptions themselves. Otherwise, the saved page would contain information about the logged-in user. It would be hard to remove this information, as the code changes all the time, and it would be easy for the website owner to add an invisible element that identifies the user. I suppose they could have different subscriptions and remove everything that isn't identical between the two, but that wouldn't be foolproof.
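The two-subscription idea above can be sketched as a line-level diff: keep only what is identical across two snapshots taken under different accounts, and treat everything that differs as user-specific. This is a toy version; real HTML would need DOM-aware comparison, and as noted it still wouldn't be foolproof against deliberate per-user markers.

```python
def strip_user_specific(snap_a: str, snap_b: str) -> str:
    # Keep only lines that appear verbatim in both snapshots;
    # lines that differ between the two accounts are assumed
    # to carry user-identifying data and are dropped.
    common = set(snap_b.splitlines())
    return "\n".join(line for line in snap_a.splitlines() if line in common)
```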
On the network layer, I don't know. But on the WWW layer, archive.today operates accounts that are used to log into websites when they are snapshotted. IIRC, archive.today manipulates the snapshots to hide the fact that someone is logged in, but sometimes fails miserably:
This particular addon is blocked on most western git servers, but can still be installed from Russian git servers. It includes custom paywall-bypassing code for pretty much every news website you could reasonably imagine, or at least those sites that use conditional paywalls (paywalls for humans, no paywalls for big search engines). It won't work on sites like Substack that serve properly authenticated content pages, but those pages don't get picked up by archive.today either.
My guess would be that archive.today loads such an addon with its headless browser and thus bypasses paywalls that way. Even if publishers find a way to detect headless browsers, crawlers can also be written to operate with traditional web browsers where lots of anti-paywall addons can be installed.
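A conditional paywall of the kind described above boils down to branching on who the client claims to be. A minimal sketch (the function name and bot list are illustrative, not any publisher's actual code):

```python
SEARCH_BOT_TOKENS = ("Googlebot", "bingbot", "DuckDuckBot")  # illustrative list

def serve_article(user_agent: str, full_text: str, teaser: str) -> str:
    # Crawlers get the full text so the article ranks in search;
    # ordinary browsers get only a teaser plus the paywall prompt.
    if any(token in user_agent for token in SEARCH_BOT_TOKENS):
        return full_text
    return teaser + "\n[Subscribe to keep reading]"
```

A crawler or addon that spoofs a bot user-agent lands in the first branch, which is exactly the loophole conditional paywalls leave open; sites serving genuinely authenticated content have no such branch to exploit.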
Wow, I did not know about the regional blocking of git servers! Makes me wonder what else is kept from western audiences, and why this blocking is happening.
Thanks for sketching out their approach and for the URI.
Most of them don't check the IP, it would seem. Google acquires new IPs all the time, plus there are a lot of other search engines that news publishers don't want to accidentally miss out on. It's mostly just client-side JS hiding the content after a time delay, or other techniques like that. I think the proportion of the population using these addons is so low that it would cost news publishers more in lost SEO to restrict crawling to a subset of IPs.
The way I (loosely) understand it, when you archive a page they send your IP in the X-Forwarded-For header. Some paywall operators render that into the page content served up, which then causes it to be visible to anyone who clicks your archived link and Views Source.
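To illustrate the mechanism being described: if the origin server echoes X-Forwarded-For into the HTML it serves, the archiving user's IP gets frozen into the snapshot. A hypothetical sketch of such publisher-side rendering (the attribute name and markup are made up for illustration):

```python
def render_page(headers: dict, body: str) -> str:
    # Hypothetical paywall/analytics rendering that embeds the
    # forwarded client IP into the served HTML.
    client_ip = headers.get("X-Forwarded-For", "unknown")
    return f'<body data-client-ip="{client_ip}">{body}</body>'
```

If archive.today forwards the archiving user's IP in that header, the stored snapshot then carries that IP, and anyone who opens the archived link and views source can read it.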
But in the article they talk about manipulating users' devices to run DDoS attacks, not to scrape websites. And a user visiting the archive website is probably not going to have a subscription; besides, I'm not sure that simply visiting archive.today would let it exfiltrate much information from any third-party website, since cookies will not be shared.
I guess if they controlled a residential botnet more extensively they would be able to do that, but it would still be very difficult to remove login information from the page. The fact that they manipulated the scraped data for totally unrelated reasons a few times proves nothing, in my opinion.
They do remove the login information for their own accounts (e.g. the one they use for LinkedIn's sign-up wall). Their implementation is not perfect, though, which is how the aliases were leaked in the first place.
Last week, I asked Gemini to give me episode names and air dates for a TV show, and it proceeded to fabricate two seasons' worth of titles; not a single name or air date was correct. The episode names were even listed on (Swedish) Wikipedia, so they should have been in its training data.
Please don't imagine that I don't fully understand this.
Nevertheless, X11 "server" and "client" have confused very smart and highly technical people. I have had the entertainment of explaining it dozens of times, though rarely recently.
And honestly, still, a server is usually a remote machine in all common usage. When "the server's down", it is usually not a problem on your local machine.
You'd need to commit and push, and everybody else working on the project needs to pull in your commits in order to see the changes.
This breaks feature-branch workflows immediately, and it makes collaboration really hard in all kinds of ways.
It can work, as long as you don't use branches and as long as you are solo, I guess.
The project readme suggests it would be a good fit for small teams. I beg to differ.