Show HN: I built a full-text search for your browsing history

KomoD · 2024-07-04T22:38:21 1720132701

Gives me really bad vibes, sending all your browsing history, page info, etc. to some server, including screenshots of your tabs.

You also say it doesn't save in incognito but I can't find anything in the source code to support that claim.

I see a bunch of other red flags too. There is no obvious monetization. The privacy policy says it was updated in 2021, but a little further down it says updated in 2019 (neither makes sense, as the domain was registered in 2023). The privacy policy sometimes says Nision Research LLC and sometimes Nision Research Kft. The Chrome Web Store page has an email for "gethaystack.com" but the privacy policy says info@browspilot.com (and that gethaystack site has a policy saying "info@localhost"). Both of the reviews on the Chrome Web Store are by people clearly affiliated with Browspilot. The FAQ says you can't delete things from your history. The privacy policy doesn't say where your data gets sent (I see mentions of Firestore and Firebase in the code, but they do not appear in the privacy policy).

Also this claim on your home page "Your data will only be read and used by you." doesn't align with what your privacy policy says.

pedalpete · 2024-07-04T23:24:17 1720135457

This was my initial concern. Though this is something I want, and I'm surprised it isn't built into modern browsers. I'm less interested in having a cache of images, but useful search of my history, which I think could be stored locally, would be helpful. I wouldn't think the search needs to be particularly clever. Even straight string search of cached text from the webpages I've viewed would be valuable. I don't need all the images, video, JavaScript, etc. cached.
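For illustration, the "straight string search of cached text, stored locally" idea could look roughly like the sketch below. It is not taken from any extension discussed here; the table layout and the function names (`open_store`, `save_page`, `search`) are assumptions made up for this example, and it only uses Python's standard `sqlite3` module.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local store holding only the text of visited pages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    return conn


def save_page(conn, url, title, body):
    """Cache the page text; images, video and scripts are never stored."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()


def search(conn, needle):
    """Plain substring match over title and body (case-insensitive for ASCII)."""
    # Escape LIKE wildcards so the needle is treated as a literal string.
    escaped = (
        needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = "%" + escaped + "%"
    rows = conn.execute(
        "SELECT url, title FROM pages "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY url",
        (pattern, pattern),
    )
    return rows.fetchall()
```

A scan like this reads every row on each query, which is probably acceptable for one person's history but is the first thing an index would replace.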

tarasglek · 2024-07-05T07:18:41 1720163921

I proposed this at Firefox when I was there; it was deemed too niche a feature... I'm still sad that I can't have this feature

yjftsjthsd-h · 2024-07-05T06:20:52 1720160452

> You also say it doesn't save in incognito but I can't find anything in the source code to support that claim.

Doesn't Chrome default to disabling extensions in incognito? So the extension wouldn't actually have to do anything itself for that to be true.

NVI · 2024-07-04T22:48:35 1720133315

Why do I need to log in with Google to use it? I'm also experiencing a bug where after logging in, I see the same login popup over and over again.

mef · 2024-07-05T01:55:14 1720144514

i too had this issue, and it persisted even after i uninstalled the extension. i had to restart the browser entirely

peterpelles · 2024-07-03T11:44:39 1720007079

I’d love to hear your feedback – please share your thoughts and suggestions in the comments!

dotcoma · 2024-07-04T00:53:53 1720054433

Was browsepilot.com not available?

(brows looks weird to me)

KomoD · 2024-07-04T22:44:13 1720133053

They have both browse and brows.

dotcoma · 2024-07-05T14:07:38 1720188458

So... do you know why they chose BrowsPilot and not BrowsePilot?

KomoD · 2024-07-05T23:46:07 1720223167

No clue, I too prefer Browsepilot over Browspilot

bradrn · 2024-07-04T22:53:47 1720133627

This reminds me… some time ago I made my own Firefox extension to do full-text search of all my webpages. It’s in three parts: a server running in the background to interface with an SQLite database, a minimal extension to send text to that server, and a little GUI to query the database.

Unfortunately, all this makes it an utter pain to set up. It’s also somewhat specialised to my own very minimal needs. When I’ve mentioned it in the past, people have suggested open-sourcing it, but for these reasons I’ve resisted it. This post now makes me wonder if I should look into ways to improve it…
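As a rough illustration of the database side of a setup like this (not the commenter's actual code, which is unpublished and written in Haskell, C++ and JavaScript), here is a minimal Python sketch built on SQLite's FTS5 full-text index. The JSON payload shape (`url`, `title`, `text`) and the function names are assumptions, and it presumes the local SQLite build has FTS5 enabled.

```python
import json
import sqlite3


def open_index(path=":memory:"):
    """Open the database and make sure the full-text table exists."""
    conn = sqlite3.connect(path)
    # url is stored but not tokenized; title and body are searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages "
        "USING fts5(url UNINDEXED, title, body)"
    )
    return conn


def ingest(conn, payload):
    """Store one page sent by the extension as a JSON string (assumed shape)."""
    page = json.loads(payload)
    # Replace any earlier copy of the same URL so revisits do not pile up.
    conn.execute("DELETE FROM pages WHERE url = ?", (page["url"],))
    conn.execute(
        "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)",
        (page["url"], page["title"], page["text"]),
    )
    conn.commit()


def query(conn, terms, limit=10):
    """Return (url, title) for pages containing every word in `terms`."""
    words = terms.split()
    if not words:
        return []
    # Quote each word so FTS5 operators in user input are taken literally.
    match = " ".join('"' + w.replace('"', '""') + '"' for w in words)
    rows = conn.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? "
        "ORDER BY rank LIMIT ?",
        (match, limit),
    )
    return rows.fetchall()
```

In a three-part design, `ingest` would sit behind whatever endpoint the extension posts to and `query` behind the GUI; both are left out here to keep the sketch short.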

KomoD · 2024-07-04T22:58:45 1720133925

> When I’ve mentioned it in the past, people have suggested open-sourcing it, but for these reasons I’ve resisted it.

Could just open-source it "as is", there's probably some people that would be interested in just messing around with it or using it as a base

bradrn · 2024-07-04T23:31:06 1720135866

Yeah, you’re probably right. They’d have to be familiar with the rather eclectic mix of languages I used (Haskell, C++ and JavaScript), but then again that’s no reason not to publish it. To be honest, I’m not quite sure why I haven’t just put it online… sheer laziness, I guess.

beeboobaa3 · 2024-07-04T22:38:38 1720132718

Where is my data stored?

Leftium · 2024-07-03T15:16:16 1720019776

One of the features I wish Kagi had was the ability to search through my previous search queries. More details here: https://kagifeedback.org/d/4065-query-personal-search-histor...

Maybe Browspilot could fill this gap!

One thing I noticed is my search history is like a zero-effort personal journal. It gave me a detailed glimpse of what I was doing/thinking on a certain day from several years ago.

peterpelles · 2024-07-03T17:56:40 1720029400

I read your feature request on Kagifeedback. With our tool, Browspilot, you can currently recall pages you have visited based on keyword matches. Given that you can also search the body of a page and sometimes the comments too, it's already quite useful, and I'm confident you will be able to find most of the things you're looking for most of the time. It was almost surprising to us, too, how easily one can actually find stuff just by typing in words that appear somewhere on the page you are looking for, as opposed to having to click a bunch of times and navigate through apps, messages, or emails to find a link again. That works because you are searching a limited dataset, your own search history, rather than everything, as you do in Google.

However, I believe that once we introduce the advanced vector search, which we are already testing in our beta version, you should be able to find the page you are looking for in Browspilot with absolute certainty just by typing words related in meaning into the search box, so you won't even need to remember your exact search queries.

We will also be adding image search capabilities soon.

peterpelles · 2024-07-03T16:56:22 1720025782

Thanks for this. Very useful!

PostOnce · 2024-07-05T07:33:23 1720164803

Browsers stagnate because of their monoculture. This feature should've been part of a browser (and local to the machine) 20 years ago, and here we are adding WebMIDI and WebUSB support when we can't even find shit we looked at 3 days ago.

future10se · 2024-07-05T02:15:51 1720145751

I've been looking for something like this, but as a desktop app that runs locally. So far I've only found two:

1. HistoryHound - https://www.stclairsoft.com/HistoryHound/index.html

2. BrowserParrot - https://www.browserparrot.com/ (sadly seems abandoned)

Wish I could find something like these but open-source. Both of them parse your browser history, fetch the pages, and build their own index. Would be a "safer" and more space/cpu-efficient alternative to apps like Windows Recall and Rewind.ai.
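The first steps of that pipeline (read the browser's history, then reduce a fetched page to plain text for indexing) could be sketched as follows. This is not how HistoryHound or BrowserParrot work internally; it is a hedged example that assumes a copy of Firefox's `places.sqlite` (whose `moz_places` table holds `url`, `title` and `last_visit_date`), and the function names are made up. Fetching the pages themselves is left out.

```python
import sqlite3
from html.parser import HTMLParser


def read_history(places_copy):
    """Return (url, title) pairs, newest first, from a copy of places.sqlite.

    Work on a copy: the live file is usually locked while Firefox is running.
    """
    conn = sqlite3.connect(places_copy)
    try:
        rows = conn.execute(
            "SELECT url, title FROM moz_places "
            "WHERE last_visit_date IS NOT NULL "
            "ORDER BY last_visit_date DESC"
        ).fetchall()
    finally:
        conn.close()
    return rows


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Strip markup so only the text worth indexing remains."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Storing only the extracted text is what keeps this approach lighter on disk than screenshot-based tools.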

bbkane · 2024-07-05T02:21:39 1720146099

There's https://github.com/go-shiori/shiori?tab=readme-ov-file . It works on bookmarks and uses SQLite to enable full text search. It's also a CLI, so I think you can write a script that parses your history file and loads it into this.

hamsterbase · 2024-07-05T04:40:41 1720154441

You could try hamsterbase. All functions are offline and data is stored locally.

If you need to save all the pages you've seen, you can use singlefile, an open source plugin that works directly with hamsterbase.

janice1999 · 2024-07-04T21:57:49 1720130269

How do you plan to make money? The obvious answer would be selling people's data. What is your alternative?

purple-leafy · 2024-07-04T23:29:31 1720135771

As someone who builds chrome extensions, and is very familiar with monetisation of most extensions…

Ding ding ding! All your data are belong to us.

I explicitly never save user data (apart from auth and subscription status) in my extensions, and the only call back “home” is to check whether the authenticated user has an active subscription.

I’m also going to make my more powerful chrome extensions “source available” so people can see exactly what it does with your data, and on your machine. Not “open source” because I don’t want contributions

yjftsjthsd-h · 2024-07-05T06:29:44 1720160984

> Not “open source” because I don’t want contributions

Minor nit: You can make something FOSS by publishing the code under a FOSS license. You don't have to accept PRs, you don't have to take bug reports or feature requests, you don't have to foster a community. Open Source can be as simple as "here is a tarball of source code that you can use", full stop. (As an extreme case in multiple senses, sqlite famously is public domain, and also generally doesn't take any contributions - https://www.sqlite.org/copyright.html )

Of course, you are fully entitled to go Source-available too, and if you want to facilitate audits without actually giving anyone else the right to use your code then that's the way to go. I just want to point out that there are options between "not FOSS" and "community-centric development".

beeboobaa3 · 2024-07-05T14:21:36 1720189296

> I’m also going to make my more powerful chrome extensions “source available” so people can see exactly what it does with your data, and on your machine. Not “open source” because I don’t want contributions

You should also encourage/teach users how to check the source of extensions they've installed locally. It's actually pretty easy. If you're not obfuscating, you may not even need to make the source explicitly available.

https://gist.github.com/paulirish/78d6c1406c901be02c2d

iansinnott · 2024-07-05T00:39:23 1720139963

[flagged]

bcjordan · 2024-07-05T01:47:55 1720144075

Does this have semantic search of some form? May be possible to implement all client-side with local browser models soon

iansinnott · 2024-07-05T04:40:08 1720154408

it does not. i did look into it though [1] and at the time didn't find a good client side vector search lib. i wanted to avoid in-memory vector search since the size of the data can be significant depending on browsing habits. It is definitely possible though. I got a proof of concept working with victor [2] and client-side embeddings but it wasn't good enough IMO to ship.

[1]: https://github.com/iansinnott/full-text-tabs-forever/issues/... [2]: https://github.com/not-pizza/victor
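To make the trade-off concrete, this is roughly what the in-memory approach mentioned above amounts to: keep one embedding per page in RAM and score every one of them against the query. It is a generic brute-force sketch, not code from full-text-tabs-forever or victor; the embeddings are assumed to come from some model that is not shown, and the names `cosine` and `top_k` are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Return the k URLs whose embeddings are most similar to query_vec.

    `index` maps url -> embedding and lives entirely in memory, so both
    memory use and query time grow linearly with the number of pages.
    """
    scored = [(cosine(query_vec, vec), url) for url, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:k]]
```

With a few thousand pages this is likely fine; with years of heavy browsing, the linear growth is presumably the size concern raised in the comment.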

alexliu518 · 2024-07-05T06:37:09 1720161429

Browspilot is a neat tool built by Peter and his team to help you find anything you've seen online with just a clue or by scrolling through your past activity. It's super handy for pulling up frequently used pages or digging up old stuff without keeping a bunch of tabs open.

Whether you're a student or a busy professional, just type in a bit of what you remember, and it’s there. Plus, exciting features are on the way, like searching across different apps and finding things based on meaning with advanced tech.

Overall, Browspilot makes finding online content a breeze!