
A few months back, I posted an essay here about how I think we are entering an age where the personal library will become an asset class unto itself [1]. This is due to a combination of the advent of semantic search and the revival of personal knowledge management in the deluge of our age.

As such, I’ve been working on the software to bring this vision to life, which is called Your Commonbase (a portmanteau of Commonplace Book and Vector Database).

In short, the purpose of the work is to create a data structure that works the way humans store, retrieve, and share information. By making these three elements as close to zero stress as possible, you catalyze creativity through remixing and augmentation of memories. My hypothesis is that a lifetime of building a Commonbase creates an idiosyncratic system, filled with the interpretations of an individual or a group. This individualized structure then creates something others demand, e.g. a curation of all of the books you have read, organized by the marginalia you have added to them. This is a system people would pay for, and also a system that becomes more valuable over time.
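
To make that concrete, an entry looks roughly like this (a simplified sketch for illustration, not the literal schema):

    // Simplified sketch of a single Commonbase entry; field names
    // are illustrative only.
    interface CommonbaseEntry {
      id: string;
      text: string;          // the atomic note, quote, or marginalia
      embedding: number[];   // vector address from an embedding model
      createdAt: Date;       // time of entry, kept as metadata
      links: string[];       // ids of related entries (remixes, augmentations)
    }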

I’ve been “working in public” by posting updates on my site [2], and am just beginning a small waitlist alpha testing phase (email me if you want in!)

[1] - https://news.ycombinator.com/item?id=40192359

[2] - https://www.bramadams.dev/


I've read a ton of your writing on this topic. There is still much more, but I'm tired. You described the problem well, and if your software alleviates it, then I see the value.

I'd like to try the software if it'd available please. But I'm not sure where it is or if it's out yet. Is it the Obsidian template you linked here, or is it a separate software called Your Commonbase? Because the Obsidian vault has a different name.

I have to say, the irony is hilarious that I can't find this information in the (well-written) sea of information you posted. But then of course I'm not using your solution yet!


This sounds interesting, but I do not totally understand your description. Could you explain further what you are building?


Sure!

The problem with existing knowledge management systems is that they all eventually fall victim to a scale the operator can no longer manage.

Consider two scenarios:

Scenario 1:

You have a chaotic system of notes, dispersed throughout random pieces of paper and digital notes on your phone and computer. You have a relatively easy go of it when saving new notes (it's as simple as pressing "New Note" or pulling out a new sheet of paper), but as you add more notes to your system it becomes harder and harder to find a particular piece of information. The cost of adding stays the same, but the cost of searching goes up and up. Scenario 1 causes us to eventually succumb to chaos.

Scenario 2:

You have an extremely rigid system of tag management, headings, sub-headings, sub-subheadings and sub-sub-subheadings. This taxonomy makes it easier to find information...at first. The problem with these systems is that they require a ton of manual maintenance, and while they make it easy to find certain "classes" of information, they fail at locating others. Much more perniciously, this scenario eventually stifles creativity as it falls prey to too much order. The cost of searching stays roughly the same, but the cost of insertion goes up and up.

(I am personally guilty of creating a system like this btw [1][2]).

Both of these systems eventually become inert holders of information, as the processor (us) begins to fear them and stops working with them.

IMO, the closest technology to managing human information well is well over two thousand years old: the commonplace book [3]. Simply put, a commonplace book is extremely resilient to chaos due to its centrality of information, but even people like John Locke had to create indexes to fully utilize them [4].

This changed recently with the advent of vector databases [5]. It turns out that commonplace book entries are the perfect form factor to benefit from an address in vector space, since entries are atomic. In simpler terms, the vector processing layer handles the order, allowing our system to "live" and assign headings/tags/etc. as it evolves. Vector databases love commonplace books as well, because many vector solutions suffer from far too much noise, chunking and storing useless information at quite a disappointing ratio.
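
To illustrate the pairing, here is a rough sketch of ingestion, assuming OpenAI embeddings and Postgres with pgvector (the table and model names are illustrative, not my production code):

    import OpenAI from "openai";
    import { Client } from "pg";

    const openai = new OpenAI();
    const db = new Client(); // assumes Postgres with the pgvector extension
    await db.connect();

    // One atomic entry -> one embedding -> one address in vector space.
    async function saveEntry(text: string) {
      const res = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: text,
      });
      await db.query(
        "INSERT INTO entries (text, embedding, created_at) VALUES ($1, $2, now())",
        [text, JSON.stringify(res.data[0].embedding)]
      );
    }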

My system differs from current offerings because it makes no attempt to automate parts that are meant for the human, and makes no attempt at making humans do the work computers should be doing. Ergo, it creates a type of symbiotic relationship.

Finally, a note on why I use the term "asset". An asset should become more valuable over time, and in particular, each individual component of the overall asset class should be worth more (e.g. $1 in a bank account of $20 is inherently less valuable than $1 in a $20MM bank account, because it grows slower). So in our scenarios above, the transmutation of information into knowledge tops out along a logarithmic curve, subject to the scale issues I mentioned before. Old entries appear less frequently in even the most ordered systems, and when they do, it is only in one particular context. My system stores time of entry in the metadata, but since I use vector addresses, the information is accessible in many different ways (dog can be found when query == canine, fido, perro, man's best friend, etc...). An informational asset should scale linearly, and each action of create/read/update/delete should improve the health of the overall system.
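
The retrieval side of that claim, sketched with pgvector's cosine-distance operator (reusing the clients from the sketch above; again illustrative):

    // The query text never has to match the stored words --
    // it only has to land near them in vector space.
    async function search(query: string, limit = 5) {
      const res = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: query,
      });
      const { rows } = await db.query(
        `SELECT text, created_at, embedding <=> $1 AS distance
           FROM entries ORDER BY distance LIMIT $2`,
        [JSON.stringify(res.data[0].embedding), limit]
      );
      return rows; // "canine", "perro", "fido" all land near the "dog" entry
    }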

There's much, much, much more I could say here, but I'll stop for now :)

[1] - https://github.com/bramses/bramses-highly-opinionated-vault-...

[2] - https://news.ycombinator.com/item?id=34034414

[3] - https://en.wikipedia.org/wiki/Commonplace_book

[4] - https://publicdomainreview.org/collection/john-lockes-method...

[5] - https://openai.com/index/introducing-text-and-code-embedding...


Tags are useless; encoded titles are better.

I'm using Zettelkasten with Obsidian and am completely satisfied.


Makes you wonder as well whether the expressed genetic traits we can't see are more or less different than the ones we can.

For example, does evolution exert any pressure to produce those who think linguistically, vs. those with healthy hair and skin?


> There is a rush in public to condense and summarize many authoritative publications to find patterns, or to replace a human expert with automated results.. yet that is fundamentally different than taking multiple incomplete perspectives to add to a human library-owners knowledge and investigations. It is subtle to speak it but not subtle in its implications.. taking "data as facts" and condensing them or reordering them or rewriting an output based on them, using automation, is different than a human mind taking in many inputs for human mind knowledge and enabling new outputs from a human author.

You nailed it! Thanks for noticing the divergence!


There's lots of interesting work that came out of BCL in the 1960s, https://en.wikipedia.org/wiki/Biological_Computer_Laboratory

> The focus of research at BCL was systems theory and specifically the area of self-organizing systems, bionics, and bio-inspired computing; that is, analyzing, formalizing, and implementing biological processes using computers. BCL was inspired by the ideas of Warren McCulloch and the Macy Conferences, as well as many other thinkers in the field of cybernetics.

On cybernetics, https://www.pangaro.com/definition-cybernetics.html

> Artificial Intelligence (AI) grew from a desire to make computers smart, whether smart like humans or just smart in some other way. Cybernetics grew from a desire to understand and build systems that can achieve goals.. it connects control (actions taken in hope of achieving goals) with communication (connection and information flow between the actor and the environment).. Later, Gordon Pask offered conversation as the core interaction of systems that have goals.


> Would an LLM-driven "Personal Library" require manually annotated textual interpretation of each curated item, or could it derive personal interpretations from user history and the uniqueness of curated items/sets?

I’ve personally found that tagging is less robust than LLM embeddings (mainly due to dimensionality), but human-appended thoughts about a source — also embedded — serve even better as tags.

Example: “this is a quote about dinosaurs…”

(Old way of doing things)

  Tags: dinosaurs, jurassic, history
  Query: “dinosaurs” > results = 1…

(New way of doing things)

  Embedded Quote: [0.182…]
  User Added Thought: “this dinosaur reminds me of a time i went to six flags with my cousins and…”
  Embedded User Added Thought: [0.284…]
  Query: “dinosaurs” > results = 2 (indexes = sources, thoughts)

The "thoughts" index can do a second layer cosine similarity search and serve as a tag on its own to fetch similar concepts. Basically a tree search created by similarity from user input/feedback loops.


neat! one thing i’d really love tooling for: supporting multi user apps where each has their own siloed data and embeddings. i find myself having to set up databases from scratch for all my clients, which results in a lot of repetitive work. i’d love to have the ability one day to easily add users to the same db and let them get to embedding without having to have any knowledge going in


This is possible in Supabase. You can store all the data in a single table and restrict access with Row Level Security.

You also have various ways to separate the data for indexes/performance (rough sketch after the list):

- use metadata filtering first (e.g. filter by customer ID prior to running a semantic search). This is fast in Postgres since it's a relational DB

- pgvector supports partial indexes - create one per customer based on a customer ID column

- use table partitions

- use Foreign Data Wrappers (more involved but scales horizontally)
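
Here's roughly what the RLS + partial-index combination looks like (table, policy, and index names are made up, and the auth.uid() comparison assumes Supabase auth with a uuid customer_id column):

    import { Client } from "pg";

    const db = new Client(); // Supabase's underlying Postgres, pgvector enabled
    await db.connect();

    // One-time setup: tenant isolation via RLS, plus a per-customer
    // partial index so each tenant's vector search stays small.
    async function setupTenancy() {
      await db.query(`ALTER TABLE documents ENABLE ROW LEVEL SECURITY`);
      await db.query(`
        CREATE POLICY tenant_isolation ON documents
          USING (customer_id = auth.uid())`);
      await db.query(`
        CREATE INDEX documents_acme_idx ON documents
          USING hnsw (embedding vector_cosine_ops)
          WHERE customer_id = 'acme-tenant-uuid'`);
    }

    // Query time: filter by tenant first (a cheap relational predicate),
    // then rank by cosine distance.
    async function tenantSearch(customerId: string, queryVec: number[]) {
      const { rows } = await db.query(
        `SELECT id, content FROM documents
          WHERE customer_id = $1
          ORDER BY embedding <=> $2
          LIMIT 10`,
        [customerId, JSON.stringify(queryVec)]
      );
      return rows;
    }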


Gonna drop my own link here, because I really think the UX I’m working on is novel. Inspired by the commonplace book format, I take highlights from Kindle and embed them in a DB [1]. From there I build (multiple) downstream apps, but the central one, Commonplace Bot [2], is a bot that serves as a retrieval and transformation layer for said highlights. It has changed the way I read books: I now get to link ideas from books I read in 2018 to books I read last week. I don’t always need a query either, as I added a hypothetical question as an entry point, allowing the UX of finding an idea to be as simple as typing “wander” (rough sketch below the links). Finally, since quotes are dense, short, and generally context-free, I enable a bunch of transformations like Anki quizzes, art from quotes, using the quote itself as a centroid to search its neighbors, etc.

[1] - https://github.com/bramses/quoordinates

[2] - https://github.com/bramses/commonplace-bot
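
The hypothetical-question entry point, roughly (an illustrative sketch; the prompt and model are stand-ins for what the repos actually do):

    import OpenAI from "openai";

    const openai = new OpenAI();

    // At ingest time, generate a question each highlight could answer and
    // embed the QUESTION alongside the quote. A one-word query like
    // "wander" can then land on a quote that never contains the word.
    async function hypotheticalQuestion(quote: string): Promise<string> {
      const res = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{
          role: "user",
          content: `Write one short question this passage could answer:\n\n${quote}`,
        }],
      });
      return res.choices[0].message.content ?? "";
    }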


I love this. I have my commonplace book in Roam Research. Search in Roam is not perfect, and I have wondered lately if there is a way to get all of the content into a graph DB and then query it using LLMs. But I haven't had time to tinker with it - I am sure open source libraries exist that do exactly this.

Can your library take all highlights from Readwise, or just Kindle? I use Readwise Reader quite a bit and would love something that takes everything I save + all highlights + other places (Roam Research, Email, Calendar) etc. so I can just ask it questions.


You definitely could! Funnily enough, I have a function named "justBooks()" [1] that filters the Readwise export down to just book-type tags, but you could use the entire export, or whatever upstream method you want. Much like journaling, everyone's use case will be catered to their own tasks/quotes/ideas, but allow me to share some centralized advice. You'll definitely need:

1) a database that supports vectors; I use Postgres

2) a low-friction way to get your "new" highlights from your reading practice; I use Readwise

3) an LLM to "cache" transformations [2]. This transformation does an insane amount of work and takes it to the next level in terms of utility; I wouldn't skip it. (A sketch of the filter follows the links.)

[1] - https://github.com/bramses/quoordinates/blob/1b9d1fadaded98b...

[2] - https://github.com/bramses/quoordinates/blob/1b9d1fadaded98b...
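
For flavor, the shape of that filter (a simplified sketch; the category field reflects my reading of the Readwise export format):

    // Keep only book highlights from a Readwise export.
    type ReadwiseItem = {
      category: string; // "books" | "articles" | "tweets" | ...
      title: string;
      highlights: { text: string }[];
    };

    const justBooks = (items: ReadwiseItem[]) =>
      items.filter((item) => item.category === "books");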


This is really cool! I could see it being excellent for anyone who writes or gives speeches very often, a great way to quickly access the knowledge one builds up over a lifetime of reading. Love it!


I do. My hypothesis is that there isn't really good bokeh yet in the videos, and our brains get motion sick trying to decide what to focus on. I.e. too much movement and *too much detail* spread throughout the frame. Add motion to that and you have a recipe for nausea (at least for now).


You can shoot with deep depth of field and not cause motion sickness. Aerial videography does that every day, and it's no more difficult in general to parse than looking out an airliner window or at a distant horizon would be.

I suspect GP is closer to the mark here in suspecting that the issue lies with a semblance of movement that isn't like what we see when we look at something a long way away.

I didn't notice such an effect myself, but I also haven't yet inspected the videos in much detail, so I doubt I'd have noticed it in any case.


I really appreciate how Bruce confronts how the idea of open source has played out against how technology is actually used in the real world. I'm optimistic for a future where open source is less of a religion or a marketing angle [1], and more of a way for those who share their knowledge with the world to receive fair compensation.

[1] - https://www.bramadams.dev/issue-39/


If anything, it would appear that compensation for sharing software and knowledge is threatened by those in AI whose only way to monetise is leveraging free labor. One way to prevent that is to make such property more scarce. People are already pulling the plug on sharing knowledge because of it.


Given the AI copyright debate also currently on the front page, I’m a bit less optimistic. We might end up with a lot of software not using open source directly, but having AI rewrite a roughly similar version just different enough to barely not count as plagiarism.


I’m not sure I catch your drift here. As I define it, virtue signaling is the act of projecting your beliefs (or rather, the performative aspect of your beliefs) in a public space. So, yeah, when people place the placard of “open source” all over their projects, they may as well be saying “Free Range, Organic”. It’s a label that signals to others who care about that label.

Whether it’s done under the auspices of Twitter Spaces, or VCs implicitly promoting certain projects over others by kingmaking them with massive seed rounds, the result is the same.

This is why I say in the post that nuance is important.


I think a lot of these comments will highlight the lower-level parts of ML, but what ML needs right now, in my opinion, is really smart people at the implementation level. As an analogy, there are way fewer “frontend” ML practitioners than “backend” ones.

Leveraging existing LLM technologies and putting them in software where regular people can use them and have a great experience is important, necessary work. When I studied CS in college the data structure kids were the “cool kids”, but I don’t think that’s the case in ML.

The daily practice is to sketch applications, configure prompts and function calls, learn to market what you create, and try to create zero-to-one type tools. Here are two examples I made: one where I took the commonplace book technique of the era of Aristotle and put it in our modern embeddings era [1], and one where I really pushed to understand the pure Markdown spec and integrate streaming generative models into it [2]. (A tiny example of the function-call glue I mean follows the links.)

[1] - https://github.com/bramses/commonplace-bot

[2] - https://github.com/bramses/chatgpt-md
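
For instance, a minimal tool definition of the kind I sketch daily (an illustrative example, not code from either repo):

    import OpenAI from "openai";

    const openai = new OpenAI();

    // "Frontend" ML work is mostly this: describing a capability to the
    // model so it can be wired into ordinary software.
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Find quotes about wandering" }],
      tools: [{
        type: "function",
        function: {
          name: "search_quotes",
          description: "Semantic search over a user's saved highlights",
          parameters: {
            type: "object",
            properties: { query: { type: "string" } },
            required: ["query"],
          },
        },
      }],
    });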

