It's there now


> I'm unsure whether SQLite's extension interface is flexible enough to support this.

I think it is, if I've understood your requirements correctly. e.g. from the datasette-faiss docs:

  with related as (
    select value from json_each(
      faiss_search(
        'intranet',
        'embeddings',
        (select embedding from embeddings where id = :id),
        5
      )
    )
  )
  select id, title from articles, related
  where id = value
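
To illustrate the same pattern without installing anything, here is a minimal sketch using only Python's standard library. It registers a custom SQL function that returns a JSON array of ids and joins on it via json_each, as the query above does. Note that `toy_search` and the in-memory `INDEX` are hypothetical stand-ins (a brute-force search, not FAISS and not the datasette-faiss API), and the sketch assumes your SQLite build includes the JSON functions.

```python
import json
import sqlite3

# Hypothetical toy "index": article id -> embedding vector.
# A stand-in for a real FAISS index, small enough to check by hand.
INDEX = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
}


def toy_search(query_json, k):
    """Return the ids of the k nearest vectors as a JSON array (brute force)."""
    query = json.loads(query_json)

    def squared_distance(vector):
        return sum((a - b) ** 2 for a, b in zip(query, vector))

    ranked = sorted(INDEX, key=lambda i: squared_distance(INDEX[i]))
    return json.dumps(ranked[:k])


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument scalar function.
conn.create_function("toy_search", 2, toy_search)
conn.executescript(
    """
    create table articles (id integer primary key, title text);
    insert into articles values (1, 'Intro'), (2, 'Follow-up'), (3, 'Unrelated');
    """
)

rows = conn.execute(
    """
    with related as (
      select value from json_each(toy_search(:q, 2))
    )
    select id, title from articles, related
    where id = value
    order by id
    """,
    {"q": json.dumps([1.0, 0.0])},
).fetchall()
print(rows)
```

The point is only that a scalar function returning JSON, combined with json_each, is enough to plug an external search into an ordinary join.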


You might be interested in https://datasette.io/plugins/datasette-faiss, which I'm using alongside openai-to-sqlite for similarity search of embeddings, following @simonw's excellent instructions at https://simonwillison.net/2023/Jan/13/semantic-search-answer...


Thanks, but the index being in-memory makes it unsuitable for large data sets :/


There is a way of running disk-backed FAISS indexes that don't all fit in memory, but I've not quite figured out how to do that yet: https://github.com/facebookresearch/faiss/issues/2675


OpenSearch K-NN plugin supports FAISS and it's disk based:

https://opensearch.org/docs/latest/search-plugins/knn/index/


OpenSearch looks like the best so far, all my requirements combined!


Can you say more? Usually projects that gravitate to SQLite are not those that require massive scale, and a FAISS index of a few GB covers a lot of documents.


My dataset is going to be around 10M documents. With OpenAI embeddings, that will be around 62GB. AFAIK SQLite should be able to handle that size, but I haven't tried.

This is not going to be my primary DB. I would update this maybe once a day and the update doesn't have to be super fast.
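
For reference, the size estimate above can be roughly reproduced with a back-of-the-envelope calculation. This sketch assumes 1536-dimensional embeddings stored as 4-byte floats (the dimension and storage format are assumptions, not stated above) and ignores any index or row overhead.

```python
NUM_DOCS = 10_000_000   # "around 10M documents"
DIMENSIONS = 1536       # assumption: 1536-dimensional OpenAI embeddings
BYTES_PER_FLOAT = 4     # assumption: vectors stored as float32

total_bytes = NUM_DOCS * DIMENSIONS * BYTES_PER_FLOAT
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"{total_gb:.1f} GB")  # prints "61.4 GB"
```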


you might check out some vector databases:

https://milvus.io/

AND

pinecone.io

there are others too


> It takes a while to configure a persistent database, redis, and storage on Heroku

Really? Have you actually tried this? In my experience Heroku couldn't make it any easier. You can provision a database in 60 seconds, from a Deploy to Heroku button in your GitHub README, or in the lovely dashboard, or on the CLI, or with two lines in heroku.yml.

We're heavy GitLab users and I'm a big fan of GitLab's typically transparent communications style, but this article reflects really badly on you. I think you should take it down.


Daniele Procida - the author of this framework - is giving a talk about it at the start of Wagtail's documentation sprint on Thursday:

https://wagtail.io/docs-sprint-details

Do join if you're interested, even if it's just to hear Daniele's talk.


> there is no easy way to manage secrets with Cloud Run

See https://cloud.google.com/secret-manager/

I recommend this community-maintained FAQ on Cloud Run:

https://github.com/ahmetb/cloud-run-faq


Wagtail - https://wagtail.io - is the most popular Python CMS. It runs sites for NASA, Google, Mozilla, NHS.uk. It's open source and under very active development.


I run Torchbox, a small, private tech agency in the UK. We were commissioned by NHS Digital (the state sector organisation behind these tools) to support two related projects: moving NHS.uk to https://github.com/wagtail/wagtail and setting up https://beta.nhs.uk/service-manual/ on Wagtail.

NHS Digital's procurement processes are tight and focused. They have an expert in-house team of developers and delivery managers, and they only used us where we could accelerate the project.

As a business owner, I would have liked a bigger contract. As a tax-payer and frequent user of the NHS, I'm very happy with their efficiency.


The impression I get is that the NHS is pretty efficient budget-wise, but with a budget so large it's easy to make the inefficiencies sound bad: "NHS wastes millions on Foo" sounds much worse than "NHS wastes 0.001% of its budget on Foo".

For people who don't know, the NHS budget is approx £130bn (~$165bn US). It is a huge organisation that provides healthcare to 66 million people.

It's impressive.


> 'probably smart', 'primarily a salesman'

"A child prodigy in chess, Hassabis reached master standard at the age of 13"

"graduating in 1997 with a Double First[13] from the University of Cambridge"

"obtain[ed] his PhD in cognitive neuroscience from University College London"

"[his] theoretical account of the episodic memory system [...] was listed in the top 10 scientific breakthroughs of the year in any field by the journal Science"

https://en.wikipedia.org/wiki/Demis_Hassabis


But that does not change the fact that his primary role at the moment is selling his research institute to potential funders.


Sure, a PhD in cognitive neuroscience is OK, but Demis' most impressive feat is winning the Mind Sports Olympiad a record 5 times.


Not Flask, but still Python: Wagtail - https://github.com/wagtail/wagtail - is a popular headless CMS option. Michael Harrison from rice.edu / Openstax gave a great talk about this at last week's Wagtail Space:

https://docs.google.com/presentation/d/1ZYMogOeXKCCmr7hDZnzx...

https://www.youtube.com/watch?v=HZT14u6WwdY


Big thanks. I actually worked for Openstax (Connexions) in the past, but I don't think I met Michael Harrison in my time there :)) I will definitely give it a test run!

