PossiblyKyle · on Nov 26, 2023

I fail to understand why I should use it over a different embedded vector DB like LanceDB or Chroma. Both are written in more performant languages, have a simple API with a lot of integrations and power if one needs it

dmezzetti · on Nov 26, 2023

To be fair, Chroma is also written in Python. And while LanceDB and others are written in Rust, that doesn't automatically give it super powers.

6r17 · on Nov 26, 2023

Python programmer for 15 years, and I picked up Rust to write an OAuth gateway not long ago; I had written it in Python beforehand. Rust DOES give you superpowers, especially if you compare it to something like Python, which is nowhere near as fast and has no typing.

dmezzetti · on Nov 26, 2023

There are plenty of examples of Python libraries that can be performant such as NumPy and PyTorch (which both rely on C/C++). Some libraries such as Hugging Face's tokenizers even use Rust.

I referenced this article below but will reference it again here too. https://neuml.hashnode.dev/building-an-efficient-sparse-keyw....

You can write performant code in any language if you try.

aldanor · on Nov 26, 2023

NumPy is a C library with Python frontend, moreover lots of functionality based on other existing C libraries like blas etc.

PyTorch, quoting themselves, is a Python binding into a monolithic C++ framework; also optionally depending on existing libs like mkl etc.

> You can write performant code in any language if you try.

Unfortunately, only to a certain extent. Sure, if you just need to multiply a handful of matrices and you want your blas ops to be blas'ed where the sheer size of data outweighs any of your actual code, it doesn't really matter. Once you need to implement lower-level logic, ie traversing and processing the data in some custom way, especially without eating extra memory, you're out of luck with Python/numpy and the rest.

benrutter · on Nov 26, 2023

> NumPy is a C library with Python frontend

I guess this is a pretty legitimate take, but in that case VectorDB looks like (from the git repo) it makes huge use of libraries like PyTorch and NumPy.

If numpy is fast but "doesn't count" because the operations aren't happening in python, then I guess VectorDB isn't in python either by that logic?

On the other hand, if it is in Python despite shipping operations out to C/C++ code, then I guess numpy shows that can be an effective approach?

bee_rider · on Nov 26, 2023

BLAS can be implemented in any language. In terms of LOC, most BLAS might be C libraries, but the best open source BLAS, BLIS, is totally structured around the idea of writing custom, likely assembly, kernels for a platform. So, FLOPs-wise it is probably more accurate to call it an assembly library.

LAPACK and other ancillary stuff could be Fortran or C.

Anyway, every language calls out to functions and runtimes, and compiles (or jits or whatever) down to lower level languages. I think it is just not that productive to attribute performance to particular languages. Numpy calls BLAS and LAPACK code, sure, but the flexibility of Python also provides a lot of value.

How does Numba fit into this hierarchy?

mgl · on Nov 26, 2023

This is unfortunately not correct once you start pushing the boundaries and need careful allocation of memory, CPU cache, and the CPU itself; see this table:

https://stratoflow.com/efficient-and-environment-friendly-pr...

_a_a_a_ · on Nov 26, 2023

I don't accept that. In the referenced article you're pulling in stuff which I believe is written in a different language (probably C). If you use native python, I'm sure you would accept it would be much slower and take up much more memory. So we have to disagree here.

dmezzetti · on Nov 26, 2023

Where do you draw the line? Most of CPython is written in C including the arrays package (https://docs.python.org/3/library/array.html) mentioned in that article.

Yes, pure Python is slower and takes up more memory. But that doesn't mean it can't be productive and performant using these types of strategies to speed up where necessary.
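To make the memory point concrete, here is a minimal sketch using only the standard library. It compares a plain list of integers against the same values packed into an `array.array`. The helper names (`list_footprint`, `array_footprint`) and the 100,000-element sample are assumptions for illustration, not taken from the referenced article, and the list figure is only approximate because `sys.getsizeof` is applied to each element object separately.

```python
import sys
from array import array


def list_footprint(values):
    """Approximate bytes used by a list plus the int objects it references."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def array_footprint(values, typecode="q"):
    """Bytes used by the same values packed into a typed array ('q' = signed 64-bit)."""
    return sys.getsizeof(array(typecode, values))


values = list(range(100_000))  # hypothetical sample data
as_list = list_footprint(values)
as_array = array_footprint(values)

print(f"list:  {as_list} bytes (approximate)")
print(f"array: {as_array} bytes")
```

On CPython the packed array should come out noticeably smaller, since it stores raw 8-byte values rather than pointers to full int objects. The array itself is still implemented in C, which is the point both sides of this exchange are circling.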

_a_a_a_ · on Nov 26, 2023

With respect, I think you're clouding things by trying to defend what isn't really defensible. Okay then.

> Where do you draw the line?

Drawing the line at native python, not pulling in packages that are written in another language. Packages written in python only are acceptable in this argument.

> But that doesn't mean it can't be productive and performant using these types of strategies to speed up where necessary.

No one said it couldn't. What we're saying is that pure Python is 'slow' and you need to escape from pure Python to get the speedups.

dmezzetti · on Nov 26, 2023

I agree that pure Python isn't as fast as other options. Just comes down to a productivity tradeoff for developers. And it doesn't have to be one or the other.

_a_a_a_ · on Nov 26, 2023

Agreed, then!

iopq · on Nov 26, 2023

So to make Python fast you just need to write a library in another language, brilliant

dmezzetti · on Nov 26, 2023

If you read the article referenced, I discussed a number of ways to write performant Python such as using this package (https://docs.python.org/3/library/array.html).

stavros · on Nov 26, 2023

> Python programmer for 15 years [...] [Python] has no typing

Ok, I have to call this statement out. Mypy was released 15 years ago, so Python has had optional static typing for as long as you've been programming in it, and you don't know about it?

I guess it's going to take another fifteen years for this 2008 trope to die.
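For readers who have not used it, a minimal sketch of what optional static typing looks like in Python. The function `double` is a hypothetical example, not from any project in this thread. The annotations are checked by an external tool such as mypy, not by the interpreter, which is why the second call below still runs.

```python
def double(x: int) -> int:
    """Return twice the input; the annotations declare int in, int out."""
    return x * 2


# Well-typed call.
print(double(21))

# Annotations are not enforced at runtime, so this call still runs and
# simply repeats the string. A static checker such as mypy is expected
# to flag the argument type before the code ever executes.
print(double("ab"))

# The declared types are available for inspection.
print(double.__annotations__)
```

This is the sense in which the typing is both static and optional: the checking happens ahead of time, and only if you run a checker.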

aerhardt · on Nov 26, 2023

I'm primarily a Python programmer, I love mypy and the general typing experience in Python (I think it's better than TypeScript - fight me), but are you seriously comparing it to something - anything - with proper types like Rust?

kamov · on Nov 26, 2023

> I think it's better than TypeScript - fight me

I used Python type hints and MyPy since long before I used TypeScript, and I have to say that TypeScript's take on types is just plain better (that doesn't mean it's good though).

1. More TypeScript packages are properly typed thanks to DefinitelyTyped. Some Python packages such as Numpy could not be properly typed last I checked, I think it might change with 3.11 though. Packages such as OpenCV didn't have any types last I checked.

2. TypeScript's type system is more complete, with better support for generics, this might change with 3.11/3.12 though.

3. TypeScript has more powerful type system than most languages, as it is Turing-complete and similar in functionality to a purely functional language (this could also be a con)

stavros · on Nov 26, 2023

> but are you seriously comparing it to something - anything - with proper types like Rust?

Re-reading my comment, no, I did not. I said it has static typing.

Gracana · on Nov 26, 2023

> I have to call this statement out.

Why? That was just mean for no reason!

stavros · on Nov 26, 2023

Is that mean? Sorry, English is not my native language. I just meant that I have to express my doubt of the veracity of the statement.

Gracana · on Nov 26, 2023

Your language is fine (I’ve enjoyed your blog posts too, never gave it a thought that English wasn’t your first language), I just thought it was unnecessarily hurtful to say they must be a phony because they didn’t know something.

But, everybody else seems to agree so maybe I’ve been had.

stavros · on Nov 26, 2023

I didn't mean to say they are a phony, just that that statement is inaccurate/poorly thought out.

djbusby · on Nov 26, 2023

English is my native language. "I have to call out" is a perfectly fine (and polite) way to express doubt of veracity.

DSingularity · on Nov 26, 2023

Python does have typing. Although it doesn’t feel as “first class” as in Rust or Go, it gets the job done.

HumanOstrich · on Nov 26, 2023

Yea everyone should just rewrite EVERYTHING in Rust! /s

PossiblyKyle · on Nov 26, 2023

Fair point, then you could claim it's similar to this DB with its reliance on Faiss. Despite that, Chroma at this point is more feature rich. I was mostly referring to this https://thedataquarry.com/posts/vector-db-1/

You are not wrong about the performance from Rust, but LanceDB is inherently written with performance in mind. SIMD support for both x86 and ARM, and an underlying vector storage approach that's built for speed (Lance)

dmezzetti · on Nov 26, 2023

I've seen a number of projects come over the last couple years. I'm the author of txtai (https://github.com/neuml/txtai) which I started in 2020. How you approach performance is the key point.

You can write performant code in any language. For example, for standard keyword search, I wrote a component to make sparse/keyword search just as efficient as Apache Lucene in Python. https://neuml.hashnode.dev/building-an-efficient-sparse-keyw....

marcinzm · on Nov 26, 2023

>more feature rich

Not necessarily a good thing when the product is made by a VC backed startup that may die or pivot in six months leaving you the need to maintain it yourself.

freediver · on Nov 26, 2023

It is faster!

We needed a low-latency, on-premise solution that we can run on edge nodes, with sane defaults that anyone on the team can whip up in a sec. Also worth noting is that our use case is end-to-end retrieval of usually a few hundred to a few thousand chunks of text (for example, in Kagi Assistant research mode) that need to be processed once at run time with minimal latency.

Result is this. We periodically benchmark the performance of different embeddings to ensure best defaults:

https://github.com/kagisearch/vectordb#embeddings-performanc...

hantusk · on Nov 26, 2023

I thought the API here was quite neat. It's fairly simple to implement a lancedb backend for it instead of sklearn/faiss/mrpt as the source code is really simple.

This repo is basically just a nice api and the needed chunking and batching logic. Using lancedb, you'd still have to write that, as exemplified here: https://github.com/prrao87/lancedb-study/blob/main/lancedb/i...
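As a rough idea of what "chunking and batching logic" means here, a minimal sketch in plain Python. This is not the code from either repository: the function names (`chunk_text`, `batched`), the word-based splitting, and the default sizes are all assumptions made for illustration.

```python
from typing import Iterator


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of max_words, sharing `overlap` words between neighbours."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Hypothetical usage: 120 numbered "words", chunked and then grouped
# into batches that would be sent to an embedding model.
sample = " ".join(str(i) for i in range(120))
chunks = chunk_text(sample, max_words=50, overlap=10)
batches = list(batched(chunks, batch_size=2))
print(len(chunks), [len(b) for b in batches])
```

Whatever vector backend sits underneath, something like these two steps has to exist before the embeddings can be computed and stored.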

mark_l_watson · on Nov 26, 2023

Same for me. I started using Chroma (about) a year ago, I am used to it, and if I am using Python I look no further.

When I use Common Lisp or Racket I roll my own simple vector embeddings data store, but that is just me having fun.

PossiblyKyle · on Sept 18, 2023

While not identical in functionality, I highly recommend Keka (https://www.keka.io); there's even an iOS app.

PossiblyKyle · on Sept 18, 2023

Wipr is great, but they're a separate thing. Wipr does not remove the nagging cookie requests

keybits · on Sept 18, 2023

Wipr does remove cookie warnings. From the linked page:

> Wipr blocks all ads, trackers, cryptocurrency miners, EU cookie and GDPR notices...

I use Wipr and can confirm that I rarely see cookie warnings in Safari with it enabled.

I might be wrong, but I believe this is better than Hush as Wipr blocks cookies, while Hush accepts website defaults.

PossiblyKyle · on Sept 18, 2023

Interesting to see, as I still encounter these banners from time to time. I also have Wipr Extra enabled. Apologies for the mistake

PossiblyKyle · on July 9, 2023

SwiftData integration (and notably the Observable macro) seems like a huge step in the right direction for it. The problem we’ve found is that integrating SwiftUI in existing UIKit apps that also rely on stuff like RxSwift isn’t easy. So far it’s only good at brand new presentations for us. Another thing is that a lot of the great new features are locked behind iOS targets that are plainly too new to be realistic for products

newZWhoDis · on July 9, 2023

>The problem we’ve found is that integrating SwiftUI in existing UIKit apps that also rely on stuff like RxSwift isn’t easy

Your RxSwift codebase should have converted to Combine 3 years ago, and today you’d map the observable streams to SwiftUI with a simple @Published property wrapper (now called macro… ugh)

It’s really not that hard, and Combine can push your state to both new and legacy views with a few lines of hookup.

interactivecode · on July 9, 2023

Mixing and matching 2-3 UI / state frameworks will always be a hassle.

PossiblyKyle · on July 8, 2023

It is solvable by making the hard decision to move to Python 4 with no backward compatibility. The two core issues imo in Python are the GIL and the environment hell and both simply can’t be solved while still keeping the 3 moniker. We’re in a field of constant workarounds and duct tape because we try pleasing too much

bbkane · on July 8, 2023

Python tried that (version 2 to 3) and both the community and dev team were traumatized by the effects enough they've publicly said it'll never happen again.

Some things really are too big to change.

imtringued · on July 8, 2023

That means they didn't learn from it at all. The problem with Python 2 to python 3 is that it lost backwards compatibility because of very silly reasons like turning the print statement into a function. The vast majority of the problems could have been avoided by not making pointless changes with dubious benefits.

gcbirzan · on July 8, 2023

I seriously doubt anyone had problems fixing print as a statement. 2to3 fixed it...

I'll admit that, yes, changing string to bytes and unicode to string was a bit annoying, but the change itself wasn't fundamentally 'of dubious benefit', it did have benefits, and related to this, the only major issue was that you couldn't, for a long time, have code that worked in both where it came to literals. The biggest problem here was the implicit conversion from 2, that I agree needed to go.

Most of the other things can be trivially fixed automatically, or at least detected automatically, but without type hinting, it wasn't really easy to fix the automatic conversion.

There were other changes that were a bit tricky, but the majority of issues stemmed from the str/bytes change.
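A minimal sketch of the str/bytes split being described, as it behaves in Python 3. The sample string is an assumption chosen only because it contains one non-ASCII character. Python 2 would attempt an implicit conversion when text and byte strings were mixed; Python 3 requires the conversion to be explicit.

```python
text = "na\u00efve"            # five characters, one of them non-ASCII
data = text.encode("utf-8")   # explicit str -> bytes conversion

# Character count and byte count differ once non-ASCII text is involved.
print(len(text), len(data))

try:
    combined = text + data    # Python 3 refuses to mix the two types
except TypeError as exc:
    combined = None
    print("TypeError:", exc)

# Going back to text also has to be explicit, with a named encoding.
round_trip = data.decode("utf-8")
print(round_trip == text)
```

Code that relied on the old implicit conversion is exactly what could not be migrated mechanically, because a tool cannot tell from the source alone which values were meant as text and which as bytes.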

ilyt · on July 8, 2023

slap "use v4" at start of the file to use new semantics and use it automatically for .py4

Then transpile Py3 code into Py4.

BiteCode_dev · on July 8, 2023

Transpilation won't work with semantics.

And it won't change anything about c extensions.

brazzy · on July 8, 2023

Not going to happen for another 10 years at least. Not after how long and painful the move to Python 3 was.

rolisz · on July 8, 2023

Moving from 2 to 3 was a long and difficult migration, so 3 to 4 will be similarly difficult

dev_tty01 · on July 8, 2023

Seems a bit conclusory. Doesn't this imply the community is incapable of learning from experience?

fmajid · on July 8, 2023

I’ve been using Python since 1994 and my 2-to-3 migration plan was Go…

jug6ernaut · on July 8, 2023

The amount of time and money people put into building and maintaining systems built with scripting languages has never made sense to me.

PossiblyKyle · on July 5, 2023

Admittedly I don’t have experience contributing to FOSS projects (yet), but like many companies we use the feature branch workflow, and have minimal conflicts; both on tasks and in the code itself. The reason is (and I’m stating the somewhat obvious) that we have engineering managers whose primary job is coordinating the tasks between teams and within each team. This is done externally via Jira, and to me it reflects on a weak spot of GitHub if it wants to be “the place to manage a project” - it’s good for managing a code base but lacks the tools to manage it as a project/product.

madeofpalk · on July 5, 2023

GitHub has been working on this recently and made pretty significant improvements with the new Org Projects and Tasklists in issues.

Project management for my day job happens entirely in GitHub and while it’s not perfect, it’s significantly better than it was even 12 months ago.

PossiblyKyle · on July 4, 2023

It is, and it’s even more popular than reddit in my local communities. The problem is the lack of anonymity

erremerre · on July 4, 2023

Do you mind providing some examples of popular communities? Sorry, but I haven't had Facebook since 2016 and I literally have no idea what is going on with it.

PossiblyKyle · on July 4, 2023

I don't live in the US. We have groups for uni, the tech sector (in general, kind of like /r/programming in a way), ML&DL, dating, sales (alongside Facebook Marketplace). We also have groups for towns and neighborhoods as the prime way for the locals to communicate. It is de-facto reddit here

PossiblyKyle · on June 23, 2023

You can essentially already do that. I live in a place basking in sun most of the year, and have 22 kW of panels with a 15 kW inverter (the max allowed before being considered commercial). My production is a wide hill with a considerable plateau, thanks to the number of panels (as an example, I produced 140 kWh yesterday). It is clearly not economically viable for most regions, but for us it luckily is.

kurthr · on June 23, 2023

Even more important, if you're on batteries as opposed to grid-tied, your battery costs can go down significantly. Since the worst days of the year tend to be winter and overcast, if you can install enough capacity to still generate a significant fraction of your needs on those days (eg at 10% of summer peak) the size of the required batteries can be much smaller.

It doesn't work so well in the north east, but if weather is only bad for a few days, you might need much less backup or generator power. Batteries and generator power tends to be much less than adding excess solar.

tbihl · on June 24, 2023

I imagine you sell a lot of electricity back. I can't imagine using anywhere near that much power.

PossiblyKyle · on June 24, 2023

Definitely, haven’t paid an electric bill since the installation, even during the winter

PossiblyKyle · on May 10, 2023

Glad to see technology is finally here for it

PossiblyKyle · on March 20, 2023

The most surprising part is ML being an elective course in 2016. ML is mandatory in my university, but DL is elective