
Hey, the writes were batched[1]. What is expensive here is the serialization.

[1]: https://github.com/ente-io/edge-db-benchmarks/blob/43273607d...

Edit: Updated the permalink




This is the smoking gun, unfortunately; I was right. This code is not correct. You're doing a single insert per transaction (which is the same thing you'd get if you hadn't used an explicit transaction at all). What you want is a single transaction with ALL of the inserts in it. Do it again with a single transaction and you'll find it massively improves the performance. This is a common pitfall for beginners.
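A minimal sketch of the difference, assuming the sqlite_async-style writeTransaction API that appears later in this thread (`db` and `embeddings` are assumed to be in scope; the table and column names are made up):

  // Slow: one transaction per insert, so every row pays a full commit.
  for (final e in embeddings) {
    await db.writeTransaction((tx) async {
      await tx.execute(
          'INSERT INTO embeddings (embedding) VALUES (?)', [e]);
    });
  }

  // Fast: all inserts inside a single transaction, one commit total.
  await db.writeTransaction((tx) async {
    for (final e in embeddings) {
      await tx.execute(
          'INSERT INTO embeddings (embedding) VALUES (?)', [e]);
    }
  });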

Edit: You've changed the link after I wrote this reply. Did you fix the code? Is the blog post using the original code or this updated code?


You're asking too many questions. Didn't you read the title? SQLite isn't enough! /s


Just because you can increase performance by putting everything into one transaction doesn't make that the right way to benchmark.

Most people won't be inserting large amounts of data in a single transaction, e.g. a web server with many concurrent clients each inserting rows.


My bad, I linked to the wrong portion of the code. Here's the updated link[1].

We are executing the entire write in a single batch query.

From what we observed, and have documented, it is the serialization that is causing the bottleneck, not the write itself.

Also, more than writes, it is the latency associated with deserializing the read data into models that pushed us to look at options outside SQLite.

[1]: https://github.com/ente-io/edge-db-benchmarks/blob/43273607d...


Why are you comparing DB engines if the bottleneck is the serialization? You're the one doing the serialization! You could store the embedding directly into SQLite columns if you wanted.


I agree that deserialization is not the responsibility of the database.

But as an end consumer, the overall throughput while using SQLite as the DB engine was not sufficient to serve our use case of reading 100k embeddings.


What people are saying is your premise is wrong, and you’re getting defensive.

Neither of those would be an interesting problem on its own, except that your title calls out a particular and beloved database as the culprit, which feels a lot like defamation.

Even if deserialization is the bottleneck, bottlenecks are relative and don't have to be singular. You can have multiple bottlenecks, and being blind to one because of PEBCAK can still skew the results and thus the conclusions.

The people you’re responding to want you to fix your code and run it again. I think that’s reasonable. And maybe deserves a follow up post or a rework of the existing one.


Sorry if I sounded defensive!

I have nothing against SQLite, but I unfortunately haven't yet figured out how to write and read Lists more efficiently. Once I do, I'll definitely post a follow-up :)


Update: https://ente.io/blog/tech/sqlite-objectbox-isar#update

TL;DR: Removing Protobuf and directly serializing the list to a blob helped!


If the serialization is really the bottleneck, as you say, then you aren't serializing the data in the Isar evaluation. Which makes the comparison unfair.

Your comment reads a bit like saying you don't like a Toyota car for transporting apples, because it takes too long to bake a pie out of them first.

BTW, if Isar's data format really is "very close to in-memory representation", then why aren't you simply packing the 512 floats into an array (which I guess is how this data is represented anyhow) and using this contiguous block of memory directly as the serialized value?


Hey, since SQLite does not yet support lists, if there's a better way to serialize so that reads can be sped up, do let me know, it would help!


SQLite supports raw binary BLOBs, and I would store each embedding as a simple 512 × 4 = 2048-byte blob containing just the float32 values. Then you can access the data from the buffer directly in Dart: https://api.dart.dev/stable/2.8.1/dart-typed_data/ByteBuffer...
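A minimal Dart sketch of that round trip (the helper names are mine):

  import 'dart:typed_data';

  // Pack 512 float32 values into a 2048-byte blob for a BLOB column.
  Uint8List toBlob(List<double> embedding) =>
      Float32List.fromList(embedding).buffer.asUint8List();

  // View the blob directly as float32s: no parsing, no per-element copy.
  // Assumes the blob is 4-byte aligned within its buffer, which holds
  // for a freshly allocated Uint8List.
  Float32List fromBlob(Uint8List blob) =>
      blob.buffer.asFloat32List(blob.offsetInBytes, blob.lengthInBytes ~/ 4);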


This[1] has improved the situation considerably!

```
[log] SQLite: 100000 embeddings inserted in 2635 ms
[log] SQLite: 100000 embeddings retrieved in 561 ms
```

Isar is still ~2x faster, but there's room for optimization.

Thanks a bunch for sharing this, I will update the post.

[1]: https://github.com/ente-io/edge-db-benchmarks/commit/51ec496...


Looking at the rest of your code, it looks like you are doing a lot of unnecessary processing on these embeddings.

They even seem to be doubly JSON-encoded at some point:

https://github.com/ente-io/clip-ggml/blob/main/lib/clip_ggml...

In my opinion, they should probably just be a memory buffer representing the raw floats all the way down, from the output of the model to the database. They should never be encoded, neither as JSON nor as a Dart List<double>.


Type conversions with FFI turned out to be non-trivial, so `{"embedding":[...]}` was a way out, at the cost of a small performance hit (small compared to the time spent on inference).

We'll take another look.


The result of clip.[...].call(...) is already a buffer, isn't it? You just have to not touch it at all.

And on the C++ side, remove the JSON encoding and just return a raw buffer.
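On the Dart side of the FFI boundary that could look roughly like this (the function name and the 512-float contract are assumptions for illustration):

  import 'dart:ffi';
  import 'dart:typed_data';

  // Hypothetical: the native side now returns a pointer to 512 raw
  // floats instead of a JSON string.
  Float32List embeddingFromNative(Pointer<Float> ptr) {
    // Views the native memory directly; copy with Float32List.fromList
    // if the native buffer may be freed before you're done with it.
    return ptr.asTypedList(512);
  }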


Wow, kudos for actually implementing the proposed fix and documenting it here so quickly!


I found this interesting and have been profiling to understand why the performance increase was so significant. It looks to me like the main culprit was Protobufs plus garbage collection: serializing to protos performs a lot of allocations, while using just the Float32 view skips all of that.

```
I/scudo ( 641): Stats: SizeClassAllocator64: 572M mapped (0M rss) in 11986660 allocations; remains 257629
I/scudo ( 641): 00 ( 64): mapped: 1024K popped: 506106 pushed: 491660 inuse: 14446 total: 15044 rss: 0K releases: 0 last released: 0K region: 0x7ceae87000 (0x7ceae86000)
I/scudo ( 641): 01 ( 32): mapped: 1024K popped: 92137 pushed: 73047 inuse: 19090 total: 26708 rss: 0K releases: 0 last released: 0K region: 0x7cfae8c000 (0x7cfae86000)
```

I think this is because, during the proto encoding/decoding stage, the protobuf library ended up creating a bunch of objects to support the process:

``` "Class","Library","Total Instances","Total Size","Total Dart Heap Size","Total External Size","New Space Instances","New Space Size","New Space Dart Heap Size","New Space External Size","Old Space Instances","Old Space Size","Old Space Dart Heap Size","Old Space External Size" _FieldSet,package:protobuf/protobuf.dart,100010,4800480,4800480,0,0,0,0,0,100010,4800480,4800480,0 PbList,package:protobuf/protobuf.dart,108536,3473152,3473152,0,0,0,0,0,108536,3473152,3473152,0 Embedding,package:edge_db_benchmarks/models/embedding.dart,108535,3473120,3473120,0,0,0,0,0,108535,3473120,3473120,0 EmbeddingProto,package:edge_db_benchmarks/models/embedding.pb.dart,100010,1600160,1600160,0,0,0,0,0,100010,1600160,1600160,0 ```

What's missing here is that these have to be copied over to the database isolate as well.
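If that isolate-to-isolate copy matters, one option (not something the post uses) is dart:isolate's TransferableTypedData, which transfers the underlying bytes instead of deep-copying them on send; a rough sketch:

  import 'dart:isolate';
  import 'dart:typed_data';

  // Sender: wrap the blob; sending the wrapper through a SendPort
  // transfers the bytes rather than deep-copying them on send.
  void sendEmbedding(SendPort port, Uint8List blob) {
    port.send(TransferableTypedData.fromList([blob]));
  }

  // Receiver (database isolate): materialize exactly once.
  Uint8List receiveEmbedding(TransferableTypedData message) =>
      message.materialize().asUint8List();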


Maybe that's an opportunity for a pull request to the dart protobuf library?


Are batch operations automatically atomic in SQLite? The SQLite documentation makes no mention of batch operations.

Looking at the library, it seems it's just using prepared statements outside of a transaction...

https://github.com/powersync-ja/sqlite_async.dart/blob/f994e...

Seems the other commenter is correct that wrapping that line in a tx will fix it.


https://github.com/powersync-ja/sqlite_async.dart/blob/f994e...

https://github.com/powersync-ja/sqlite_async.dart/blob/f994e...

I think you may be looking too deeply. It does look like that object will wrap in a tx.


Ah, yeah I think you're right.


Within the DB world, “batch” usually means a sequence of queries in a single transaction. In software more broadly, a batch is a clump of tasks that run together, with some kind of separation from other clumps: logical, temporal, and so on.

If you use one definition while everyone else is using the other, there’s going to be a loud and confused argument.

Since we are talking about databases, we’d all better be using the former, but I suspect maybe the author is not.


> deserializing the read data into models

What does this have to do with SQLite?


You have changed the code snippets in your link.

The original link highlighted this snippet:

  Future<void> insertEmbedding(EmbeddingProto embedding) async {
    final db = await _database;
    await db.writeTransaction((tx) async {
      await tx.execute('INSERT INTO $tableName ($columnEmbedding) values(?)',
          [embedding.writeToBuffer()]);
    });
  }
Now highlighted:

  Future<void> insertMultipleEmbeddings(List<EmbeddingProto> embeddings) async {
    final db = await _database;
    final inputs = embeddings.map((e) => [e.writeToBuffer()]).toList();
    await db.executeBatch(
        'INSERT INTO $tableName ($columnEmbedding) values(?)', inputs);
  }
Did you post the wrong link the first time and correct your mistake? Or did you change the snippet because @electroly was right about the code being inefficient?


The code was always calling the latter[1], I had pasted the incorrect link.

[1]: https://github.com/ente-io/edge-db-benchmarks/blob/43273607d...


Thank you for pointing that out! It does make more sense, too. :)



