Hacker News | kardos's comments

Sounds like a great use for a local LLM to strip ads from the output of the ad-infested LLM.


Well, heat capacity and thermal conductivity are not the same thing


because it's a much better experience than copy-pasting into a webapp


I'd like to try it, but enterprise only?


Surely there must be a way to do the joins in software without doing it by hand, e.g. a SQL-like library? Pandas or equivalent?
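To sketch what "a SQL-like library" could look like here, this uses Python's stdlib sqlite3 as an in-process join engine (pandas' `merge` would be the other obvious route); the table and column names are made up for illustration:

```python
import sqlite3

# In-memory database standing in for data pulled from two sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders(id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 3.0), (12, 2, 7.25);
""")

# The join itself -- no hand-rolled nested loops needed.
rows = conn.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name ORDER BY u.name
""").fetchall()
print(rows)  # [('ada', 12.5), ('lin', 7.25)]
```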


Of course - but that is the best case scenario. You will need to support other kinds of queries as well, including writes, which is where it gets even more complicated. The guarantees provided by your RDBMS go away when you shard your database like this. Transactions are local to each database so writes to multiple cannot be a single transaction anymore.
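The lost-transactionality point can be made concrete with a toy sketch: two sqlite3 databases stand in for two shards, and a "transfer" spanning both is two local transactions, so a failure between commits leaves the pair inconsistent (names and the deliberate failure are made up):

```python
import sqlite3

# Two separate databases standing in for two shards.
shard_a = sqlite3.connect(":memory:")
shard_b = sqlite3.connect(":memory:")
for shard in (shard_a, shard_b):
    shard.execute("CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance INTEGER)")
shard_a.execute("INSERT INTO accounts VALUES (1, 100)")
shard_b.execute("INSERT INTO accounts VALUES (2, 0)")
shard_a.commit()
shard_b.commit()

# A cross-shard "transfer" is two local transactions, not one atomic one.
shard_a.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
shard_a.commit()                      # shard A has committed...
try:
    shard_b.execute("UPDATE no_such_table SET balance = balance + 30")
    shard_b.commit()
except sqlite3.OperationalError:
    shard_b.rollback()                # ...but shard B failed: the 30 vanished

a = shard_a.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
b = shard_b.execute("SELECT balance FROM accounts WHERE id = 2").fetchone()[0]
print(a, b)  # 70 0 -- inconsistent; no RDBMS can roll back across both shards
```

Avoiding this requires two-phase commit or application-level sagas, which is exactly the added complexity being described.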


Indeed. KYC has a purpose though -- prevention of fraud, money laundering, etc. Getting rid of KYC without a similarly-effective solution for those things seems unlikely. Ideas?


That’s not really true. Most financial crimes are big operations facilitated by banks. Criminals love KYC because that’s a chance to make their operations seem legit.


Here’s an idea, get rid of cryptocurrencies and the need for KYC basically vanishes.


If you get rid of cocaine, the need for rehab centers also vanishes.

There is no way to “get rid of cryptocurrencies” at this point save for shutting off the internet. It is not within the power of the state to prohibit, any more than prostitution or cocaine.


Sure, there would be a black market for it, but that black market would be a lot smaller than the open market we have right now.

There are plenty of legal ways to exchange cryptocurrencies for real currencies; shutting those down would be a good start.


Things like this juice the anti-landlord sentiment: https://www.propublica.org/article/justice-department-sues-l...


The Gell-Mann amnesia effect applies to LLMs as well!

https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect


> Having a whole other device constantly running just to wake up my main device feels like a waste.

Indeed, my first thought is that the best place to run this is an OpenWRT router. Perhaps as a package, or a built-in feature?
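The wake-up job itself is tiny, which is why a router is a plausible host: a Wake-on-LAN magic packet is just six 0xFF bytes followed by the target MAC repeated 16 times, broadcast over UDP. A minimal sketch (the MAC is a placeholder):

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    assert len(mac_bytes) == 6, "MAC must be 6 bytes"
    return b"\xff" * 6 + mac_bytes * 16

packet = magic_packet("aa:bb:cc:dd:ee:ff")  # placeholder MAC
print(len(packet))  # 102 bytes

# Sending is one UDP broadcast to port 9 (uncomment on a real network):
# with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
#     s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
#     s.sendto(packet, ("255.255.255.255", 9))
```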


So why are they 3.27x slower to insert? Are they 3.27x longer in string form?


It's likely because `gen_random_uuid()` is implemented in C [0]: it essentially just reads from `/dev/urandom`, then modifies the variant and version bits. Whereas, assuming they're using something like what was described here [1], that's a lot of function calls within Postgres, which slows it down.

As an example, this small function that makes UUIDv4:

    postgres=# CREATE OR REPLACE FUNCTION custom_uuid_v4() RETURNS uuid AS $$
        SELECT encode(set_byte(set_byte(gen_random_bytes(16), 6, (get_byte(gen_random_bytes(1), 0) & 15) | 64), 8, (get_byte(gen_random_bytes(1), 0) & 63) | 128), 'hex')::uuid;
    $$ LANGUAGE sql;
This took 14.5 seconds to create / insert 1,000,000 rows into a temp table, compared to 7.1 seconds for `gen_random_uuid()`.
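The bit-twiddling that the C function does can be mirrored in a few lines of Python for illustration (this is not the Postgres source, just the same idea: 16 random bytes with the version nibble forced to 4 and the variant bits to `10`):

```python
import os
import uuid

def uuid4_from_random_bytes() -> uuid.UUID:
    """Mimic the gist of gen_random_uuid(): random bytes, then fix two bytes."""
    b = bytearray(os.urandom(16))
    b[6] = (b[6] & 0x0F) | 0x40   # version 4 in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80   # variant bits 10xxxxxx in byte 8
    return uuid.UUID(bytes=bytes(b))

u = uuid4_from_random_bytes()
assert u.version == 4
assert u.variant == uuid.RFC_4122
```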

[0]: https://doxygen.postgresql.org/uuid_8c.html#a6296fbc32909d10...

[1]: https://blog.daveallie.com/ulid-primary-keys/


I don't think that's right. They show in the section titled "Generating" that the performance of calling the ULID function from SQL is only very slightly slower. It's the INSERT that performs worse.

Generally, inserting sorted values (like sequential integers or, in this case, ULIDs) into a B-tree index is much faster than inserting random values. This is because sorted inserts all land in the same, highly packed rightmost B-tree pages, whereas random inserts touch scattered pages throughout the tree, resulting in more pages written. Random values are generally faster to query, but slower to insert.

In this case I think the insert speed difference comes down to key size. Postgres's native UUID type is 128 bits, or 16 bytes, whereas the ULID is stored as the "text" type, encoded as base32, resulting in a 26-byte string plus a 32-bit length header: 240 bits in total, or 1.87x longer. In the benchmark, the ULID insert takes about 3x as long as the UUID insert, so the overhead may come not just from the extra space but also from string comparisons being slower than comparing 128-bit ints.
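The size arithmetic is easy to verify: base32 packs 5 bits per character, so 128 bits need ceil(128/5) = 26 characters. A quick stdlib check (RFC 4648 base32 via `base64.b32encode`, which pads where Crockford base32 does not):

```python
import base64
import math
import os

# 128 bits at 5 bits per base32 character:
chars = math.ceil(128 / 5)
print(chars)  # 26

# Standard base32 of 16 random bytes: 26 data characters plus '=' padding.
encoded = base64.b32encode(os.urandom(16)).decode()
print(len(encoded.rstrip("=")))  # 26
```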

Edit: The article doesn't actually say which ULID implementation they use. The one implemented in PL/PGSQL mentioned in one of the article's links [1] is very slow. The other [2] is quite fast, but doesn't use base32. However, this [3] native C extension is fast, about 15% faster than the UUID function on my machine.

On my machine, using pg-ulid, inserting 1M rows was on average 1.2x faster for UUID than ULID (mean: 963ms vs 1131ms). This is probably all I/O, and reflects the fact that the ULIDs are longer. Raw output here: https://gist.github.com/atombender/7adccb17a95056313d0e8ff56....

Edit 2: They don't have an index on the column in the article, so my comment about B-tree performance doesn't apply here.

[1] https://blog.lawrencejones.dev/ulid

[2] https://blog.daveallie.com/ulid-primary-keys/

[3] https://github.com/andrielfn/pg-ulid


I assumed that they were storing the ULIDs as binary, in the UUID column type, as link 2 in your reply. If stored as TEXT, then yes, that absolutely would make a difference.

It’s also worth noting that unlike MySQL / SQL Server, Postgres does not store tuples clustered around the PK. Indices are of course still in a B+tree.


They show that they're storing the ULIDs as text. Quoting from the article:

    CREATE TABLE ulid_test(id TEXT);
I suspect their poor results come from their choice of ULID implementation. The native C implementation I tried out is faster than the Postgres UUID type when testing computation only.

I noticed a bug in their test: They call generate_ulid() with now(). But now() is an alias for transaction_timestamp(), which is computed once at the start of the transaction, so all the timestamps will be the same. They should be using clock_timestamp().
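The effect of that bug can be sketched in Python with a counter standing in for the clock (a toy analogy, not Postgres semantics): `now()` is a value captured once at transaction start, `clock_timestamp()` is a fresh read per call.

```python
import itertools

# A counter stands in for time so the difference is deterministic.
ticker = itertools.count()

transaction_timestamp = next(ticker)   # captured once, like now()

def clock_timestamp():                 # read on every call
    return next(ticker)

frozen = [transaction_timestamp for _ in range(3)]
fresh = [clock_timestamp() for _ in range(3)]
print(frozen)  # [0, 0, 0] -- every "row" gets the same timestamp
print(fresh)   # [1, 2, 3] -- advancing per row, as a ULID generator needs
```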


Good catch to both.

