CAPTCHA isn't just a matter of protecting your site. One of the most evil attacks nowadays is "Distributed Spam Distraction", where you spam your victim with thousands of emails per second so an important email (e.g., fraudulent purchases) gets lost in the noise.
How do you do this in a world with decent spam filters? By using the victim's email to sign up for real services so they get hit with a welcome email. Because these are real services, spam filters won't catch them. This only works with services whose sign-up forms are easily automated.
The most evil part is that your email is crippled even after the attack is over, because these real companies will keep sending you newsletters and it's impossible to unsubscribe from them all.
Assuming you're willing to ignore the constant, you can get O(c) access, O(N^(1/c)) insert and delete at arbitrary indices, and O(c) insert at head and tail, for any constant c. The trick is to make the array into a B-tree of width N^(1/c) and depth c. Note that the insert/delete time is amortized: the worst-case time is O(cN^(1/c)).
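As a sketch of the c = 2 case (the class and its names are mine, not a standard library; delete and the amortized rebalancing when N grows are omitted): keep blocks of fixed capacity B ≈ √N, with every block except the last exactly full. Access is then pure index arithmetic, and an insert shifts within one block and cascades one carried element per block to the right.

```python
from collections import deque

class TieredVector:
    """Two-level version of the array-as-B-tree idea.  Every block
    except the last holds exactly B items, so reads are plain index
    arithmetic; an insert shifts at most B items inside one block and
    then carries one item per block rightward: O(B + N/B) = O(sqrt(N))."""

    def __init__(self, B=4):
        self.B = B
        self.blocks = [deque()]

    def __getitem__(self, i):
        # invariant: block j covers indices [j*B, (j+1)*B)
        return self.blocks[i // self.B][i % self.B]

    def insert(self, i, x):
        j, off = i // self.B, i % self.B
        if j == len(self.blocks):        # appending right past a full last block
            j, off = j - 1, len(self.blocks[-1])
        self.blocks[j].insert(off, x)
        # cascade the overflow one carried item per block to the right
        while len(self.blocks[j]) > self.B:
            carry = self.blocks[j].pop()
            j += 1
            if j == len(self.blocks):
                self.blocks.append(deque())
            self.blocks[j].appendleft(carry)
```

Inserting at the head touches every block once (the carry), which is where the amortized bound comes from; a real implementation would use circular buffers so each carry step is O(1).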
I think people are supposed to report their salary with the stock price at grant, not after appreciation (at least on levels.fyi). But yeah, it's a lot more because Google stock is half their comp and it keeps doubling every 3-4 years.
I’m from Levels.fyi. We actually ask users to report compensation with appreciation included, because the appreciated value can be used to negotiate compensation elsewhere and gives a more accurate picture.
It's no secret that content moderators moderate content. Most of the stuff on HN is shaped by a few people who get to decide whether something is interesting enough to get a boost.
It definitely feels like gaslighting when you notice it happening. A few times I knew I had commented on an old article the day before, but it didn't get traction. Then it would be on the frontpage again the next day with all the timestamps manipulated to seem fresher, including on my own comments! I knew I was asleep at that time, so I started questioning my sanity and wondering whether I had been sleepwalking!
Most of HN isn't shaped by moderators boosting stories. But some of it is. The intention is just to make the site more interesting to the community. We don't always get that right, because sometimes the stories we think people will like just get flagged, and in that case we usually accede to the flaggers and unboost the thing. But most of the time it seems to work out.
I am always irrationally(?) scared of using these sanitizers despite their successful history. As soon as new html/js/css syntax/features are introduced, won't your security model need to be reevaluated? Which seems like a lost cause at the rate new capabilities are introduced to the web. E.g., when CSS Shaders lands, you might be able to execute arbitrary gpu code with just css (hypothetically speaking, I don't actually know how it will work. I am sure it'll be sandboxed pretty well. But the problem remains that there are too many new possibilities to keep up with!).
You mean, you are catching exploits for vulnerabilities that don't exist anymore, and you pay for that with a gigantic attack surface that can be used to compromise you? Yeah, that sounds about right.
Bad example. Antivirus software is a scam. It just adds another attack vector: when the antivirus has a bug in its file parsing, you can be compromised just by downloading a malicious file.
Windows Defender is sufficient and comes bundled with Windows.
It wouldn't help if new features extend the capabilities of existing stuff (which happens all the time). For example, the CSS Shaders example from before adds new syntax to the existing 'filter' CSS property, which you might already have whitelisted because it is safe today.
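To make the failure mode concrete, here's a toy property-name allowlist (my own sketch, not any real sanitizer library). It checks only the property name, so any future extension to an already-allowed property's *value* syntax sails straight through:

```python
# 'filter' is on the list because its current value syntax is harmless.
SAFE_PROPERTIES = {"color", "font-size", "filter"}

def sanitize_declarations(css: str) -> str:
    """Keep only declarations whose property name is allowlisted.
    The value is passed through untouched -- that's the hole."""
    kept = []
    for decl in css.split(";"):
        if ":" not in decl:
            continue
        prop, value = decl.split(":", 1)
        if prop.strip().lower() in SAFE_PROPERTIES:
            kept.append(f"{prop.strip()}: {value.strip()}")
    return "; ".join(kept)
```

`sanitize_declarations("filter: blur(2px)")` is fine today, but a hypothetical future `filter: custom(shader.frag)` value would be allowed by exactly the same check, because the name "filter" is still the name "filter".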
I am trying this out and I am still on the fence about whether I like it.
Create a table with a json column:
CREATE TABLE Doc (
  id UUID PRIMARY KEY,
  val JSONB NOT NULL
);
Then later it turns out all documents have user_ids so you add a check constraint and an index:
ALTER TABLE Doc ADD CONSTRAINT check_doc_val CHECK (
  jsonb_typeof(val) = 'object' AND
  val ? 'user_id' AND
  jsonb_typeof(val->'user_id') = 'string'
);

CREATE INDEX doc_user_id ON Doc ((val->>'user_id'));
I think the postgres syntax for this is pretty ugly. And if you also want foreign key constraints you still have to move that part of the json out as a real column (or duplicate it as a column on Doc). I am not sure it's even worth it to have postgres check these constraints (vs just checking them in code).
I am also a little worried about performance (maybe prematurely). If that document is large, you will be rewriting the entire json blob each time you modify anything in it. A properly normalized schema can get away with a lot less rewriting?
Just a note about using uuid as a primary key: typically you will use a b-tree index, which likes to keep things sorted. Something like a serial works best, because new values are already in order and get appended at the end. A random uuid instead causes inserts all over the b-tree, which will hurt performance if you do a lot of inserts.
If you really want to use uuid and care about performance, you might prefix it with something increasing like a date, or perhaps (I haven't tried it) use a hash index (needs PG 10+).
(We're getting way off topic) but I think the problem with auto-increment is that it can't be sharded easily, since multiple shards can increment to the same value. If you then try to go back to random ids, you're stuck with 8 bytes, which will conflict once every billion items or so. I guess it's pretty extreme premature optimization, but I think UUID is nicer for future-proofing at the cost of some performance. (I would love to see benchmarks to know exactly how much performance I am giving up, though.)
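To put a number on the "conflict once every billion items" worry, the standard birthday-bound approximation (my helper function, not a library call) gives the probability of at least one collision among n random b-bit ids:

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(>=1 collision) ~ 1 - exp(-n^2 / 2^(bits+1))."""
    return -math.expm1(-(n * n) / 2.0 ** (bits + 1))
```

At a billion random 64-bit ids this comes out to a few percent chance of at least one collision, while 128-bit UUIDs at the same scale stay vanishingly unlikely to collide.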
By the way, uuidv1 is already prefixed by a timestamp! Unfortunately it doesn't store the time in a sortable byte order, so it doesn't work for clustering ids into the same page. I think it was really designed for distributed systems where you would want evenly distributed ids anyway.
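You can see the non-sortable layout directly with Python's `uuid` module: the 60-bit timestamp is split into time_low / time_mid / time_hi fields, and the fastest-changing field comes first in the byte layout, so raw byte order doesn't follow time.

```python
import uuid

u = uuid.uuid1()
# Reassemble the 60-bit timestamp (100 ns ticks since 1582-10-15)
# from the three fields.  time_low occupies the FIRST four bytes of
# the uuid even though it is the fastest-changing part, which is why
# sorting the raw bytes does not sort by creation time.
ts = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
assert ts == u.time
```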
Technical note: big O isn't a useful measure here. Most databases use b-trees (yes, even Mongo), so lookups are at best O(log n). That goes for you and the people replying too.
The constant factors are way more important here. There's a 1000x difference depending on how durable you need your data to be: whether you need to write to disk, or to a quorum of network nodes in multiple regions. That was basically the only thing that mattered in the recent Mongo vs Postgres benchmarks.
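You can see the durability constant in isolation with a toy micro-benchmark (my own sketch, not how the benchmarks above were run): the same append loop, with and without an fsync() per record. On typical hardware the durable version is orders of magnitude slower, and that gap has nothing to do with asymptotics.

```python
import os
import tempfile
import time

def write_records(n: int, durable: bool) -> float:
    """Append n 100-byte records to a temp file; if durable, fsync()
    after each record so it is actually on disk before continuing.
    Returns elapsed seconds."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, b"x" * 100)
            if durable:
                os.fsync(fd)  # pay the durability cost per record
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
```

Writing to a quorum across regions replaces the fsync with one or more network round trips, which is the same story with an even bigger constant.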
For example, e-hentai.org serves its images from a p2p system called hentai@home, and their total network only uses ~4Gbit/sec:
https://e-hentai.org/hentaiathome.php
(or https://imgur.com/a/1H04buw if you don't want to log in)