Hacker News | zawerf's comments

People who get this should donate their excess bandwidth to a "worthy" cause.

For example, e-hentai.org serves its images from a p2p system called Hentai@Home, and the total network only uses ~4 Gbit/sec:

https://e-hentai.org/hentaiathome.php

(or https://imgur.com/a/1H04buw if you don't want to login)


CAPTCHA isn't just a matter of protecting your site. One of the most evil attacks nowadays is "Distributed Spam Distraction", where you spam your victim with thousands of emails per second so an important email (e.g., fraudulent purchases) gets lost in the noise.

How do you do this in a world with decent spam filters? By using the victim's email to sign up for real services so they get hit with a welcome email. Because these are real services, spam filters won't catch them. This only works against services whose sign-up forms are easily automated.

The most evil part is that your email stays crippled even after the attack is over, because these real companies will keep sending you newsletters and it's impossible to unsubscribe from them all.


You've just reminded me I really need to use unique email addresses for each service.


If you use Gmail, you can add a + followed by anything to your address and it goes to the same mailbox.

For example, if signing up to drop, I might use myemail+drop@gmail.com

Makes it very easy to see which services are selling the address you provide to advertisers.


Yeah I know, and you're unfortunately correct that I use Gmail, but it's something I'm planning to change soon.

Also, if someone was targeting you with spam that won't help. They'll just remove the "+..." and you're back to the same problem.
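That stripping is trivial to automate. A sketch of the normalization an attacker (or any dedupe script) might apply; the function name is mine, and this also folds in Gmail's dot-insensitivity:

```python
def normalize_gmail(addr: str) -> str:
    """Collapse a Gmail plus-address to its underlying mailbox:
    drop everything after '+' and ignore dots in the local part."""
    local, _, domain = addr.partition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"
```

So `myemail+drop@gmail.com`, `my.email+shop@gmail.com`, and `myemail@gmail.com` all collapse to the same inbox, which is why plus-tagging only helps against untargeted spam.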


Express's Helmet also turns it off by default and includes a rationale for why (tl;dr: privacy leaks when random external links are posted on a page): https://helmetjs.github.io/docs/dns-prefetch-control/


There was a really interesting post on this topic recently: a circular array of circular arrays, each of size sqrt(n). [1]

The result is you can do O(1) access, O(sqrt(N)) insert and delete at arbitrary indices, and O(1) insert at head and tail.

In terms of big O this is strictly better than:

- arrays: O(1) access, O(N) insert/delete in middle, O(1) insert/delete at tail.

- circular arrays: O(1) access, O(N) insert/delete in middle, O(1) insert/delete at head and tail.

- fixed-page-size chunked circular arrays, such as the C++ implementation of std::deque, which is still O(N) for insert and delete in the middle. [2]

[1] https://news.ycombinator.com/item?id=20872696

[2] https://stackoverflow.com/questions/6292332/what-really-is-a...
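For anyone who wants to poke at the idea, here's a toy sketch in Python (class and method names are mine, and the block size is fixed; a real implementation would keep B ≈ sqrt(n) as the list grows): blocks are deques of capacity B, an index maps to (block, offset) by plain division for O(1) access, and an insert costs an O(B) shift in one block plus one O(1) spill per later block, i.e. O(B + n/B) = O(sqrt(n)).

```python
from collections import deque

class TieredList:
    """Toy 'circular array of circular arrays' with a fixed block size.

    Invariant: every block except the last holds exactly B items, so an
    index maps to (block, offset) by plain division -> O(1) access.
    """

    def __init__(self, block_size):
        self.B = block_size
        self.blocks = []
        self.n = 0

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        return self.blocks[i // self.B][i % self.B]  # O(1)

    def insert(self, i, x):
        k, off = divmod(i, self.B)
        if k == len(self.blocks):
            self.blocks.append(deque())
        # O(B) shift inside one block...
        self.blocks[k].insert(off, x)
        # ...then one O(1) deque op per later block: O(B + n/B) total.
        while len(self.blocks[k]) > self.B:
            spill = self.blocks[k].pop()
            k += 1
            if k == len(self.blocks):
                self.blocks.append(deque())
            self.blocks[k].appendleft(spill)
        self.n += 1

    def delete(self, i):
        k, off = divmod(i, self.B)
        del self.blocks[k][off]
        # Backfill one item from each later block to restore the invariant.
        while k + 1 < len(self.blocks):
            self.blocks[k].append(self.blocks[k + 1].popleft())
            k += 1
        if self.blocks and not self.blocks[-1]:
            self.blocks.pop()
        self.n -= 1
```

Note this sketch still pays the O(n/B) spill chain for head inserts; making the outer list itself circular (as in the linked post) is what gets head/tail insert down to O(1).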


Assuming you ignore the constant. And if you're willing to ignore the constant, you can get O(c) access, O(N^(1/c)) insert and delete at arbitrary indices, and O(c) insert at head and tail, for any constant c. The trick is to make the array into a B-tree of width N^(1/c) and depth c. Note that the insert/delete time is amortized: the worst-case time is O(cN^(1/c)).


Yea someone else also mentioned that generalization (called a tiered vector) in that thread: https://news.ycombinator.com/item?id=20873110


It's actually a simplified 2-layer skip list implementation.


I think people are supposed to report their salary with the stock price at grant, not after appreciation (at least on levels.fyi). But yeah, it's a lot more because Google stock is half their comp and it keeps doubling every 3-4 years.


I’m from Levels.fyi. We actually request users to report compensation with appreciation. This is because the appreciation can actually be used to negotiate compensation elsewhere and provides a more accurate picture.


It's no secret that content moderators moderate content. Most of the stuff on HN is shaped by a few people who get to decide what's interesting enough to get a boost or not.

It definitely feels like gaslighting when you notice it happening. For example, a few times I knew I had commented on an old article the day before, but it didn't get traction. Then it would be on the frontpage again the next day with all the timestamps manipulated to seem fresher, including on my own comments! I know I was sleeping at that time, so I start questioning my sanity and whether I was sleepwalking!


That last bit sounds like you ran into the second-chance pool (described at https://news.ycombinator.com/item?id=11662380), which involves modifying timestamps when re-upping. See https://news.ycombinator.com/item?id=19774614 and https://news.ycombinator.com/item?id=16117291.

Most of HN isn't shaped by moderators boosting stories. But some of it is. The intention is just to make the site more interesting to the community. We don't always get that right, because sometimes the stories we think people will like just get flagged, and in that case we usually accede to the flaggers and unboost the thing. But most of the time it seems to work out.


I am always irrationally(?) scared of using these sanitizers despite their successful history. As soon as new html/js/css syntax/features are introduced, won't your security model need to be reevaluated? Which seems like a lost cause at the rate new capabilities are introduced to the web. E.g., when CSS Shaders lands, you might be able to execute arbitrary gpu code with just css (hypothetically speaking, I don't actually know how it will work. I am sure it'll be sandboxed pretty well. But the problem remains that there are too many new possibilities to keep up with!).


DOMPurify (as a client-side sanitizer) uses a whitelist. There's also CSP for defense-in-depth.

I would be more concerned about using server-side sanitizers, due to the impedance mismatch between client and server HTML parsing algorithms.
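For intuition about why the whitelist approach ages well, here's a toy whitelist sanitizer in Python (the policy, names, and scheme list are all mine; real sanitizers like DOMPurify do far more, e.g. they drop script/style contents instead of re-emitting them as text): anything not explicitly allowed is dropped, so a brand-new tag or attribute is inert by default.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a"}  # hypothetical policy
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")

class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags/attributes; escape everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown/new tags simply vanish
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS.get(tag, set()):
                # blocks javascript:, data:, etc. by scheme whitelist
                if name == "href" and not (value or "").startswith(SAFE_SCHEMES):
                    continue
                kept.append(f' {name}="{escape(value or "")}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html_text):
    s = WhitelistSanitizer()
    s.feed(html_text)
    s.close()
    return "".join(s.out)
```

e.g. `sanitize('<p onclick="x()">hi</p>')` keeps the `<p>` but drops `onclick`, and a `javascript:` href is stripped even though `href` itself is allowed.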


Security models are constantly being re-evaluated as new threats and attack vectors emerge.

What you said can be generically applied to every security control, which is why security is hard.


Isn't that like saying there's no point in using an anti virus as viruses are always evolving?

You're still catching entire classes of existing issues.


> Isn't that like saying there's no point in using an anti virus as viruses are always evolving?

You're very close to understanding something.

(Though in defense of DOM purifiers they can use a whitelist)


You mean, you are catching exploits for vulnerabilities that don't exist anymore, and you pay for that with a gigantic attack surface that can be used to compromise you? Yeah, that sounds about right.


Bad example. Antivirus software is a scam: it just adds another attack vector. When the antivirus has a bug in its file parsing, you can be compromised just by downloading a malicious file.

Windows Defender is sufficient & bundled with Windows


I mean, I never said anything about buying one; you just assumed that. I also just use Windows Defender, part of which is an antivirus.


Make it a whitelist. :)


It wouldn't help if new features extend the capabilities of existing stuff (which happens all the time). For example, the CSS Shaders example from before adds new syntax to the existing 'filter' CSS property, which you might've already whitelisted because it is safe today.


I guess a nested, parameter-granularity whitelist would work in that case :)


You can do that with DOMPurify using hooks.


I am trying this out and I am still on the edge of whether I like it or not.

Create a table with a json column:

  CREATE TABLE Doc (
    id UUID PRIMARY KEY,
    val JSONB NOT NULL
  );
Then later it turns out all documents have user_ids so you add a check constraint and an index:

  ALTER TABLE Doc ADD CONSTRAINT check_doc_val CHECK (
    jsonb_typeof(val)='object' AND
    val ? 'user_id' AND
    jsonb_typeof(val->'user_id')='string'
  );
  CREATE INDEX doc_user_id ON Doc ((val->>'user_id'));
I think the Postgres syntax for this is pretty ugly. And if you also want foreign key constraints, you still have to move that part of the json out as a real column (or duplicate it as a column on Doc). I am not sure it's even worth it to have Postgres check these constraints (vs just checking them in code).

I am also a little worried about performance (maybe prematurely). If that document is large, you will be rewriting the entire json blob each time you modify anything in it. A properly normalized schema can get away with a lot less rewriting?


At this point, I would just bite the bullet, break out a new user_id column and run a migration to populate the column.
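For what it's worth, on Postgres 12+ you can avoid the dual-write problem with a stored generated column, something like this (sketch; column and index names are illustrative, and I've kept the type as text to avoid cast restrictions in generated expressions):

```sql
-- The JSON stays the source of truth; Postgres maintains the copy.
ALTER TABLE Doc
  ADD COLUMN user_id TEXT
  GENERATED ALWAYS AS (val->>'user_id') STORED;

CREATE INDEX doc_user_id_col ON Doc (user_id);
```

It backfills existing rows when added, so you also skip the migration step.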


Just a note about using uuid as a primary key: typically you will use a b-tree index, which likes to keep things sorted. Something like a serial number works best, because it is already sorted and new rows are appended at the end. Otherwise inserting a new row will cause traversal of the b-tree all over the place, which will hurt performance if you do a lot of inserts.

If you really want to use uuid and care about performance, you might prefix it with something increasing, like a date, or perhaps (I haven't tried it) use a hash index (needs PG 10+).


(We're getting way off topic) but I think the problem with auto increment is that it can't be sharded easily since multiple shards can increment to the same value. If you then try to go back to random ids you're now stuck with 8 bytes which will conflict once every billion items or so. I guess it's pretty extreme premature optimization but I think UUID is nicer for future-proofing at the cost of some performance. (I would love to see benchmarks to know exactly how much performance I am giving up though)

By the way, UUIDv1 is already prefixed by a timestamp! But unfortunately it doesn't store the time in a sortable byte order, so it doesn't work for clustering the ids into the same page. I think it was really designed for distributed systems, where you would want evenly distributed ids anyway.
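Both points are easy to demonstrate in Python (function names are mine): you can recover the embedded UUIDv1 timestamp, and you can build a hypothetical time-prefixed 16-byte id whose byte order *is* sortable, roughly what the later UUIDv7 standardized.

```python
import datetime
import secrets
import uuid

# UUIDv1's timestamp is 100 ns ticks since the Gregorian reform date,
# but it stores time_low first, so lexicographic order != chronological.
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)

def uuid1_timestamp(u: uuid.UUID) -> datetime.datetime:
    """Recover the wall-clock time embedded in a version-1 UUID."""
    return GREGORIAN_EPOCH + datetime.timedelta(microseconds=u.time // 10)

def time_prefixed_id() -> bytes:
    """Hypothetical sortable 16-byte id: 48-bit millisecond timestamp
    followed by 80 random bits, so byte order tracks creation time."""
    ms = int(datetime.datetime.now(datetime.timezone.utc).timestamp() * 1000)
    return ms.to_bytes(6, "big") + secrets.token_bytes(10)
```

With the timestamp in the most significant bytes, fresh ids land near each other in the b-tree instead of scattering across it.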


> I would love to see benchmarks to know exactly how much performance I am giving up though

https://www.youtube.com/watch?v=xrMbzHdPLKM

It ends up being a pitch for Aurora at the end (as with any presentation from AWS folks), but it has tons of useful information for standard Postgres.


In MySQL/MariaDB/Percona InnoDB Galera every writeable replica has an auto increment offset.


Same in Postgres [1], and I'm willing to guess every relational database has a way to do it.

[1] https://www.postgresql.org/docs/current/sql-createsequence.h...
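e.g., for two writers (sequence name illustrative), each node's sequence takes every other value so they never collide:

```sql
-- On writer A: generates 1, 3, 5, ...
CREATE SEQUENCE doc_id_seq INCREMENT BY 2 START WITH 1;

-- On writer B: generates 2, 4, 6, ...
CREATE SEQUENCE doc_id_seq INCREMENT BY 2 START WITH 2;
```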


Technical note: big O isn't a useful measure here. Most databases use b-trees (yes, even Mongo), so lookups are at best O(log(n)). That goes for you and the people replying too.

The constant factors are way more important here. It's a 1000x factor difference depending on how durable you need your data to be (whether you need to write to disk or a quorum of network nodes in multiple regions). That is basically the only thing that mattered in the recent mongo vs postgres benchmarks.


Here's the guy he said he got inspiration from: https://jakealbaugh.com/

I think if you liked those, you would also like:

http://acko.net/

http://worrydream.com


The inspiration in particular is this demo, if you want to see how it's all working: https://codepen.io/jakealbaugh/pen/PwLXXP


yeah, acko.net is awesome. I remember seeing it the first time a few years ago and being blown away by the intro.


yeah, 2013: http://acko.net/blog/zero-to-sixty-in-one-second/

just absolutely blew my mind as well

this talk by the same author is also great: https://www.youtube.com/watch?v=Zkx1aKv2z8o

