Hi HN. I'm the director of Free Law Project, the org that's behind this, RECAP, and CourtListener.com.
The idea of this project is to create topic-based bots that follow certain areas of the law so that folks can access the raw data underlying the news and read it for themselves.
We have a handful of "topic bots" over at https://bots.law, and we're working on Slack/MS Teams/Discord integration as well. We'll probably be launching a Crypto bot soon, if we can find a reasonable curator for that.
We'd love your thoughts, and if you follow tech cases, hopefully you'll find our bots useful!
At CourtListener we send thousands of emails each day. We recently migrated to AWS SES and learned a ton of lessons beyond DMARC/SPF/etc.
Getting all of those parts right is important, but you also have to handle bounces, down recipient servers, full inboxes, changed email addresses, and a lot more.
It's been a ton of work getting this set up, so we thought we'd share our notes.
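As a rough illustration of the bounce handling mentioned above, here is a minimal sketch that sorts an SES bounce notification into "stop sending" versus "try again later". The field names follow the SES bounce notification format, but the function name `classify_bounce` and the disable/retry policy are assumptions made for this example, not a description of what CourtListener actually does.

```python
import json


def classify_bounce(notification_json: str):
    """Map each bounced recipient to a suggested action.

    Assumed policy (hypothetical): permanent bounces disable the address,
    anything else (full inbox, server down, etc.) is retried later.
    """
    note = json.loads(notification_json)
    if note.get("notificationType") != "Bounce":
        return []
    bounce = note["bounce"]
    action = "disable" if bounce["bounceType"] == "Permanent" else "retry_later"
    return [(r["emailAddress"], action) for r in bounce["bouncedRecipients"]]


# Made-up sample payload for illustration only.
sample = json.dumps({
    "notificationType": "Bounce",
    "bounce": {
        "bounceType": "Transient",
        "bouncedRecipients": [{"emailAddress": "user@example.com"}],
    },
})
print(classify_bounce(sample))
```

A real setup would also need to handle complaints and delayed deliveries, which this sketch leaves out.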
This uses AWS's Textract service, but if you're doing a LOT of extraction, that gets pretty expensive pretty quickly. We do thousands of pages daily on CourtListener.com and created an open source microservice for this purpose. It can take PDFs, DOCX, DOC, TXT, HTML, or a handful of other files and extract the text, doing OCR if necessary:
This bug means that when somebody uninstalls Signal (or an iPhone disables it for lack of use), people sending messages never learn that the message didn't go through.
As a result:
"I just missed out on placing an offer on a house because of this issue."
"Today I ended up uninstalling Signal after an experience where this problem caused me to almost lose two of my friends."
"I just had the experience of being very worried about a friend's well-being because he stopped responding to my messages."
And so forth. It makes Signal a very risky thing to use, not because the encryption is bad, but because the UX is. It seems easy to fix, but Signal never seems to care.
My organization built a similar tool that can find bad redactions caused when people just use a black rectangle on top of text in PDFs: https://free.law/projects/x-ray
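To make the failure mode concrete, here is a minimal sketch of the general idea: if a word's bounding box is mostly covered by a drawn rectangle, the text is still in the file and the redaction is bad. This is not the x-ray implementation; the `Box` type, the `find_bad_redactions` function, and the 50% coverage threshold are all hypothetical, and a real tool would first have to extract the word boxes and rectangles from the PDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0 if they do not overlap."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return w * h if w > 0 and h > 0 else 0.0


def find_bad_redactions(words, rectangles, min_covered=0.5):
    """Return (text, rectangle) pairs where text survives under a rectangle.

    `words` is a list of (text, Box) pairs; `rectangles` is a list of Box.
    """
    hits = []
    for text, word_box in words:
        if word_box.area() == 0:
            continue
        for rect in rectangles:
            if overlap_area(word_box, rect) / word_box.area() >= min_covered:
                hits.append((text, rect))
    return hits


# Made-up page content: one name hidden under a black rectangle.
words = [("Jane Doe", Box(100, 700, 160, 712)), ("Plaintiff", Box(100, 680, 150, 692))]
rectangles = [Box(95, 698, 165, 714)]
print(find_bad_redactions(words, rectangles))
```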
Yes, it's been that way since long before PDFs. Simply knowing the potential words, often names, that could appear in a document gives those with the redacted documents a chance at determining what has been hidden based on size. This might be part of the reason why, when documents are declassified, the redactions end up covering more of a sentence than is needed. The extra buffer of hidden words gives some additional protection to what needs to be redacted.
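A toy sketch of the size-based guessing described above, assuming a proportional font whose character widths are known. The width table, the candidate names, and the function names are invented for illustration; real font metrics would come from the font itself.

```python
# Hypothetical per-character widths (in points) for a proportional font.
CHAR_WIDTHS = {"i": 3.0, "l": 3.0, "m": 9.0, "w": 8.0, " ": 3.0}
DEFAULT_WIDTH = 6.0  # assumed width for every character not listed above


def text_width(text: str) -> float:
    """Approximate rendered width of `text` using the toy width table."""
    return sum(CHAR_WIDTHS.get(ch.lower(), DEFAULT_WIDTH) for ch in text)


def candidates_for_width(candidates, redaction_width, tolerance=2.0):
    """Keep the candidates whose rendered width fits the redacted gap."""
    return [c for c in candidates if abs(text_width(c) - redaction_width) <= tolerance]


names = ["Bill", "William", "Tim"]
print(candidates_for_width(names, 35.0))  # a tight redaction narrows the list
print(candidates_for_width(names, 50.0))  # a padded redaction matches nothing
```

The second call shows why padding helps: once the box is wider than any single candidate, width alone no longer picks out a name.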
This reminds me of one of my proudest moments in high school.
For a test in German class (my worst class), the teacher had just used Tipp-Ex to remove some words and put them next to the text, and we had to fill them back in. I grabbed my ruler and measured all the gaps. There was one very long word, many medium-sized ones, and a few smaller ones, and with this information and the context of the text I was able to get my first and last 10/10 in this class.
A malicious "redacting" algorithm submitted to the Underhanded C Contest used a similar idea, just at a lower level.
The plain-text PPM format stores pixel values as ASCII numbers, so flipping all digits to 0 creates a pixel which is graphically "masked" but leaks information about the original pixel: "000" means the value was larger than 99.
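Here is a small sketch of that leak, written in Python rather than C and not taken from the contest entry. It assumes a plain-text pixel format with ASCII decimal values (plain PPM is one such format), a maximum value of 255, and no leading zeros; both function names are made up.

```python
def leaky_redact(pixel_text: str) -> str:
    """'Redact' plain-text pixel values by turning every digit into 0.

    The pixels render as black, but the number of digits is preserved.
    """
    return "".join("0" if ch.isdigit() else ch for ch in pixel_text)


def leaked_ranges(redacted: str):
    """Recover the range each original value fell in from its digit count.

    Assumes values in 0..255 written without leading zeros.
    """
    ranges = {1: (0, 9), 2: (10, 99), 3: (100, 255)}
    return [ranges[len(token)] for token in redacted.split()]


original = "255 7 42"
masked = leaky_redact(original)
print(masked)
print(leaked_ranges(masked))
```

Writing a fixed-width value such as a single `0` for every masked channel would remove this particular leak.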
Nope. That's called rebroadcast. It's also used to try to "launder" photo manipulations, like compositing. I helped work on some algorithms which could pick up artifacts even after rebroadcast.
I would absolutely not trust PDF not to leak metadata. Although now you risk a metadata leak from the printer or scanner, which may or may not affect your threat model.
When a coworker asked me for my recommended method of creating and publicly sharing redacted copies of documents which (in their unredacted forms) contained PII for children, I told them to do this, in no uncertain terms.
> Am I the only one who redacts info, prints it out, then scans it back in?
If you have the source document, redact from the source (by actually removing the content and replacing it with an appropriate placeholder, not obscuring it) and regenerate the static (e.g., PDF) version.
If you are working from print, I think scan and redact by digital replacement (not overlay or otherwise obscure) would be sufficient. Redact->print->scan probably helps somewhat (especially if the scan is low quality) if you are using a bad redaction method to start with, but why do that?
Not if there is a rasterization step in the process. That's essentially what printing and scanning achieves, rasterization, and we can do that without the printer and scanner.
Of course, the artifacts introduced by printing and scanning (especially with contrast turned way up) give it an air of legitimacy, although these can also be simulated.
If you print to paper and scan, you are mostly safe, but if you do a software print to a PDF document, you might use a tool that saves the actual content as invisible text, or the whole Word document as an attachment to the PDF. I would print and scan physically if it was something important. Or just edit the Word document to remove the stuff and then print and scan, to avoid saving the edit history, since I don't know if that will be saved somewhere.
Usually I'm in full control of the software myself so I just output X instead of the secret data.
On macOS, Preview makes a clear distinction between 'drawing on' and 'redacting' PDFs.
It is an important part of UX that shooting yourself in the foot should _not_ be the default.
Wanting to redact information is not a subset of PDF knowledge. Understanding how PDFs work is not a prerequisite for wanting to redact information. Lots of people have only the most rudimentary understanding of how PDFs work, how Adobe works, and what their limits or capabilities are.
A lot of people don't even know you can print to a file instead of paper. Not sure why you're surprised about that, after all the standard method for all formats is "save as" or "export" and it's reasonable to assume those two options include all possible ways to save a file. It's a UI quirk that goes against user expectations.
I recently discovered a manual for some home appliance with a clearly visible Word comment, along with a username. It seems to have slipped in when the manual was translated.
This is fine, but Signal still doesn't tell you when the person you're sending to has uninstalled Signal. Instead, your messages go into the ether and you think the person is ignoring you. It blows my mind they haven't prioritized this. https://github.com/signalapp/Signal-Android/issues/11164
Applications can't determine when they're uninstalled. Or, not reliably anyway, and not while following platform guidelines. So the question becomes how to tell uninstalled vs left in a drawer, powered down, while on vacation.
They just have to tell you if a message isn't received after a day or two. This is already exposed via the check marks, so it's just something they have to amplify with a notification.
Or when you start writing a message to somebody, if they haven't read the last couple of messages, Signal could make that obvious. Etc. Lots of easy fixes.
They can just say the message wasn't received. They don't have to say it was uninstalled. Just loudly tell me things aren't working like I expected. That's all this takes.
There are multiple anecdotes in this thread, on HN, that people missed that. All GP is asking for is better UX making it more obvious, because being able to check is something other than knowing to check and how to check.
I don't see why that matters? (Especially given that Signal has far fewer users and presumably higher attrition than those other platforms.) If things can be better, then it would be great if they were.
This is bad design. Why excuse bad design? When I send a text message and it doesn't arrive, my messaging app lets me know. With Signal, this is a step backward.
Yea, it seems like this is the most information they could give you without violating the addressee's privacy by revealing whether they have uninstalled the app. I suppose it could be worth it if, when the message remains undelivered for a while, Signal added an explicit note to that effect so the sender doesn't misunderstand.
Yes, exactly this. All that's needed is to tell senders when a message wasn't received after X hours.
You don't have to figure out if the user uninstalled. This also happens if they get a new phone and don't re-install on it, so relying on uninstalls wouldn't work anyway.
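A minimal sketch of that check, under stated assumptions: the `Message` type, the `undelivered_too_long` function, and the 24-hour default are hypothetical and are not Signal's data model or API. The point is only that the check needs a sent time and a delivery receipt, and nothing about why delivery failed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Message:
    """Hypothetical sender-side record of an outgoing message."""
    id: str
    sent_at: datetime
    delivered_at: Optional[datetime] = None  # set when a delivery receipt arrives


def undelivered_too_long(messages, now, threshold=timedelta(hours=24)):
    """Return messages with no delivery receipt after `threshold`.

    Whether the recipient uninstalled, switched phones, or is offline is
    irrelevant here; the sender only needs to know delivery has not happened.
    """
    return [m for m in messages
            if m.delivered_at is None and now - m.sent_at >= threshold]


now = datetime(2022, 1, 3, 12, 0)
outbox = [
    Message("a", datetime(2022, 1, 1, 9, 0)),                               # stale
    Message("b", datetime(2022, 1, 3, 11, 0)),                              # too recent
    Message("c", datetime(2022, 1, 1, 9, 0), datetime(2022, 1, 1, 9, 1)),   # delivered
]
for m in undelivered_too_long(outbox, now):
    print(f"Message {m.id} has not been delivered yet")
```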
Uninstalling doesn't send a notification to signal.org. I've previously messaged a few people without getting a response, later realizing they never got it because they switched phones and stopped using Signal without pressing the "Delete Account" button in Signal settings. The workaround is to have the user install and register again, then press delete.
> Signal must be actively working on your phone to make changes to the account. Register to see these options for your number. Deletion requests are not accepted outside of the registered app because there is no way to accurately verify whether or not a number is truly associated with the requester.
Yes, I expected as much: most users who stop using Signal (because, say, their friends use something else) are more likely to either just stop using it or uninstall the app, without explicitly deleting the account.
Another pain point for me: when I send an SMS to someone, I expect to get replies on SMS not on Signal. Don't try to replace SMS. It's just really annoying to have half the conversation in the text messages app and the other half in Signal app.
> we understand some customers may not use their G Suite legacy free edition for business and may be interested in other options. If you have 10 or fewer users in your group and do not use your G Suite legacy free edition for business, please complete the form below by April 1, 2022 if you're interested in learning about different options for your account in the coming months.