
Unfortunately, the standard TLS protocol does not provide a non-repudiation mechanism.

It works by using public key cryptography and key agreement to get both parties to agree on a symmetric key, and then uses the symmetric key to encrypt the actual session data.

Any party who knows the symmetric key can forge arbitrary data, and so a transcript of a TLS session, coupled with the symmetric key, is not proof of provenance.
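
To make that concrete, here is a rough sketch using the Python cryptography package (TLS record details such as nonce construction and additional data are simplified away): anyone holding the session key can produce ciphertext that decrypts and authenticates exactly as well as the genuine records.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    session_key = os.urandom(16)  # the symmetric key both client and server derive
    nonce = os.urandom(12)

    # The "server" protects a genuine response...
    genuine = AESGCM(session_key).encrypt(
        nonce, b"HTTP/1.1 200 OK\r\n\r\nreal data", b"")

    # ...but the client (or anyone later handed the key) can forge an equally
    # valid record - nothing ties either ciphertext to the server.
    forged = AESGCM(session_key).encrypt(
        nonce, b"HTTP/1.1 200 OK\r\n\r\nfaked data", b"")

    assert AESGCM(session_key).decrypt(nonce, forged, b"") == \
           b"HTTP/1.1 200 OK\r\n\r\nfaked data"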

There are interactive protocols that use multi-party computation (see for example https://tlsnotary.org/) in which there are two parties on the client side, plus an unmodified server (TLSNotary only works for TLS 1.2). One party controls and can see the content, but neither party has direct access to the symmetric key. At the end, the second party, by virtue of having interactively taken part in the protocol, provably knows a hash of the transaction. If the second party is a trusted third party, they could sign a certificate.

However, there is no non-interactive version of the same protocol - you either need to have been in the loop when the data was archived, or to trust someone who was.

The trusted third party can be a program running in a trusted execution environment (though note that pretty much all current TEEs have known fault-injection flaws), or in a cloud provider that offers vTPM attestation and a certificate for the machine state. For example, Google signs a certificate saying an endorsement key is authentically from Google; the vTPM signs a certificate saying a particular key is restricted to the vTPM and only available when the compute instance is running particular known binary code; and that key is then used to sign a certificate attesting to a TLS transcript.

I'm working on a simpler solution that doesn't use multi-party computation, and provides cloud attestation - https://lemmy.amxl.com/c/project_uniquonym and https://github.com/uniquonym/tls-attestproxy - but it's not usable yet.

Another option is for the server to cooperate via a TLS extension. TLS-N (https://eprint.iacr.org/2017/578.pdf) provides exactly this, which makes provenance trivial.


As important as cryptography is, I also wonder how much of it is trying to find technical solutions for social problems.

People are still going to be suspicious of each other, and service providers are still going to leak their private keys, and whatnot.


If you use nginx to front it, consider something like this in the `http` block of your config:

    # Classify known crawlers by User-Agent (~* = case-insensitive regex match).
    map $http_user_agent $bottype {
        default             "";
        "~*Amazonbot"       "amazon";
        "~*ImagesiftBot"    "imagesift";
        "~*Googlebot"       "google";
        "~*ClaudeBot"       "claude";
        "~*gptbot"          "gpt";
        "~*semrush"         "semrush";
        "~*mj12"            "mj12";
        "~*Bytespider"      "bytedance";
        "~*facebook"        "facebook";
    }
    # One shared 6-requests-per-minute bucket per bot family; requests with an
    # empty key (ordinary browsers) are not limited by this zone.
    limit_req_zone $bottype zone=bots:10m rate=6r/m;
    limit_req zone=bots burst=10 nodelay;
    limit_req_status 429;
You can still keep separate limits by IP. 429s tend to slow the scrapers down, and it means you spend a lot less on bandwidth and compute when they get too aggressive. Monitor and adjust the regex list over time as needed.
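
If you want to sanity-check the limit from outside, something like the following (example.com is a placeholder for your own host) should start printing 429s once the roughly 10-request burst allowance for that UA is used up:

    import urllib.request, urllib.error

    req = urllib.request.Request("https://example.com/",
                                 headers={"User-Agent": "GPTBot"})
    for i in range(20):
        try:
            with urllib.request.urlopen(req) as resp:
                print(i, resp.status)
        except urllib.error.HTTPError as e:
            print(i, e.code)  # expect 429s once the burst allowance is exhausted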

Note that if SEO is a goal, this does make you vulnerable to blackhat SEO: someone can fake the UA of a search engine you care about and eat its 6 req/minute quota with fake bots. You could treat Google differently.

This approach won't handle the case where the UA is dishonest and pretends to be a browser. That's an especially hard problem if the scraper has a large pool of residential IPs and emulates (or is) a headless browser, but it's a different problem that needs different solutions.


For Google, just read their publicly published list of crawler IPs. They're broken down into 3 JSON files by category: one set of IPs is for Googlebot (the web crawler), one is for special requests like fetches from Google Search Console, and one is for special crawlers related to things like Google Ads.

You can ingest this IP list periodically and set rules based on those IPs instead. Makes you not prone to the blackhat SEO tactic you mentioned. In fact, you could completely block GoogleBot UA strings that don’t match the IPs, without harming SEO, since those UA strings are being spoofed ;)
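
A rough sketch of that check in Python (the URL and the ipv4Prefix/ipv6Prefix field names are as Google currently documents them - verify before relying on this, and refresh the list periodically rather than per request):

    import ipaddress, json, urllib.request

    GOOGLEBOT_RANGES = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

    def load_googlebot_networks():
        with urllib.request.urlopen(GOOGLEBOT_RANGES) as resp:
            data = json.load(resp)
        return [ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
                for p in data["prefixes"]]

    def is_real_googlebot(ip, nets):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in nets)

    nets = load_googlebot_networks()
    print(is_real_googlebot("66.249.66.1", nets))    # in the well-known Googlebot range
    print(is_real_googlebot("203.0.113.10", nets))   # spoofed UA from elsewhere -> False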


I don't think analogies need to be historically accurate to be useful - and often there is something to the modern usage of them even if the original basis is a misrepresentation. Analogies and parables serve as a shortcut to aid understanding in communication, and that is valuable even if they are entirely apocryphal.

Similar examples:

* The tragedy of the commons - an extremely useful parable for invoking a game-theoretic / social-behaviour concept. However, it is apparently historically inaccurate - at small scales, societies were able to protect and look after common lands because of other societal factors such as reputation; the actual loss of the commons came not from overuse by the unlanded but from people who already had their own land taking more and fencing it off. Nevertheless, in larger-scale societies, the principle of over-exploitation of shared resources in the absence of a mechanism to prevent it is a real and valid concern, so the analogy has a lot of value.

* Probably apocryphal fables like 'The boy who cried wolf' have immediate meaning when it comes to concepts like alarm fatigue.

* Many religious analogies have persisted in societies that don't hold the original religious belief - 'Holy Grail', for example, as an analogy for a desirable outcome.

* Concepts from popular fiction sometimes become analogies too - "Golden Path", for example.

Not every analogy makes sense in every circumstance, but they are useful as a mutually understood shorthand to convey concepts.


I'm not sure "privacy violation" is necessarily the right term to help people understand why long-term non-repudiation is an undesirable property for some people.

It comes down to this: if a third party gets access to your emails (e.g. through a server compromise), should they be able to prove to a fourth party that the emails are legitimately yours, rather than completely faked? Non-repudiation through strong DKIM keys enables exactly that.

Example: the third party is a ransomware gang who releases your emails because you didn't pay a ransom after your email server was compromised. The fourth party is a journalist who doesn't trust the ransomware gang, would like to publish juicy stories about your company if there is one, but doesn't want to risk their reputation / a defamation case if the ransomware gang just invented the emails.
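
The check the journalist can run is roughly one library call - a sketch using the third-party dkimpy package (the filename is just a placeholder), verifying the leaked message against the DKIM public key the sending domain still publishes in DNS:

    import dkim  # from the third-party dkimpy package

    # Hypothetical leaked email, saved as raw bytes with headers intact.
    with open("leaked_message.eml", "rb") as f:
        raw = f.read()

    # True means the DKIM-Signature verifies against the d= domain's published
    # key - strong evidence the message really passed through that domain's
    # mail server, and something the gang could not forge without the private key.
    print(dkim.verify(raw))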


Non-repudiation is virtually always undesirable in general-purpose messaging systems. Revealing to a stranger whether a message is valid is a concession to that stranger, not a benefit to the email's owner. This property is called "deniability" and most secure messaging systems go way out of their way to have it.


It is better to ask your internal recruiter / HR department to inform the candidate of your feedback (if you work for a big enough company). It is also good practice to always have a panel, not just the hiring manager, doing interviews.

So the candidate gets feedback along the lines of: "Thank you for participating in our interview process. Unfortunately, our panel decided you weren't the best fit for position X at this time, because ...reasons.... Under company policy, we won't accept further applications from you for one year from today, but we would encourage you to apply for a role with us in the future".

There is a chance they will reply back to HR arguing, but it is their job to be polite but firm that the decision is already made, and that they can apply again in one year (and not pass anything back to the hiring manager).

The key is to think long term and about the company as a whole - the candidate who gets helpful feedback and is treated fairly is more likely to apply again in the future (after the mandatory cooling off period), when they might have more skills and experience working somewhere else. There is a finite qualified labour pool no matter where you are based, and having the good will even of rejected candidates is a competitive advantage. The message should be "not now", rather than "not ever" (although of course, if they do go on some kind of rampage, they could turn the not now into not ever - that's a bridge burning move). If a tiny percentage go on a rampage, but the company protects the individuals from it, and has lots of counteracting positive sentiment from prospective and actual staff, then it's still a net positive.


I think it's right to base the assessment of whether something is a walled garden on how easy it is for outsiders to access, and how easy it is to leave and take your community with you.

For viewing, I think you are doing well - your own domain name, which you can host where you like, and which currently doesn't impose many restrictions on who can view without signing up to anything.

But part of your community engagement is about having the community submit changes to you. And having that via GitHub is a walled garden - you can't make a PR, or even search the code, without a GitHub account. And they say you are only allowed one free account - so one identity only - and I've heard credible reports that they actively enforce it by IP matching etc., and ban people if they suspect them of having two accounts.

Moving off GitHub isn't always that easy - you'd need to retrieve all your PRs, but then the problem is that people who engage with you via their GitHub accounts would need to migrate their method of engagement.

So GitHub is absolutely a walled garden, and if you have a public GitHub, it is part of how you engage with your community.

Walled gardens do have the benefit of more people being in them - there is some barrier to entry to signing up on a random Gitea or Forgejo instance - but then you are beholden to the policies of the walled garden.


Fair point - I will add a note to the top that if you don't want to contribute via GitHub, you can send me a note at hi@den.dev and I will make the change myself.


admiration++ for responsiveness in adding the email option.


If you use GitHub the wrong way - the way Microsoft is prescribing - then yes, it's a walled garden. However, it's meant to simply be a git host.


Wait, you can only have one GitHub account?


"One person or legal entity may maintain no more than one free Account (if you choose to control a machine account as well, that's fine, but it can only be used for running a machine)." https://docs.github.com/en/site-policy/github-terms/github-t...


Also imagine you are a company with a reputation for hiring people - inducing them to leave their current job - and then often dismissing them quickly afterwards.

That would give many great prospective employees pause before applying to work there, because you are asking them to give up a good thing and take a chance on your company, without commitment.

Far better to screen early.


Although the model weights themselves are also outputs of the training, and interestingly the companies that train models tend to claim model weights are copyrighted.

If a set of OpenAI model weights ever leaks, it would be interesting to see whether OpenAI tries to claim they are subject to copyright. Surely it would be a double standard if distributing model weights were a copyright violation while the outputs of model inference were not subject to copyright. If they can only have one of the two, the latter point might be more important to OpenAI than protecting leaked model weights.


And you could even use SSS (Shamir's Secret Sharing - https://en.wikipedia.org/wiki/Shamir%27s_secret_sharing) to split the key to decrypt your confidential information across n people, such that some k (where k < n) of those people need to provide their share to get the key.

Then, for example, consider n = 5, k = 3 - if any 3 of the 5 selected friends decide the trigger has been met, they can work together to decrypt the information, but a group of only 2 of the 5 could not - reducing the chance of it leaking early if a key share is stolen, someone betrays you, and so on. It also reduces the chance of it not being released when it should be, due to someone refusing or being unable to act (in that case, up to 2 friends could be incapacitated, unwilling to follow the instructions, or whatever, and it could still be released).
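
A minimal sketch of the split/combine mechanics (illustrative only - the prime and share indices are arbitrary choices here, and a real setup should use a vetted implementation rather than hand-rolled code):

    import secrets

    PRIME = 2**127 - 1  # field large enough for a 16-byte key

    def split(secret, n, k):
        """Split secret into n shares; any k of them reconstruct it."""
        assert 0 <= secret < PRIME and 1 <= k <= n
        # Random polynomial of degree k-1 whose constant term is the secret.
        coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
        def f(x):
            acc = 0
            for c in reversed(coeffs):
                acc = (acc * x + c) % PRIME
            return acc
        return [(x, f(x)) for x in range(1, n + 1)]

    def combine(shares):
        """Lagrange interpolation at x=0 recovers the constant term."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % PRIME
                    den = (den * (xi - xj)) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    key = secrets.randbelow(PRIME)   # e.g. the key that encrypts the archive
    shares = split(key, n=5, k=3)
    assert combine(shares[:3]) == key                        # any 3 of 5 suffice
    assert combine([shares[0], shares[2], shares[4]]) == key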


Then you just make those friends a target. They only need to buy off or kill 3. It is unlikely the general public would know of them, so it likely wouldn't be reported on.


Turn it around: require a 3/5 quorum to disarm the public-release deadman switch. Buying off 3 people whose friend you have just murdered isn't going to be trivial.


You think that people will be less motivated to do what they’re told after someone has proven a willingness to kill?


I wonder if some sort of public/semi-public organization for trading parts of SSS keys could be set up.

Right now, as an individual, you'd have a pretty small number of trusted N's (from the parent's definition). With some organization, maybe you could get that number way up, so that destroying the entire scheme would require rounding up a large fraction of the population, which is close to impossible.


This reminds me of an idea to create a "global programmer's union"


It's likely they had no contractual agreement with the current owners of the inverters, and yet they have elected to wilfully damage the property of the current owners because they can.

Wilfully damaging someone else's property without permission of the current owner seems pretty malicious, regardless of whether the importers (or maybe someone who supplied to the importer) were in breach of a contract.

