
GitHub PM here. Glad that was a good experience! We work with ~50 partners (details in the link below) to notify them when tokens for their service are exposed in public repos, so that they can notify you.

https://docs.github.com/en/code-security/secret-scanning/sec...



TIL: make private keys for your service easy to match with regexps


Reminds me of how Airbnb redacts Hawaiian street addresses because they look too much like phone numbers, literally replacing them with a "phone number hidden" string in the host/guest chat.

Moral of the story: make your keys regexable without likelihood of false positives!


I spend a lot of time working with physician data. In the USA, physicians have a registration system called NPI. Apparently, NPI numbers are in the same format as some passport numbers. I know this because I started getting angry warnings about PII sharing until I got our tech team to turn them off.


The whole industry should adopt a convention to prefix production keys with a well known prefix, such as "prod_secret_".

We should have our systems and precommit hooks then alert us when those enter places they shouldn't and help us automate rotation.
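
A minimal sketch of such a check, assuming the hypothetical `prod_secret_` convention from above (the pattern and minimum length are made up for illustration):

```python
import re

# Hypothetical convention: production secrets carry a well-known prefix,
# which makes them trivially greppable before they leave a developer's machine.
SECRET_PATTERN = re.compile(r"prod_secret_[A-Za-z0-9]{16,}")

def find_secrets(text: str) -> list[str]:
    """Return every substring that looks like a prefixed production secret."""
    return SECRET_PATTERN.findall(text)
```

A pre-commit hook would run find_secrets over each staged file and abort the commit when the result is non-empty.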


You're not the first with this idea! There exists a standard, see RFC 8959: https://www.rfc-editor.org/rfc/rfc8959.html

Previous HN discussion: https://news.ycombinator.com/item?id=25978185


Bad idea. Better to do things in DEV the same way you would in PROD, so you don't shoot yourself in the foot. If you do it right in DEV, there's no problem in PROD.

And what if your DEV is not actually well isolated from PROD/other infra? And what if some real data sneaked into DEV? Etc.


I think prod_ might not be the important part there, so something like __secret__ should be enough.


Yeah, prefixing your keys with your service name like SRVCE_{KEY} is the way to go.

Bonus: adding SRVCE_PRVT_{KEY} and SRVCE_PUB_{KEY}.


And while we're at it: saving two chars isn't going to do much to prevent global warming, so let's just use the more readable SERVICE_{KEY} and SERVICE_PUB_{KEY} (as opposed to having to scratch your head thinking "did I call it SRV, SVC, SRVC, SRVCE, ...?")


I think OP meant to use the actual name of the service. For example FOOBARINC_{KEY}

I also think that it should look just a bit cryptic to make a person unsure if they can meddle with the string.


I meant as an abbreviation, like GitHub becomes ghp_XXXXXXXXX. But yeah anything is better than just random characters.


There is already RFC 8959, the secret-token URI scheme, for this, so you do not need to invent your own prefix

https://datatracker.ietf.org/doc/html/rfc8959


I see this standard linked here a lot. Did anyone actually read it, though? It only helps with identifying that a string is a secret, not the service or environment where the secret applies.


If a value does not natively support the secret-token scheme, you can apply the secret-token: prefix yourself and then strip it at usage time.
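
A sketch of that wrap/strip step, using the scheme name from RFC 8959:

```python
# RFC 8959 defines the "secret-token:" URI scheme so scanners can recognize
# credentials by prefix. Helpers to add the prefix at rest and strip it at use.
SCHEME = "secret-token:"

def wrap(raw_secret: str) -> str:
    """Prefix a raw credential so scanners can identify it."""
    return raw_secret if raw_secret.startswith(SCHEME) else SCHEME + raw_secret

def unwrap(token: str) -> str:
    """Strip the scheme prefix before handing the value to the service."""
    return token[len(SCHEME):] if token.startswith(SCHEME) else token
```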


I don't think it's a regex pattern; most keys are random strings.


That’s the idea. Add a deterministic prefix to make it identifiable as associated with a specific service.


Awesome feature. Saved the day for us some months back when an AWS token was accidentally committed and pushed. (AWS itself also immediately notified us.)


Rant time: this isn’t directed at you. I am just replying to your comment because you said something that triggered me.

Also the “you” below is the generic you - not you personally.

Disclaimer: I work at AWS in Professional Services, all rants are my own.

Now with that out of the way, I hate the fact that there are way too many code samples floating around on the internet that have you explicitly put your access key and secret key in the initialization code for the AWS SDK.

    s3 = boto3.resource('s3', aws_access_key_id=ccxx, aws_secret_access_key=cccc)
Even if you put the access keys in a separate config file in your repo, this is wrong, unnecessary, and can easily lead to checking credentials in.

When all they have to do is

    s3 = boto3.resource('s3')

All of the SDKs will automatically find your credentials locally in the ~/.aws/credentials file in your home directory, which is written when you run "aws configure".
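
For reference, "aws configure" writes a credentials file shaped like this (placeholder values):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```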

But really, you shouldn’t do that, you should use temporary access keys.

When you do get ready to run on AWS, the SDK will automatically get the credentials from the attached role.

Even when I’m integrating AWS with Azure DevOps, Microsoft provides a separate secure store that you can attach to your pipeline for your AWS credentials.


Hindsight is 20/20, but this is definitely one of those places where flat-out passing the credentials should not even be an option (or it should be made artificially tedious and/or explicitly flagged as a bad idea, e.g. by naming the param _this_is_a_bad_idea_use_credentials_file_instead_secret_key). Of course there are always edge cases, in the vein of running notebooks in containers (probably not an optimal example, but something like that), where you might need the escape hatch of embedding the credentials straight into the code.

But yeah, if the wrong thing is easier or more straightforward than the right way, people tend to follow it when they have a deadline to meet. To end on a positive note, at least cli v2 makes bootstrapping the credentials to a workstation a tad easier!


I remember a Rust AWS library that worked like you describe (an old version of rusoto, I think; it's deprecated now).

I wasn't familiar with how AWS credentials are usually managed so I was very confused why I had to make my own struct and implement the `CredentialSource` trait on it. It felt like I was missing something... because I was. You're not supposed to enter the credentials directly, you're supposed to use the built-in EnvCredentialSource or whatever.


> at least cli v2 makes bootstrapping the credentials to a workstation a tad easier!

I know I should know this seeing that I work in ProServe at AWS, but what do you mean?

I’m going to say there is never a use case for embedding credentials just so I can invoke Cunningham’s Law on purpose.

But when I need to test something in Docker locally I do

    docker run -e AWS_ACCESS_KEY_ID=<your_access_key> -e AWS_SECRET_ACCESS_KEY=<your_secret_key> -e AWS_DEFAULT_REGION=<aws_region> <docker_image_name>
And since you should be using temporary access keys anyway that you can copy and paste from your standard Control Tower interface, it’s easy to pass those environment variables to your container.


I meant the aws configure import command they added: point it at the credentials csv and the CLI handles adding the entry to the credentials file.

Sometimes you might need to use stuff that for some reason fails to use the env vars; I think I've bumped into tools that read S3 via hand-rolled HTTP calls. Dunno if it was to save having boto as a dependency, but those things are usually straightforwardly engineered, with no logic for figuring out the other, smarter ways to handle the keys. Here are the parameter slots, enter keys to continue.


> I hate the fact that there are way too many code samples floating around on the internet that have you explicitly put your access key and secret key in the initialization code for the AWS SDK.

See, I thought that was a big strength of a lot of the AWS documentation over Google Cloud.

An AWS example for, say, S3 would show you where to insert the secrets, and it would work.

The Google Cloud Storage examples, though? It didn't seem to have occurred to them that someone reading "how to create bucket example" might not have their credentials set up.

And when the example didn't work - well, it was like the auth documentation was written by a completely different team, and they'd never considered a developer might simply want to access their own account. Instead the documentation was a mess of complicated-ass use cases like your users granting your application access to their google account; sign-in-with-google for your mobile app; and so on.

Google's documentation is better than it once was - but I've always wondered how much of the dominance of AWS arose from the fact their example code actually worked.


> See, I thought that was a big strength of a lot of the AWS documentation over Google Cloud.

Just to clarify, I’ve never seen a code sample published by AWS that has you explicitly specifying your credentials. (Now I await 15 replies showing me samples hosted on Amazon)


For Java they used to demonstrate putting a .properties file in among your source code [1], although admittedly not literally hardcoding a string. The PHP examples suggested putting your credentials into a config PHP include [2] (although they did also suggest putting them in your home directory).

But I can't overstate how important it was that the AWS getting started guides said "Go to this URL, copy these values into this file" while Google's examples and getting started guides... didn't.

[1] https://web.archive.org/web/20120521060506/http://aws.amazon... [2] https://github.com/amazonwebservices/aws-sdk-for-php/blob/ma...


Wow, that’s some old code :).

But here is the newest documentation for PHP

https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s...


I deal with this by having a directory in my development tree named "doNotCheckThisIntoSourceControl", and I add a wildcard for it to my global .gitignore.

I'll put things like server secrets and whatnot in there.

Of course, I need to make sure the local directory is backed up, on this end, since it is not stored in git.

Works a treat.


That’s really not a great idea…


...and why?

I am serious. If there is a better way, I'd use it.

Remember that I don't do online/server-based stuff. Most of my projects are for full compilation/linking, and rendering into host-executable, binary apps. There's a bunch of stuff in my development process that never needs to see a server.


A super simple way is to have a script in your home directory - far away from your repos - that set environment variables that you read in your configuration.
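
Application code can then read secrets from the environment rather than from any file in the repo; a small sketch (the error message and function name are illustrative):

```python
import os

def load_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it's missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; source your secrets script first")
    return value
```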


That makes sense. I could do something like that.

[UPDATE] I ended up doing something even simpler. I have issues with running scripts during the build process unless really necessary (I have done it, and will again).

Since this is Xcode, I simply needed to store the file in a directory (still with the globally ignored name) far outside my dev tree, and drag the file into the IDE.


That’s basically the idea - get your credentials out of your dev tree. I’m not dogmatic about how it’s done.


Try AWS Secrets


100% agree. We always keep all tokens (not just AWS secret keys) in a separate file that is never checked into the repo; the values are passed into the CloudFormation template at deployment. (The error in this case was a new repo hastily pushed, where .gitignore wasn't properly updated to exclude the file with the keys.) But we've since switched to using AWS Secrets Manager, which is a much better solution.


Yeah that’s not good either. Your keys never need to be in a local file. Just put them in Parameter Store/Secrets Manager and you can reference those values in CF.


Yeah, that's what we do now


Yeah I just learned the role-based access approach last year. No keys ever hit the box so there's nothing for attackers to exfiltrate.


I wish I could set this up to block pushes proactively instead of reacting to pushed secrets.



Yelp has a "detect-secrets" project that can detect potential secrets and can be used as a pre-commit hook: https://github.com/Yelp/detect-secrets
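
A typical way to wire it up via the pre-commit framework, per the project's README (the rev below is an example; pin whatever release is current):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ["--baseline", ".secrets.baseline"]
```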


You could set up something like https://github.com/godaddy/tartufo in a pre-commit hook. Not sure if github has a way to hook into the push hooks on server side, they might though.


Yeah, the issue with pre-commit hooks is you have to remember to set them up client-side. I tend to push to GitHub through a gitolite mirror, though, so I could probably put this in the hooks in my gitolite middlebox.


What do you have to set up client side? They can be committed with the project. Or do I misunderstand?


Pre-commit hooks can't be automatically set up on the client side. If they could, this would mean that any repo you clone could run arbitrary code on your machine.

It can be as simple as a script you have to run once, but it can't be automatic. Which also means you can't really trust contributors to do it, even if they're well-meaning some will forget.




Top proactive security feature of the year, for me. Nice stuff.


Is this really expensive? We're a small startup providing API keys to our customers.


It's totally free - there are details of how to join the program at https://docs.github.com/en/developers/overview/secret-scanni...


Hm - this would work better if keys were easy to scan with regular expressions.

Next time I implement api keys I wonder if it’s worth going out of my way to make them easy to identify. Eg, by prefixing every key with a few well known characters. Like FMLA_xxxxx for a fastmail app key.
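
Generating such keys is cheap; a sketch (the fmla_ prefix is just the example above, not Fastmail's real format):

```python
import secrets

def make_api_key(prefix: str = "fmla_") -> str:
    # A fixed, regex-friendly prefix plus ~192 bits of URL-safe randomness.
    # A scanner can then match keys with e.g. r"fmla_[A-Za-z0-9_-]{32}".
    return prefix + secrets.token_urlsafe(24)
```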


That's exactly what GitHub did with their own keys and their new keys fit this format.

https://github.blog/2021-04-05-behind-githubs-new-authentica...


I just implemented our API with a PREFIX_KEY so our self-hosted customers can change it if they want to.

We will be applying. Thanks for sharing, greystell!


If you go make an API key in Fastmail (Settings -> Password & Security -> API tokens), you'll see that it's prefixed very similarly to that (e.g. `fmo1-`) for this very reason! (There are some other neat things about our API key format I'd be happy to tell you about sometime if you're interested.)


Hah I just used that as the first example which came to mind.

Yeah absolutely - I'd love to hear about it!


Some services also use prefixes to provide additional context like account type and token validity length. I think Slack does this (service accounts have different prefixes than user accounts and I think temporary tokens have another prefix)


It's absolutely worth it, for everyone's sanity involved.


I would imagine the main expense for you will be your implementation cost:

https://docs.github.com/en/developers/overview/secret-scanni...

Your dev team would probably be able to give that a glance and estimate the work.


I had a couple questions, as this feature is awesome!

How long does it take to get the notification, vs. external bots pulling the data? What mechanisms does GitHub have in place to stop bots that monitor repo changes? I ask because I have been there, and it is super scary how fast someone (or some bot) pulls repo changes, as in minutes, and the repo we had back then was not popular.


As long as search results can be sorted by date, anyone who monitors them can see updates pretty much instantly. The repos don't have to be popular for that. Bots can just poll such a feed every few seconds, for example.

https://github.com/search?o=desc&q=secret&s=updated&type=Rep...


What are the thoughts around capabilities like this for private/enterprise customers? Is the code available in an action that could be connected to private runners perhaps?


Would blocking commits containing such tokens/keys be a better option?


You can definitely use pre-commit hooks for this, like the one from ggshield https://github.com/GitGuardian/ggshield - remediation is far quicker when the secret doesn't make it to the codebase!


You cannot block what someone commits (they can block it themselves with tools like gitleaks invoked on a pre-commit hook), so the only thing you can do as a 3rd party is scan and react when you notice a secret published.


GitHub certainly could block push requests; at least git itself can via hooks, as there are a number of hooks invoked by git-receive-pack that can influence what it does.
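
The content check itself is simple. For example, the shape of GitHub's own personal access tokens ("ghp_" plus 36 base62 characters, per the blog post linked elsewhere in the thread) can be matched like this; the git plumbing around it (reading old/new/ref lines from stdin, enumerating new blobs) is left out:

```python
import re

# "ghp_" followed by 36 base62 characters is the documented shape of a
# GitHub personal access token; used here as an example of what a
# server-side pre-receive hook could scan incoming content for.
TOKEN_RE = re.compile(rb"ghp_[A-Za-z0-9]{36}")

def blob_contains_token(blob: bytes) -> bool:
    """True if the blob appears to contain a personal access token."""
    return TOKEN_RE.search(blob) is not None
```

A pre-receive hook would run this over each blob reachable from the new ref but not the old one (e.g. via git rev-list --objects old..new) and exit non-zero to reject the push.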


> GitHub certainly could block push requests

But the commit still exists locally (since git is decentralized), so you now end up in a weird state where you have code you cannot push to origin. Definitely not a desirable feature.

> at least git itself can via hooks

I already said that:

> they can block it themselves with tools like gitleaks invoked on a pre-commit hook

The problem with git hooks is that they're not cloned with the repo. So you're reliant on the user installing those git hooks locally (sure, some repos will have helper scripts to install the hooks for you. But you're still reliant on the user running that script).


> code you cannot push to origin. Definitely not a desirable feature.

If there is data that should never be pushed to origin, then it is a highly desirable feature that the server block pushes that include that private data.

> The problem with git hooks

I was talking about GitHub's own git hooks that run on their servers, not about any local ones.

> is that they're not cloned with the repo.

It would be a terrible security issue if they were automatically enabled after cloning.


> If there is data that should never be pushed to origin, then it is a highly desirable feature that the server block pushes that include that private data.

It’s already too late by that point because your secrets have already left the building. You’re now relying on upstream being honourable.

> I was talking about GitHub's own git hooks that run on their servers, not about any local ones.

There’s no such thing. You can have CI tooling like GitHub Actions, but they’re a different beast to git hooks

> It would be a terrible security issue if they were automatically enabled after cloning.

It doesn’t have to be either/or. There are ways of having a sensible compromise. Like a git config that enables hooks from known safe origins. Or having the user prompted whether they want to install git hooks upon cloning.


> It’s already too late by that point

True, but it is better than the secrets becoming entirely public, where automated bots could be harvesting them and exploiting the resources they protect.

> There’s no such thing.

I would be surprised to hear that GitHub doesn't actually run git on their servers. If they receive git pushes using git, then its own git hooks are involved, ones that GitHub has written for their own purposes. They could simply add one to block bad pushes.

> Like a git config that enables hooks from known safe origins.

That sounds a bit terrifying to me, but I'm not of the GitHub generation.

> Or having the user prompted whether they want to install git hooks upon cloning.

That sounds like it would enable phishing-like attacks and people just clicking "yeah sure" without verifying the safety of the hook.


> True, but it is better than the secrets becoming entirely public, where automated bots could be harvesting them and exploiting the resources they protect.

True. And some popular repos do already run into this problem. So it’s not a theoretical problem.

> I would be surprised to hear that GitHub doesn't actually run git on their servers.

They’ve documented about how their backend works so there’s no need to speculate. They run an implementation of git but not the standard git CLI.

> If they receive git pushes using git, then its own git hooks are involved, ones that GitHub has written for their own purposes. They could simply add one to block bad pushes.

They have their automation, GitHub Actions.

Sure, they “could” also implement what you’ve described, but it’s not how it currently works. So it’s a pointless argument, since we could be here all year discussing the literal infinity of things GitHub “could” do in theory but that their infrastructure doesn’t currently support.

> That sounds a bit terrifying to me, but I'm not of the GitHub generation.

What I posted has literally nothing to do with GitHub. In fact, if your origin is a private git server (which is how I started out using git, since GitHub didn’t exist back then) then it’s even easier to designate a trusted origin. This approach makes total sense for businesses. It doesn’t work so well for open source, but it’s just one option of many.

> That sounds like it would enable phishing-like attacks and people just clicking "yeah sure" without verifying the safety of the hook.

Potentially yes. But if you’re cloning a git repo, making code changes and then committing it back, you’d hope that individual is competent enough to audit the git hook. At the very least, they’ll be running the build scripts locally to unit test their changes, so it’s not like that phishing attack isn’t already present. Feels very much like you’re looking for reasons to dismiss any suggestions here rather than have an intelligent discussion.


This is what I do[0]. Low-tech and un-sexy, but WFM. YMMV.

[0] https://news.ycombinator.com/item?id=33201467



That's attention to detail right there. Very nice.


This is awesome!



