On the OpenSSF CVE Benchmark[1], Semgrep CE hits 56.97% accuracy vs. our 81.21%, and we get nearly 3x its recall (75.61% vs. 26.83%).
On when to run it: fair point. Autofix Bot is currently meant for local use (TUI, Claude Code plugin, MCP). We're integrating this pipeline into DeepSource[2], which will post inline comments on pull requests; that fits the QA/pre-merge flow you're describing.
That said, if you're using AI agents to write code, running it at checkpoints locally keeps feedback tight.
The structured vs open-ended distinction here applies to code review too. When you ask an LLM to "find issues in this code", it'll happily find something to say, even if the code is fine. And when there are actual security vulnerabilities, it often gets distracted by style nitpicks and misses the real issues.
Static analysis has the opposite problem: it's very structured and deterministic, but limited to predefined patterns, and it overwhelms you with false positives.
The sweet spot seems to be giving the LLM structure about what to look for, rather than letting it roam free on an open-ended "review this" prompt.
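A minimal sketch of what "giving structure" can look like in practice. `Check`, `SECURITY_CHECKS` and `build_structured_prompt` are hypothetical names for illustration, not Autofix Bot's API: the model gets a fixed list of narrow questions plus an explicit way to say nothing is wrong, instead of an open-ended "review this".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One narrowly scoped thing the reviewer model must look for."""
    category: str
    question: str


# Hypothetical checklist; a real one would come from your own rule set.
SECURITY_CHECKS = [
    Check("injection", "Is untrusted input concatenated into SQL, shell commands, or file paths?"),
    Check("secrets", "Are credentials or API keys hard-coded?"),
    Check("authz", "Is any endpoint reachable without an authorization check?"),
]


def build_structured_prompt(code: str, checks: list[Check]) -> str:
    """Build a prompt that asks only the listed questions and explicitly allows 'none found'."""
    lines = [
        "Review the code below. Answer ONLY the numbered questions.",
        "For each question, reply 'none found' if the code is fine; do not comment on style.",
        "",
    ]
    for i, check in enumerate(checks, start=1):
        lines.append(f"{i}. [{check.category}] {check.question}")
    # Delimit the code so the model can't confuse it with the instructions.
    lines += ["", "--- code ---", code, "--- end code ---"]
    return "\n".join(lines)
```

The explicit "none found" escape hatch is the point of the design: it removes the pressure to say *something* when the code is fine.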
Jai here, from the Autofix Bot team. About 3 weeks ago, we published results of the initial benchmark run[1] comparing Gitleaks, detect-secrets and trufflehog. Since then, we've put together a significantly improved dataset, and we're planning to rerun those benchmarks shortly; we'll add Kingfisher to the list and share the results here.
Btw, we use Kingfisher's validation system internally to generate request/expected_response pairs for a given secret, as the last step of the pipeline. We don't run the validation queries ourselves because of rate limits. Instead, we include them in a structured format in the response, so they can be executed on the client side or by the user integrating via the API. Thanks for building it :)
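A rough sketch of what executing such a pair on the client side could look like, using only the standard library. The pair layout below (`request`, `expected_response`, `status`, `body_contains`) and the `api.example.com` URL are assumptions for illustration; the actual response format may differ.

```python
import urllib.error
import urllib.request

# Hypothetical shape of one validation pair; the real format may differ.
EXAMPLE_PAIR = {
    "request": {
        "method": "GET",
        "url": "https://api.example.com/v1/whoami",
        "headers": {"Authorization": "Bearer <secret>"},
    },
    "expected_response": {"status": 200, "body_contains": '"id"'},
}


def to_request(pair: dict) -> urllib.request.Request:
    """Turn the 'request' half of a pair into a urllib Request (not sent yet)."""
    req = pair["request"]
    return urllib.request.Request(req["url"], headers=req.get("headers", {}), method=req["method"])


def matches_expected(status: int, body: str, expected: dict) -> bool:
    """Return True if an observed response satisfies the 'expected_response' half."""
    if status != expected["status"]:
        return False
    needle = expected.get("body_contains")
    return needle is None or needle in body


def validate(pair: dict, timeout: float = 10.0) -> bool:
    """Send the request from the client and compare the response with the expectation."""
    expected = pair["expected_response"]
    try:
        with urllib.request.urlopen(to_request(pair), timeout=timeout) as resp:
            return matches_expected(resp.status, resp.read().decode("utf-8", "replace"), expected)
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; still compare the status code and body.
        return matches_expected(err.code, err.read().decode("utf-8", "replace"), expected)
```

Keeping `matches_expected` separate from the network call means the comparison logic can be tested without hitting any real API, and the rate-limit cost stays with whoever chooses to call `validate`.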
DeepSource is a fast and reliable static analysis platform for developers and engineering teams. We have various roles open across Platform Engineering, Language Engineering and Marketing: https://careers.deepsource.io/
> There's a better solution: use open-source cli tools that do just that!
We don't deny that you can run the open-source tools locally, whether that's a one-line command or setting up pylint or flake8 with dedicated configurations. DeepSource is meant to eliminate the need to set up all those open-source tools locally or in your CI pipeline, so that you don't need to:
- Fish for issues amongst hundreds of lines of logs in the CI
- Figure out and update linter configs to remove duplicates and false positives (e.g. Bandit reports issues like `assert statement used` in a test file, which is a false positive, since by default Bandit doesn't know it's a test file)
- Some issues need a better description of why they're a problem, e.g. why should default file permissions be 0600, and why is that necessary?
- By default, linters run on all files on every commit or pull request.
- If an issue occurs in, say, 50 places, you have to fix each one manually.
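To make the false-positive point concrete, here is a sketch of the kind of glue code you end up writing around Bandit. It parses `bandit -f json` output and drops B101 (`assert_used`) findings in test files. The test-file patterns are an assumption about project layout; Bandit can also be configured to skip B101 or exclude directories, so this is just one way to do it.

```python
import fnmatch
import json
from pathlib import PurePath

# Common pytest-style test file patterns; adjust for your project (assumption).
TEST_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(path: str) -> bool:
    """Heuristic: the file lives under a 'tests' directory or matches a test name pattern."""
    p = PurePath(path)
    return "tests" in p.parts or any(fnmatch.fnmatch(p.name, pat) for pat in TEST_PATTERNS)


def drop_test_asserts(bandit_json: str) -> list[dict]:
    """Parse `bandit -f json` output and drop B101 (assert_used) findings in test files."""
    results = json.loads(bandit_json).get("results", [])
    return [
        r for r in results
        if not (r.get("test_id") == "B101" and is_test_file(r.get("filename", "")))
    ]
```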
Our focus at the moment is not on style issues. In fact, among the categories of issues we raise (anti-patterns, bug risks, performance, security, style, documentation), style issues are the most debated by our users, because they're really subjective. We're thinking of disabling style issues by default (making them opt-in), and we're working on support for running formatters like `black` and `yapf` with a single line of config in `.deepsource.toml`. Our analyzer team also actively adds custom rules that you don't get from the open-source tools, for example:
- Raising another exception when `assert` fails is ineffective, e.g. `assert isinstance(num_channels, int), ValueError('Number of image channels needs to be an integer')`.
- If the condition isn't satisfied, the user would expect a `ValueError`, but `AssertionError: Number of image channels needs to be an integer` is raised instead.
- `yield` used inside a comprehension (a `SyntaxError` as of Python 3.8)
- Write operation on file that is opened in read-only mode
- I/O detected on a closed file descriptor
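The issues in the list above can each be reproduced in a few lines of plain Python. This is an illustrative sketch written for this comment, not DeepSource's test suite; the function names are hypothetical.

```python
import io
import os
import tempfile


def check_channels_bad(num_channels):
    # Anti-pattern: the ValueError is only the assert *message*, so an AssertionError is raised.
    assert isinstance(num_channels, int), ValueError("Number of image channels needs to be an integer")
    return num_channels


def check_channels_good(num_channels):
    # Fix: raise the intended exception explicitly (this also survives `python -O`).
    if not isinstance(num_channels, int):
        raise ValueError("Number of image channels needs to be an integer")
    return num_channels


# `yield` inside a comprehension is rejected at compile time on Python 3.8+.
YIELD_IN_COMPREHENSION = "def gen(xs):\n    return [(yield x) for x in xs]\n"


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


def file_io_errors() -> list[str]:
    """Trigger the two file-handling bugs and return the exception type names."""
    errors = []
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r") as f:
            try:
                f.write("data")  # write on a file opened read-only
            except io.UnsupportedOperation as exc:
                errors.append(type(exc).__name__)
        try:
            f.read()  # I/O after the `with` block has closed the file
        except ValueError as exc:
            errors.append(type(exc).__name__)
    finally:
        os.remove(path)
    return errors
```

Calling `check_channels_bad("3")` raises `AssertionError` (carrying the `ValueError` instance as its argument), while `check_channels_good("3")` raises the `ValueError` the author intended.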
> 2. Type checking? Use `mypy`: it just a single command!
Sure, if you prefer running it locally or as part of your CI. But if you already use DeepSource to flag issues, it can be enabled with a single line in the `.deepsource.toml` file.
> 3. Autofixing? Use `black` / `autopep8` / `autoflake` and you can use `pybetter` to have the same ~15 auto-fix rules. But, it is completely free and open-source
We're working on adding support for autopep8, black and autoflake in the coming weeks. They mostly auto-patch stylistic issues [1]. Thanks for letting us know about pybetter; it looks like a great tool and fixes ~9 issues [2]. DeepSource's autofix goal is to fix more than 3/4 of the issues we detect, and our Python analyzer detects 522 issue types. We have a dedicated engineering team actively working on the analyzers. As of today, these are some of the issues our Python analyzer can autofix (which I couldn't find among the open-source tools):
- No use of `self`
- Usage of dangerous default argument
- Module imported but unused
- Function contains unused argument
- Debugger import detected
- Debugger activation detected
- Unnecessary comprehension
- Unnecessary literal
- Unnecessary call
- Unnecessary typecast
- Bad comparison test
- Empty module
- Built-in function `len` used as condition
- Unnecessary `fstring`
- `raise NotImplemented` should be `raise NotImplementedError`
- `assert` statement used outside of tests
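For readers unfamiliar with a few items in the list above, here are hand-written before/after illustrations of the dangerous default argument, `raise NotImplemented`, and `len` used as a condition. These show the usual idiomatic fixes; they are not DeepSource's actual autofix output.

```python
# Before: the default list is created once, at definition time, and shared across calls.
def append_item_bad(item, items=[]):
    items.append(item)
    return items


# After (the usual fix): use None as a sentinel and create a fresh list per call.
def append_item_fixed(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items


class Shape:
    def area(self):
        # `raise NotImplemented` would itself fail with TypeError, because NotImplemented
        # is a constant, not an exception class; NotImplementedError is the intended one.
        raise NotImplementedError


def describe(values):
    # Before: `if len(values) == 0:`; after: rely on the container's truthiness.
    return "non-empty" if values else "empty"
```

The first function shows why the default-argument check matters: calling `append_item_bad(1)` and then `append_item_bad(2)` returns `[1, 2]` on the second call, because both calls share the same list.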
The same goes for Go and the other analyzers we support.
> I don't like this whole idea of such tools (both technically and ethically):
> Why would anyone want to send all their codebase to 3rd party? We used to call it a security breach back in the days.
We follow strict security practices [3]. In short: 1) we do not store your code; 2) source code is pulled into an isolated environment that has no access to any of our internal systems or the external network; 3) as soon as the analysis is completed, the environment is destroyed and all logs are purged. Also, developers send source code to cloud services every day (Travis CI, CircleCI, GitHub), so I don't think it's accurate to call it a security breach. That said, an on-premise setup of DeepSource is on our roadmap, and we're working on SOC 2 Type 2 compliance as well [4].
> On moral side, this (and similar) projects look like thin wrappers around open-source tools but with a monetisation model. How much do these companies contribute back to the original authors of pylint, mypy, flake8? Ones who created and maintained them for years. I will be happy to be wrong here
We have kept the tool completely free for open-source projects. We've also partnered with GitHub Education and made it free for students. We're an early-stage company trying to build a business around automating the objective parts of code review and making it easier for every developer to adopt and use static analysis. In all transparency, we had plans to sponsor open-source projects but got sidetracked for various reasons. We will be backing some open-source projects in the next couple of weeks.
DeepSource integrates with GitHub Checks [1]. From the dashboard, you can select the issue types (anti-patterns, bug risks, performance and security issues, style, type checks and documentation) that, when detected, cause analysis runs to fail and pull requests to be blocked.
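The gating rule described here boils down to a small check: a run fails if any detected issue belongs to a category selected as blocking. A minimal sketch under that assumption; the category identifiers and `failing_categories` are hypothetical and not DeepSource's API.

```python
from collections import Counter

# Hypothetical identifiers for the issue categories listed above.
CATEGORIES = {"anti-pattern", "bug-risk", "performance", "security", "style", "type-check", "documentation"}


def failing_categories(issue_categories: list[str], blocking: set[str]) -> dict[str, int]:
    """Count detected issues per category, keeping only the categories selected as blocking.

    An empty result means the run passes; anything else fails the check and blocks the PR.
    """
    unknown = blocking - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return dict(Counter(c for c in issue_categories if c in blocking))
```

For example, with only `security` selected as blocking, a run that finds two security issues and one style issue fails with `{"security": 2}`, while a run with only style issues passes.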
[1] https://deepsource.com/directory