Hacker News | sanketsaurav's comments

FWIW, Claude Code (Opus 4.5) scores ~71% accuracy on the OpenSSF CVE Benchmark that we ran against DeepSource (https://deepsource.com/benchmarks).

Our approach is different: we use SAST as a fast first pass on the code (this also helps ground the agent, and is more effective than just asking the model to "act like a security researcher"). Then we use pre-computed static analysis artifacts about the code (data flow graphs, control flow graphs, dependency graphs, taint sources/sinks) as "data sources" the agent can access when the LLM review kicks in. As a result, we're seeing higher accuracy than others.

Haven't gotten access to this new feature yet, but when we do, we'll update our benchmarks.


> $8/100k tokens strikes me as potentially a TON

It's $8/100K lines of code. Since we're using a mix of models across our main agent and sub-agents, this normalizes our cost.

> I could easily see hitting 10k+ LOC on routine tickets if this is being run on each checkpoint. I have some tickets that require moving some files around, am I being charged on LOC for those files? Deleted files? Newly created test files that have 1k+ lines?

We basically look at the files changed that need to be reviewed, plus the additional context required to make a review decision (which is cached internally, so you won't be double-charged).

That said, we're of course open to revising the pricing based on feedback. But if it's helpful, when we ran the benchmarks on 165 pull requests [1], the cost was as follows:

- Autofix Bot: $21.24

- Claude Code: $48.86

- Cursor Bugbot: $40/mo (with a limit of 200 PRs per month)

We have several optimization ideas in mind, and we expect pricing to become more affordable in the future.

[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark


Ah sorry, you were very clear on the pricing page and I meant 100k LoC, not tokens.

In your explanation here, you mention running it per PR - does this mean running it once? Several times?


We haven't included Gemini Code Assist or Gemini CLI's code review mode in our benchmarks [1] (we should do that), but functionally, it'll do the same thing as any other AI reviewer. Our differentiator is that since we're using static analysis for grounding, you'll see more true issues surfaced with fewer false positives.

We also do secrets detection out of the box, and OSS scanning is coming soon.

[1] https://autofix.bot/benchmarks/


> One of the main benefits of Semgrep is its unified DSL that works across all supported languages.

> People can disagree, but I'm not sure that tree-sitter S-expressions as an upgrade over a DSL.

100% agree: a DSL is a better user experience for sure. But not inventing a new DSL and using tree-sitter natively was a deliberate choice. We've directly addressed this and agree that the S-expressions are gnarly, but we're optimizing for a scenario where you won't need to write them by hand anyway.

It's a trade-off. We don't want to spend time inventing a DSL and porting every language's idiosyncrasies into it; we'd rather improve our runtime and add support for things that other tools don't support, or support only on a paid tier (like cross-file analysis, which you can do on Globstar today).
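For context, a tree-sitter query is just an S-expression pattern matched against the parse tree. A minimal sketch in standard tree-sitter query syntax (not a complete Globstar checker) that flags calls to Python's `eval`:

```scheme
; Match any call whose function is the bare identifier `eval`.
(call
  function: (identifier) @func
  (#eq? @func "eval"))
```

The same pattern shape works against any grammar tree-sitter supports, which is the appeal of skipping a bespoke DSL.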


That makes a lot of sense. I wish you the best of luck and will be happy to try it out as you continue to develop it!


Thanks!

> I'd love to hear how this project differs from Bearer, which is also written in Go and based on tree-sitter? https://github.com/Bearer/bearer

The primary difference is that we're optimizing for users to write their own custom rules easily. We do plan to ship built-in checkers [1] so we cover at least the OWASP Top 10 across all major programming languages. We're also truly open source, under the MIT license.

> Regardless, considering there is a large existing open-source collection of Semgrep rules, is there a way they can be adapted or transpiled to tree-sitter S-expressions so that they may be reused with Globstar?

I'm pretty sure there's a way to make that work. We believe writing checkers (and maintaining a long list of built-in checkers) will become a commodity in a world where AI can generate S-expressions (or tree-sitter node queries in Go) for any language with very high accuracy, which is where we have an advantage over tools that use a custom DSL. To that extent, we're focused on improving the runtime itself so we can support complex use cases from our YAML and Go interfaces. If the community can help us port rules from other sources into our built-in checkers, we'd love that!

[1] https://github.com/DeepSourceCorp/globstar/pulls


Great release! What is the delta to achieve that porting using a trained approach?


Thanks! We still have a long way to go and a pretty extensive roadmap.

> Is there a general way to apply/remove/act on taint in Go checkers? I may not be digging deeply enough but it seems like the example just uses some `unsafeVars` map that is made with a magic `isUserInputSource` method. It's hard for me to immediately tell what the capabilities there are, I bet I'm missing a bit.

Assuming you're looking at the guide [1], the `isUserInputSource` is just a partial example and not a magic method (we probably should have used a better example there).

The AST for each node, along with the context, is exposed in the `analysis.Pass` object [2]. We don't have an example for taint analysis, but here's an example [3] of state tracking that can be used to achieve it. This is a little tedious at the moment, and you'll have to do the heavy lifting in Go code, but improving this is on our roadmap: we want to expose many more helpers to make things like taint analysis easier.
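To illustrate the state-tracking idea, here's a self-contained toy in Go that propagates taint through a flat list of fake statements. This is only a sketch of the technique; the `stmt` model and `findTaintedSinks` are hypothetical and not part of the real `analysis.Pass` API:

```go
package main

import "fmt"

// stmt is a toy statement: either an assignment "dst = src"
// or a sink call consuming src (e.g. a SQL query).
type stmt struct {
	kind     string // "assign" or "sink"
	dst, src string
}

// findTaintedSinks walks the statements in order, tracking which
// variables are tainted, and reports variables that reach a sink.
func findTaintedSinks(stmts []stmt, sources map[string]bool) []string {
	tainted := map[string]bool{}
	for v := range sources {
		tainted[v] = true
	}
	var findings []string
	for _, s := range stmts {
		switch s.kind {
		case "assign":
			if tainted[s.src] {
				tainted[s.dst] = true // taint propagates through assignment
			} else {
				delete(tainted, s.dst) // re-assignment from a clean value sanitizes
			}
		case "sink":
			if tainted[s.src] {
				findings = append(findings, s.src)
			}
		}
	}
	return findings
}

func main() {
	prog := []stmt{
		{kind: "assign", dst: "q", src: "userInput"},
		{kind: "assign", dst: "safe", src: "constant"},
		{kind: "sink", src: "q"},    // flagged: derived from user input
		{kind: "sink", src: "safe"}, // not flagged
	}
	fmt.Println(findTaintedSinks(prog, map[string]bool{"userInput": true}))
}
```

In a real checker, the statement list would come from walking the AST exposed by the pass, but the bookkeeping is the same.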

Here's another idea [4] we're exploring to make the YAML interface more powerful: adding support for utilities (like entropy calculation) that you can call and compare against in a rule.

[1] https://globstar.dev/guides/writing-go-checker#_1-complex-pa...

[2] https://globstar.dev/reference/checker-go#analysis-function

[3] https://globstar.dev/reference/checker-go#state-tracking

[4] https://github.com/DeepSourceCorp/globstar/issues/27


Not at the moment, but we'll put something up soon.

We're focused on keeping Globstar lightweight, so a hosted runtime is not on the roadmap (although we'll add support for running Globstar checkers natively on our commercial product, DeepSource). You should be able to write any checker in Globstar that you can write in the other tools you've listed.

Our goal is to make it very easy to write these checkers, so we'll be optimizing the runtime and our Go API for that.


Supporting C/C++ is on our roadmap. It needs some additional work to handle preprocessor directives [1] [2], which is why we didn't focus on it for the initial release.

[1] https://github.com/tree-sitter/tree-sitter-c/issues/13

[2] https://github.com/tree-sitter/tree-sitter-c/issues/108


Nice, subscribed.

I wonder how far you could get without solving #13, which does seem to be genuinely hard.


Not yet, but this is on our roadmap: https://github.com/DeepSourceCorp/globstar/issues/135

We're planning to implement a `skipcq` mute word.


Congrats on the launch! I tried this on a migration project I'm working on (which involves a lot of rote refactoring) and it worked very well. I think you've nailed the ergonomics for terminal-based operations on the codebase.

I've been using Zed editor as my primary workhorse, and I can see codebuff as a helper CLI when I need to work. I'm not sure if a CLI-only interface outside my editor is the right UX for me to generate/edit code — but this is perfect for refactors.


Amazing, glad it worked well for you! I main VSCode but tried Zed in my demo video and loved the smoothness of it.

Totally understand where you're coming from, I personally use it in a terminal tab (amongst many) in any IDE I'm using. But I've been surprised to see how different many developers' workflows are from one another. Some people use it in a dedicated terminal window, others have a vim-based setup, etc.

