I couldn't find anything about MCP servers or tools being offered to the agents. Wouldn't they be much more likely to succeed if there were, e.g., a gdb server or an sqli/http server running for debugging purposes? That way the thinking process could succeed more easily, no?
Edit: Ah, the URL was wrong. It's cve-bench! [1]
[1] https://github.com/uiuc-kang-lab/cve-bench