N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?

N-Day-Bench tests whether LLMs can find known security vulnerabilities in real codebases. The test set is refreshed monthly to keep it ahead of contamination. Five LLMs are currently being evaluated, and the results are public: engineers can view the methodology, leaderboard, and traces on the N-Day-Bench website.
