
Claude Code Security Evolution: Where It Came From, Where It’s Going

Matt Tanner   |   Apr 14, 2026


When Anthropic rolled out Claude Code Security on February 20, 2026, cybersecurity stocks dropped, and the LinkedIn and X echo chambers swung into full gear: AppSec is dead. No, it isn’t. Yes, it is. 

Since then, the debate has moved IRL at RSAC, some interesting benchmarks have been published, and we’ve all taken a step back to ask: how did we get here, what is the immediate impact on AppSec teams, and how will vendors respond in the long run?

In this post, we’re reviewing the (albeit short) history of Claude Code Security, starting with the /security-review command that’s been available since August 6, 2025, rounding up some of the community’s reactions, and sharing our own perspective on what it means for AppSec at large.

Claude Code Security vs. /security-review

What’s the difference?

These are two different tools with different scopes, and neither one covers what runtime security testing does. We’ve been tracking both closely at StackHawk. Our co-founder, Scott Gerlach, wrote about what Claude Code Security means for AppSec on the day it launched. Since then, we’ve dug through the GitHub Actions source code for the original /security-review command, read the full announcement on the Claude Code Security release, and followed what security practitioners have reported across various forums and publications. Here’s what we’ve found.

The /security-review command was introduced in August 2025 as a built-in feature for Claude Code users on paid plans. You open Claude Code in your project directory, type /security-review like any other slash command in the platform, and Claude scans your codebase for common vulnerability patterns.

It checks for the basics: SQL injection, cross-site scripting, authentication and authorization flaws, insecure data handling, and dependency vulnerabilities. Think of it as a first-pass static scan built into your AI coding assistant.

The command also has a companion GitHub Action that triggers automatically on pull requests, posting inline comments with findings and recommendations. You can customize both by copying the security-review.md file to your project’s .claude/commands/ folder and editing it with your team’s specific scanning requirements.

Most industry experts settled on the view that it wasn’t equipped to catch more complex issues and should be treated as a companion to human reviews and security testing rather than a replacement.

We tend to agree.

The /security-review command is useful, but it’s explicitly positioned as a complement to existing security practices, not a replacement for manual review or broader AppSec controls.

Enter Claude Code Security

The next iteration, Claude Code Security, takes a much deeper approach to security scans within the Claude platform. Launched six months after the /security-review command, it’s available as of writing only as a limited research preview for Enterprise and Team customers (with expedited access for open-source maintainers).

Where /security-review matches patterns, Claude Code Security “reasons” about code. Anthropic describes it as reading code “the way a human security researcher would: understanding how components interact, tracing how data moves through your application, and catching complex vulnerabilities that rule-based tools miss.”

The product includes a dedicated dashboard where security teams can review findings, examine suggested patches, and approve or reject fixes. Each vulnerability goes through what Anthropic calls a “multi-stage verification process.” Claude identifies an issue, then re-examines its own finding adversarially, attempting to prove or disprove it before surfacing it to analysts. Each finding then carries both a confidence rating and a severity rating.

The headline number that drove the stock panic: using Claude Opus 4.6, Anthropic’s team found and validated more than 500 high-severity vulnerabilities in production open-source codebases. Bugs that had survived decades of expert review and extensive fuzzing. In one case, Claude discovered a buffer overflow in the CGIF library by reasoning about the LZW compression algorithm, something coverage-guided fuzzing couldn’t catch even at 100% line and branch coverage.

That’s a fundamentally different capability than what /security-review offers. The /security-review workflow is primarily positioned as a baseline review for common vulnerability patterns. Claude Code Security attempts to find unknown vulnerabilities through AI reasoning.

The Community Response From Across the Web

The initial Wall Street panic has faded, and the community’s response has been more nuanced than the stock market’s.

On the upside, Isaac Evans, CEO of Semgrep, told The Register he’s “very excited for Claude Code Security, even though we haven’t tried it yet.” Randall Degges, writing on the Snyk blog, noted that Anthropic’s announcement “gets right” that “AI belongs in the remediation loop, not just the detection loop.” The ability to not just detect but suggest context-aware fixes is a real step forward.

On the false positive question, Evans raised a practical point that no one from Anthropic has addressed publicly: “So far none of the foundation model companies… have published detailed statistics on how many false positives they experienced to get the results they had, or the cost to do so.” He also noted that he and his colleagues are “hearing reports from security researcher friends that of the 500 vulnerabilities, not all of them are truly ‘high-severity’ as described.”

On non-determinism, Danny Allan, CTO at Snyk, raised what might be the most fundamental concern for security teams. Writing in Dark Reading’s August 2025 coverage of the /security-review command, Allan warned: “I also worry about nondeterministic mechanisms being your deterministic guardrails. And by definition, these are nondeterministic.” That concern applies even more to the full Claude Code Security product. For compliance-driven security programs that need repeatable, auditable results, this is a real limitation. You can run the same scan twice and get different findings.

On trusting AI to secure AI-generated code, Allan also noted a deeper tension in the same piece: “You should not trust what is generating the code to secure the code. Because if you hallucinated one time, you’re likely to hallucinate again.” The multi-stage verification process, where Claude validates its own findings, is creative, but you’re relying on the same model to catch its own blind spots.

On the prompt injection risk, the GitHub Action’s README includes a warning that the tool is “not hardened against prompt injection attacks” and should only review trusted PRs. And in a bit of irony, Check Point Research disclosed two vulnerabilities in Claude Code itself in early 2026: CVE-2025-59536 (CVSS 8.7), a code injection flaw, and CVE-2026-21852 (CVSS 5.3), an information disclosure vulnerability that could exfiltrate API keys. Anthropic patched both.

The Register summed up the broader community sentiment well: the reality “isn’t nearly as gloomy for the security industry, nor as exciting and sexy as AI evangelists make it out to be.”

The Glaring Gap: Neither Tests in Runtime

Here’s the critical point that gets buried in the Claude Code Security hype. Both the /security-review command and the full Claude Code Security product share a fundamental limitation: neither one runs your application.

Both tools analyze source code. One matches patterns. The other reasons about logic. But entire categories of vulnerabilities only surface when an application is actually running, processing real requests, and interacting with real infrastructure.

Authorization flaws like BOLA (Broken Object-Level Authorization) and BFLA (Broken Function-Level Authorization) require sending real requests through your API stack to detect. An AI model can read your authorization code and theorize about where it might fail. Runtime application security testing proves whether it actually does.

Consider a concrete example. Your API has an endpoint that returns user data. The code looks like it checks authorization properly. But in your running environment, with your specific database configuration and middleware stack, a request with a modified user ID bypasses the check and returns another user’s data. No source code analysis, whether pattern-based or AI-powered, catches that. You only find it by sending the request.
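To make that concrete, here’s a minimal, hypothetical sketch (all names and logic invented for illustration, not taken from any real codebase): the handler below appears to check authorization, but it only verifies that the token belongs to *some* user, never that it belongs to the user being requested. A reviewer reading the source could plausibly miss it; a runtime probe that swaps IDs catches it immediately.

```python
# Hypothetical in-memory "API" with a subtle BOLA bug: the handler validates
# the token's existence but never ties it to the requested user_id.
USERS = {
    "1": {"owner_token": "alice-token", "email": "alice@example.com"},
    "2": {"owner_token": "bob-token", "email": "bob@example.com"},
}

def get_user(token: str, user_id: str):
    """Looks safe in review: it 'checks' the token — but only that the
    token belongs to some user, not to the user being requested."""
    if not any(u["owner_token"] == token for u in USERS.values()):
        return None  # unauthenticated
    return USERS.get(user_id)  # BOLA: no ownership check on user_id

def runtime_bola_probe() -> bool:
    """DAST-style check: authenticate as Alice, then request Bob's record
    by simply changing the ID in the request."""
    leaked = get_user("alice-token", "2")
    return leaked is not None and leaked["email"] == "bob@example.com"
```

In a real scan, that probe is an HTTP request with a modified path parameter. The point stands either way: the flaw only manifests when the request is actually executed.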

The same applies to business logic flaws. Can a coupon code be applied twice? Does the checkout flow enforce the correct sequence of steps? Does your rate limiting actually work under concurrent requests? These are runtime behaviors that exist in the interaction between your code, your infrastructure, and real HTTP traffic.
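As a hedged illustration of the coupon case (class and field names are invented for this sketch): the checkout code below validates the coupon on every call but never records that it was already consumed, so applying it twice stacks the discount. There’s no dangerous pattern for a scanner to match; only driving the flow end to end reveals the bug.

```python
# Hypothetical checkout with a business-logic flaw: the coupon is validated
# each time but never marked as consumed, so it can be applied repeatedly.
VALID_COUPONS = {"SAVE10": 0.10}

class Cart:
    def __init__(self, subtotal: float):
        self.subtotal = subtotal
        self.discount = 0.0

    def apply_coupon(self, code: str) -> None:
        rate = VALID_COUPONS.get(code)
        if rate is None:
            raise ValueError("invalid coupon")
        # Bug: no check that this coupon was already applied to this cart.
        self.discount += self.subtotal * rate

    def total(self) -> float:
        return round(self.subtotal - self.discount, 2)

# A runtime test drives the real flow and observes the stacked discount.
cart = Cart(100.0)
cart.apply_coupon("SAVE10")
cart.apply_coupon("SAVE10")  # nothing here looks suspicious in isolation
```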

Anthropic’s own announcement positions Claude Code Security as a code analysis tool, not a replacement for dynamic testing. It scans source code. It doesn’t test running applications.

Runtime testing catches a different class of vulnerabilities by design. DAST tools like StackHawk test for the OWASP Top 10 and OWASP API Top 10 by executing real attack scenarios against your running application. SQL injection isn’t flagged as a code pattern. It’s confirmed by sending a crafted payload and observing whether the database responds in a way it shouldn’t.
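Here’s a self-contained sketch of that distinction, using sqlite3 in place of a real backend (the function and payload are illustrative): the lookup builds SQL by string interpolation, and the probe confirms the injection the DAST way, by sending a crafted payload and observing rows it should never see.

```python
import sqlite3

# Hypothetical vulnerable lookup: user input is interpolated into the SQL.
def find_user(conn: sqlite3.Connection, username: str):
    query = f"SELECT name, secret FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()  # injectable

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "s3cret"), ("bob", "hunter2")])

# DAST-style confirmation: a benign request returns one row, while the
# crafted payload returns every row — observed proof of injection, not
# just a pattern that looks suspicious in source.
benign = find_user(conn, "alice")
attack = find_user(conn, "alice' OR '1'='1")
```

The fix, of course, is a parameterized query; the sketch is only meant to show the difference between flagging a pattern and confirming a behavior.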

Authorization testing makes the difference clearest. Claude Code Security can read your authorization logic and flag patterns that look suspicious. Runtime testing sends authenticated requests, manipulates tokens and IDs, and confirms whether your authorization holds up under real conditions. The BOLA and BFLA vulnerabilities that dominate API breach reports are runtime problems, and they require runtime testing to validate.

And unlike AI-based scanning, DAST results are deterministic. The same test against the same running application produces the same result. For compliance and audit purposes, that predictability is a requirement, not a nice-to-have.

So, Where Does That Leave Us?

The launch of Claude Code Security is a genuine milestone for AppSec. AI reasoning about code at the level Anthropic is demonstrating does catch things that traditional tools miss. Pieter Danhieux, CEO of Secure Code Warrior, saw this coming. Back when the /security-review command launched, he told Dark Reading: “I think a year ago everyone knew that static test tools would be dead eventually… It happened faster than I thought it would.” Claude Code Security is the next step in that trajectory.

The OWASP Top 10 API vulnerabilities that drive real breaches, the authorization flaws and business logic bugs that attackers actually exploit, live in the runtime behavior of your application. You confirm them by testing the running application, not by reading the source.

The teams that will ship the most secure software are the ones that layer all three approaches. Use /security-review as your daily driver. Use Claude Code Security for deeper AI-powered analysis when you have access. And use runtime DAST to confirm your application actually behaves securely under real conditions.

If you want to see how runtime testing fits alongside AI code analysis in your pipeline, schedule a demo with StackHawk.

More Hawksome Posts

StackHawk vs. XBOW

AI pentesting and shift-left DAST both test running applications for vulnerabilities. Here’s what separates them.