StackHawk
๏ƒ‰
M

Get A Demo

Name(Required)
Email(Required)

Introducing StackHawkโ€™s LLM Security Testing: Find LLM Risks Pre-Production

Scott Gerlach   |   Nov 13, 2025

Share on LinkedIn
Share on X
Share on Facebook
Share on Reddit
Send us an email
TL;DR: StackHawk now detects five critical LLM security risks from the OWASP LLM Top 10โ€”Prompt Injection, Sensitive Data Disclosure, Improper Output Handling, System Prompt Leakage, and Unbound Consumptionโ€”natively as part of our shift-left runtime testing.

AI isn’t just changing the pace and volume at which applications are builtโ€”it’s changing the very nature of them. AI coding assistants generate complete APIs in minutes. Developers embed LLM capabilitiesโ€”RAG systems, prompt chains, vector databasesโ€”directly into applications without traditional security review. Each integration creates attack vectors that didn’t exist two years ago: prompt injection, data leakage through context windows, and unvalidated outputs used in business logic.

Applications Are Becoming LLM-Native; AppSec Should Too

A slew of tools have cropped up to find LLM risks: AI firewalls, AI red teaming, and cross-function AI security platforms. These purpose-built solutions can help, but they risk repeating the same mistakes that plagued traditional AppSecโ€”tool sprawl, disjointed findings, and security testing that happens too late in separate environments developers never see.

The challenge? LLMs aren’t bolt-on featuresโ€”they’re deeply embedded into applications where logic, APIs, data stores, and LLM components operate as a single attack surface. Interfaces connect to vector databases, pull customer records, construct prompts with sensitive data, and return LLM-generated responses that flow directly into application logic.

The security implications are clear (and imminent):

  • LLM components are application components now. If your app has an LLM integration, attackers will target it like any other endpoint
  • Traditional AppSec testing wasn’t built for LLM risks. SAST won’t catch prompt injection. Legacy DAST doesn’t understand LLM behavior patterns
  • The attack surface expanded while you were reading this. Developers are shipping LLM features faster than security teams can inventory them

At StackHawk, we believe finding LLM risks should be native to AppSec toolsโ€”integrated into developer workflows from the start, not bolted on as another platform to manage after code ships.

The LLM Security Risks StackHawk Now Detects

We’re excited to introduce five new plugins as part of StackHawk’s core runtime testing engine that detect critical LLM security risks from the OWASP LLM Top 10:

  • LLM01: Prompt Injection – Detects when attackers can manipulate prompts to override system instructions, bypass safety controls, or trick your LLM into revealing other customers’ data or executing unauthorized actions.
  • LLM06: Sensitive Data Disclosure – Identifies when LLMs leak customer PII, API keys, internal system details, or proprietary business logic through responses to carefully crafted prompts.
  • LLM02: Improper Output Handling – Catches vulnerabilities where unvalidated LLM outputs get used in SQL queries, system commands, or API callsโ€”turning the LLM into an injection attack vector.
  • LLM09: System Prompt Leakage – Finds when attackers can extract your system instructions, hidden prompts, or internal configuration, giving them a roadmap for more sophisticated attacks.
  • LLM04: Unbound Consumption – Detects missing rate limits or resource controls that let attackers rack up thousands in API costs or create denial-of-service conditions.

The best part? These tests run in the same place as your existing StackHawk scansโ€”directly in developer workflows and environments, before production. Results show up wherever developers work with reproduction steps and remediation guidance delivered directly to them. By flagging LLM risks earlier, you’re not just reducing risk earlier; you’re helping teach developers best practices for building secure LLM-integrated applications.

Why Runtime Testing Is Key for LLM Security

LLM security risks only really exist at runtime. You can’t find prompt injection by reading source codeโ€”you have to test how the application behaves when an attacker manipulates prompts, how the LLM responds, and whether the application properly handles that response.

StackHawk was built for this. We’ve been doing runtime application security testing in CI/CD for years, using configuration-as-code to test actual application behaviorโ€”real requests, real responses, real security controls. This isn’t a bolt-on. It’s what our testing engine was designed to do: find runtime risks in modern application architectures before they reach production, and give developers what they need to actually fix them.

Reimagining AppSec for the LLM Era

This addition is part of our broader StackHawk vision: reimagining application security for AI-driven development.

Applications are built faster than traditional AppSec tools can keep up. Attack surfaces expand in real-time as developers integrate LLM capabilities. Security teams need visibility into what’s being built (including LLM/AI components), runtime testing that matches development velocity (including coverage for LLM risks), and intelligence to prove their programs are actually reducing risk.

Ready to test your apps for LLM security risks? Talk to our team about testing your applications against LLM security risksโ€”before they’re deployed.

More Hawksome Posts