StackHawk

Customized and Configurable Scan Discovery


Sam Volin | March 4, 2024

Learn the different ways you can help StackHawk navigate and test the paths of your web application, and decide which method, or combination of methods, works best for you.

HawkScan provides multiple mechanisms for discovering the paths of a running web application: spidering, HAR files, seed paths, and Custom Scan Discovery. By combining them, security and software development teams can accomplish more together in their software development pipeline.

Discovering an application by spidering pages

During the Active Scan portion of its run, HawkScan actively attacks the paths your application exposes, attempting to reproduce known software vulnerabilities against each one. Knowing which endpoints your web application exposes is therefore fundamental to how HawkScan operates.

After HawkScan starts up and applies its configuration, but before the Active Scan begins, it runs the Scan Discovery phase, finding the paths of your web application by "spidering" them. The spider follows the URLs and relative paths found on each HTML page in a breadth-first search (BFS) pattern, starting from the scanned `app.host` and keeping only URLs with the same origin as your host (read: the application you're scanning). For each response, HawkScan runs a Passive Scan on a separate thread, checking for direct evidence of known vulnerabilities, and then adds the path to the site tree for reuse during the Active Scan.
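To make the breadth-first pattern concrete, here is a minimal Python sketch. It is not HawkScan's implementation: instead of fetching and parsing HTML it crawls a hypothetical in-memory `PAGES` map (URL to the links found on that page), it skips the Passive Scan step entirely, and the `spider` and `same_origin` helpers are illustrative names. It does show the two rules described above: a FIFO queue gives breadth-first order, and off-origin links are dropped.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

# Hypothetical application: each URL maps to the hrefs found on that page.
PAGES = {
    "http://localhost:9000/": ["/about", "/products", "https://example.com/"],
    "http://localhost:9000/about": ["/", "team"],
    "http://localhost:9000/products": ["/products/1", "/products/2"],
    "http://localhost:9000/products/1": ["/products"],
}


def same_origin(url: str, host: str) -> bool:
    """True when url has the same scheme, hostname, and port as host."""
    a, b = urlsplit(url), urlsplit(host)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


def spider(host: str) -> list[str]:
    """Breadth-first crawl from host; returns the site tree in discovery order."""
    start = urljoin(host, "/")
    site_tree = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()  # FIFO queue gives breadth-first order
        for href in PAGES.get(page, []):
            url = urljoin(page, href)  # resolve relative paths against the page
            if url not in seen and same_origin(url, host):
                seen.add(url)
                site_tree.append(url)
                queue.append(url)
    return site_tree


if __name__ == "__main__":
    for url in spider("http://localhost:9000"):
        print(url)
```

Note that the external `https://example.com/` link is never queued, and the relative link `team` on `/about` resolves to `/team`.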

This is HawkScan's default behavior. By default, HawkScan caps spidering at 2 minutes, so this part of the scan won't take forever the way it can with some other AppSec tools. To find attackable paths into your API, HawkScan also supports scanning from an OpenAPI specification file, a GraphQL introspection endpoint, or a SOAP WSDL file.

You can configure all of these aspects of the scan in your stackhawk.yml file:

hawk:
  spider:
    maxDurationMinutes: 2 # maximum allowed time in minutes for spiders to crawl your application.
    base: true # the basic spider utility that looks at html source files and follows urls it finds. Enabled by default.
    ajax: false # a more complex spider operation that follows dynamic links and buttons in an application.
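One way to picture the `maxDurationMinutes` cap is as a deadline checked before each page is visited. The Python sketch below is illustrative only: `crawl_with_budget`, its `fetch_links` callback, and the injectable `clock` are hypothetical names, not HawkScan internals, and the example uses a fake clock so it runs instantly.

```python
import time
from collections import deque
from typing import Callable, Iterable


def crawl_with_budget(
    start: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_duration_minutes: float = 2,
    clock: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Breadth-first crawl that stops visiting pages once the time budget is spent."""
    deadline = clock() + max_duration_minutes * 60
    visited: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue and clock() < deadline:  # budget is checked before every visit
        page = queue.popleft()
        visited.append(page)
        for link in fetch_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


class FakeClock:
    """Test clock that advances 30 seconds every time it is read."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        current = self.now
        self.now += 30
        return current


if __name__ == "__main__":
    links = {"/": ["/a", "/b", "/c", "/d"]}
    # With a 2-minute budget and 30 s per clock read, only three pages fit.
    print(crawl_with_budget("/", lambda p: links.get(p, []), 2, FakeClock()))
```

Whatever is still queued when the deadline passes is simply not visited, which is why a short cap trades completeness for a predictable scan time.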

Discovering an Application with HAR Files

A HAR (HTTP Archive) file is a JSON-formatted log of a web browser's interaction with a website. The format is designed to store and share collected data about network requests, responses, and other performance-related information. HAR files capture details such as the URL, headers, cookies, timings, and content of each HTTP request the browser makes, and HawkScan can use those details to discover your application.
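Because HAR files are plain JSON, it is easy to see what a scanner has to work with. This minimal Python sketch (not how HawkScan itself ingests HAR files) reads the `log.entries[].request` objects from the HAR format and lists the unique method and path pairs sent to one host. The sample archive and the `paths_from_har` helper are hypothetical, and the sample is trimmed: real entries also carry headers, cookies, timings, and response data.

```python
import json
from urllib.parse import urlsplit

# A trimmed, hypothetical HAR document.
SAMPLE_HAR = """
{"log": {"version": "1.2", "entries": [
  {"request": {"method": "GET",  "url": "http://localhost:9000/login"}},
  {"request": {"method": "POST", "url": "http://localhost:9000/login"}},
  {"request": {"method": "GET",  "url": "http://localhost:9000/account?id=7"}},
  {"request": {"method": "GET",  "url": "https://cdn.example.com/app.js"}},
  {"request": {"method": "GET",  "url": "http://localhost:9000/login"}}
]}}
"""


def paths_from_har(har_text: str, host: str) -> list[tuple[str, str]]:
    """Return unique (method, path) pairs for requests sent to host, in order."""
    wanted = urlsplit(host).netloc
    found: list[tuple[str, str]] = []
    for entry in json.loads(har_text)["log"]["entries"]:
        request = entry["request"]
        url = urlsplit(request["url"])
        pair = (request["method"], url.path)  # query string is dropped
        if url.netloc == wanted and pair not in found:  # same host, first sighting
            found.append(pair)
    return found


if __name__ == "__main__":
    for method, path in paths_from_har(SAMPLE_HAR, "http://localhost:9000"):
        print(method, path)
```

The CDN request is filtered out and the repeated `GET /login` is counted once, leaving three distinct operations to test.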

With HawkScan, you can identify and map the paths of your web application using HAR files. Although we prefer API specifications, HAR files give you a high level of control and precision over how HawkScan navigates and analyzes your web application, which makes them a better alternative to spidering for scan discovery.

We’ve made scan discovery for single-page apps even easier by letting you record HAR files directly on your local machine, for even greater accuracy. This is extremely helpful for recording authentication, so you can be sure you are testing password-protected routes.

To record a session, run `hawk perch start` with the `--with-chrome` and `--with-proxy-info` options to begin recording, then run `hawk perch stop --har-file=<file name>` to save the session. To learn more, run `hawk perch start --help` and `hawk perch stop --help`.

Discovering an application by telling HawkScan what's what

Web crawling is not a silver bullet for discovering a web application. It only finds pages that can be reached by following links, starting from your root `app.host`, so it won't find pages that are unlinked or hidden. If you know exactly which paths in your web application you want HawkScan to visit, tell it explicitly with `hawk.spider.seedPaths`. HawkScan adds these seed paths to its internal site tree so they are visited later during the Active Scan.

hawk:
  spider:
    seedPaths:
      - /hidden
      - /secret-path
      - /unlinked-endpoint-no-spider-will-ever-find
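Why seed paths matter follows from the crawl itself: a page nobody links to never enters the queue. This self-contained Python sketch is hypothetical (the `LINKS` map and `discover` function are illustrative, not HawkScan internals). It crawls by following links from `/`, then adds explicit seed paths, like the ones in the configuration above, to the resulting site tree.

```python
from collections import deque

# Hypothetical link graph: /hidden exists on the server, but nothing links to it.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": [],
    "/hidden": [],
}


def discover(seed_paths: list[str]) -> set[str]:
    """Crawl links breadth-first from "/", then add the explicit seed paths."""
    site_tree = {"/"}
    queue = deque(["/"])
    while queue:
        for link in LINKS.get(queue.popleft(), []):
            if link not in site_tree:
                site_tree.add(link)
                queue.append(link)
    # Seed paths join the site tree even when no page links to them.
    site_tree.update(seed_paths)
    return site_tree


if __name__ == "__main__":
    print(sorted(discover([])))  # link-following alone misses /hidden
    print(sorted(discover(["/hidden", "/secret-path"])))
```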

You can read more about HawkScan scan discovery and spidering mechanisms in our sweet documentation.

Customizing HawkScan with your favorite DevTools

This brings us to a cool new feature in HawkScan 2.8.0: Custom Scan Discovery. It lets you configure HawkScan with a command to run, and HawkScan runs that command in an environment set up so it can intercept the web traffic the command generates.
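Conceptually, this resembles launching a child process whose environment points its HTTP clients at an intercepting proxy. The Python sketch below is an assumption-laden illustration, not HawkScan's code: it assumes a hypothetical proxy at `http://127.0.0.1:8080` announced through the conventional `HTTP_PROXY` and `HTTPS_PROXY` variables, and `run_through_proxy` is a made-up helper. The probe command stands in for a real tool and only prints the proxy it was given.

```python
import os
import subprocess
import sys
from typing import Optional

# Hypothetical address of the intercepting proxy; HawkScan's real value may differ.
PROXY_URL = "http://127.0.0.1:8080"


def run_through_proxy(command: list[str], extra_env: Optional[dict[str, str]] = None) -> str:
    """Run command with proxy variables pointing at PROXY_URL; return its stdout."""
    env = dict(os.environ)
    env.update({"HTTP_PROXY": PROXY_URL, "HTTPS_PROXY": PROXY_URL})
    env.update(extra_env or {})  # per-command overrides, e.g. NO_PROXY
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real tool such as newman: it only reports the proxy it received.
    probe = [sys.executable, "-c", "import os; print(os.environ['HTTP_PROXY'])"]
    print(run_through_proxy(probe).strip())
```

The `extra_env` parameter mirrors the idea of per-command environment settings, such as the `NO_PROXY` value in the Cypress configuration.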

The feature adapts to different environments and build systems, so teams can reuse the advanced tooling their developers already maintain.

With this feature, security teams can leverage the Postman Collections that developers write to test their API endpoints:

hawk:
  spider:
    base: false
    custom:
      command: "newman run postman_collection.json"

Or they can run their Cypress test suites and feed HawkScan the requests those tests make to the web application:

hawk:
  spider:
    base: false
    custom:
      command: "./node_modules/.bin/cypress run -s path/to/cypress-specs"
      environment:
        NO_PROXY: "<-loopback>"

Custom Scan Discovery configuration also works with HawkScan's ability to interpolate values and safely handle secrets from configuration at runtime. It's flexible enough that you can even invoke a shell, giving a researcher access to any terminal commands they need, so security researchers can get into all kinds of shenanigans with HawkScan:

# security researchers can try this, but not recommended for the pipeline!
app:
  host: ${APP_HOST:http://localhost:9000}

hawk:
  spider:
    base: false
    custom:
      command: bash
      arguments:
        - -c
        - "echo KAAKAWW!! && curl -x $HTTP_PROXY -X DELETE ${APP_HOST:http://localhost:9000}/admin/records/indices"
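The `${APP_HOST:http://localhost:9000}` placeholder in this example reads an environment variable and falls back to the value after the first colon when the variable is unset. The Python sketch below mimics that behavior for illustration only; the `interpolate` helper is hypothetical, ignores HawkScan's secret handling, and raising `KeyError` for a missing variable with no default is a choice made for this sketch. Note that the bare `$HTTP_PROXY` (no braces) is left untouched, so the shell expands it when the command runs.

```python
import os
import re
from typing import Mapping, Optional

# ${NAME} or ${NAME:default}; the default is everything after the first colon.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def interpolate(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME:default} placeholders using env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"{name} is not set and has no default")

    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    host = "${APP_HOST:http://localhost:9000}"
    print(interpolate(host, {}))  # falls back to the default
    print(interpolate(host, {"APP_HOST": "https://staging.example.com"}))
    print(interpolate("curl -x $HTTP_PROXY ${APP_HOST:http://localhost:9000}/admin", {}))
```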

And more! These tools are just the tip of the iceberg. Any dev tool that can route its web traffic through a separate proxy host, whether through the `HTTP_PROXY` environment variable or its own configuration file, can be used for Custom Scan Discovery.
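Many HTTP clients already follow the `HTTP_PROXY` convention. Python's standard library is one example: `urllib.request.getproxies()` reads the `*_proxy` environment variables, and the default opener behind `urllib.request.urlopen` routes requests through the proxies it finds there. The sketch below only inspects what the client would use and sends nothing; the proxy address and the `proxies_seen_by_urllib` helper are hypothetical.

```python
import os
import urllib.request

# Hypothetical address of the intercepting proxy.
PROXY_URL = "http://127.0.0.1:8080"


def proxies_seen_by_urllib(proxy_url: str) -> dict[str, str]:
    """Return the proxy map urllib derives when only HTTP_PROXY is set."""
    saved = dict(os.environ)
    try:
        # Clear existing proxy settings (and the CGI marker that makes urllib
        # ignore HTTP_PROXY) so only our variable is in effect.
        for key in list(os.environ):
            if key.lower().endswith("_proxy") or key == "REQUEST_METHOD":
                del os.environ[key]
        os.environ["HTTP_PROXY"] = proxy_url
        return urllib.request.getproxies()
    finally:
        os.environ.clear()
        os.environ.update(saved)  # restore the caller's environment


if __name__ == "__main__":
    print(proxies_seen_by_urllib(PROXY_URL))
```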

You can learn more about how to succeed with Custom Scan Discovery in our documentation.

Combine them all and discover your whole application

Part of HawkScan's power is that it can use any combination of the spidering, seed path, and Custom Scan Discovery mechanisms, or none of them. HawkScan works even better when it is configured to scan specific API protocols, such as OpenAPI, GraphQL, or SOAP. It has the flexibility to adapt to any software environment and help engineering teams find vulnerabilities anywhere in a running web application. By giving developers and software teams smarter, more straightforward resources, we hope users can maintain a stronger application security posture as they develop and defend their awesome software.
