Autonomous Offensive Security

One Command. Full Pentest.

PentestPilot is an automated web application security testing platform built for professional penetration testers and bug bounty hunters. Point it at a target and it runs a full engagement autonomously — from authentication and reconnaissance through exploitation, AI-powered validation, and final reporting.

26 integrated security tools. 11-phase pipeline. Multi-agent AI validation that debates every finding before it reaches your report. No interaction required between start and finish.

Every finding ships with reproduction steps, evidence bundles, and OWASP WSTG coverage mapping. The system argues with itself so you don't have to argue with your client.

$ pentestpilot scan https://target.com --full
[00:00] Scan started — target: target.com
[00:14] AuthLadder: 3 sessions captured (admin, user, readonly)
[01:22] Recon: Katana + GAU + JS analysis → 247 endpoints
[02:15] Discovery: 89 injection vectors across 17 types
[03:41] SQLi confirmed: /api/users?id=1 (error-based, schema dumped)
[04:12] SSTI confirmed: /render?template= (Jinja2, RCE possible)
[06:30] Validation: 5-stage pipeline — 12/14 findings confirmed
[08:15] OWASP sweep: 109 WSTG test IDs evaluated, 3 gaps filled
[12:44] 29 specialists deployed, consensus reached on 8 criticals
[16:20] Expansion: admin session → +47 endpoints, 3 additional vulns
[18:42] Report complete — 15 findings, evidence bundles attached
[18:42] Scan finished in 18m 42s

  • 26 Security Tools
  • 11 Pipeline Phases
  • 109 WSTG Test IDs
  • 5 Validation Stages

Real Output From a Real Scan

These are actual screenshots from a completed scan. Not mockups.

[Screenshot: scan overview dashboard]

Scan Overview

31 validated findings, 84 AI validation runs, severity breakdown with priority findings. One-click access to findings, endpoints, AI traces, and WSTG coverage.

[Screenshot: vulnerability findings list]

Findings List

Every finding tagged with severity, AI VERIFIED status, and HTTP method. Filter by severity or validation state.

[Screenshot: AI validation trace]

AI Validation Trace

84 total validations. 44 confirmed, 40 rejected. Every decision logged with confidence score and reasoning chain.

[Screenshot: WSTG coverage matrix]

WSTG Coverage Matrix

Every WSTG test ID tracked across 12 categories. Green = tested. Coverage percentage per category. Gap analysis built in.

[Screenshot: attack surface graph]

Attack Surface Graph

Visual exploit chain mapping. Critical attack paths, potential exposure scoring, remediation priority. Interactive full-screen view.

[Screenshot: agent tool registry]

Agent Tool Registry

28 tools across 6 categories. Each specialist agent gets tools from its assigned categories. Toggle tools on/off per scan.

11-Phase Autonomous Pipeline

From credential exhaustion to recursive post-compromise expansion. No human interaction between start and report.

1. AuthLadder

8-step credential exhaustion, including seeded creds, credential store, LLM login agent (Playwright), OSINT harvest, default creds, self-registration, and full spray. Every attempt is followed by post-login verification.

2. Reconnaissance

Katana SPA crawl, GAU historical URLs, Gobuster brute-force, EyeWitness screenshots, tech fingerprinting, JS AST parsing with framework-specific route extraction.

3. URL Collection

Canonicalized URL mapping. Merges crawled, JS-extracted, SPA, and Wayback endpoints. Arjun hidden parameter discovery. Normalized deduplication.
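
As a rough illustration of canonicalization and normalized deduplication, here is a minimal sketch using only the Python standard library. The function names `canonicalize` and `merge_sources`, and the specific rules chosen (lowercase host, drop default ports and fragments, sort query parameters), are assumptions for the example and may differ from what the platform does.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def canonicalize(url: str) -> str:
    """Normalize a URL so equivalent spellings compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters; keep blank values such as "?template=".
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))

def merge_sources(*sources):
    """Merge URL lists from several sources into one deduplicated, sorted list."""
    return sorted({canonicalize(url) for source in sources for url in source})
```

Calling `merge_sources(crawled, js_extracted, wayback)` with three lists of raw URLs would return one sorted list in which `https://TARGET.com/a?y=2&x=1` and `https://target.com/a?x=1&y=2` collapse into a single entry.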

4. Injection Discovery

17+ injection vector types catalogued. Risk metadata per parameter. Reflected parameter detection with differential response analysis.
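
One simple way to picture reflected parameter detection with a differential check is shown below. This is a sketch under stated assumptions: `fetch` is a hypothetical callable that takes a parameter dict and returns the response body, standing in for whatever HTTP client is actually used, and the marker format is invented for the example.

```python
import secrets
from typing import Callable, Dict

def is_reflected(fetch: Callable[[Dict[str, str]], str],
                 params: Dict[str, str], name: str) -> bool:
    """Return True if a unique marker sent in `name` appears only in the probe response."""
    marker = "pp" + secrets.token_hex(6)          # random, unlikely to occur naturally
    baseline = fetch(dict(params))                # response with the original values
    probe = fetch({**params, name: marker})       # response with the marker injected
    # Differential check: the marker must be new in the probe response.
    return marker in probe and marker not in baseline
```

Comparing against a baseline avoids flagging a parameter just because the page happens to contain the probed string already.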

5. Classification

LLM + heuristic risk scoring, WSTG category mapping. Authenticated re-crawl discovers protected endpoints and feeds them back through discovery.

6. Active Testing

SQLMap, Dalfox, Nuclei, SSTImap, LFImap, SSRFmap, XXEinjector, ZAP, Playwright DOM XSS, BOLA replay, deserialization. Failed-tool fallback pass.

7. AI Validation

5-stage pipeline: protected findings gate, definitive evidence signals, LangGraph ReAct replay, adversarial review, confidence blending.

8. OWASP Sweep

12 WSTG categories, 109 test IDs. Dedicated agent per category. Coverage gap analysis identifies untested controls and schedules targeted tests.
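
Coverage gap analysis can be thought of as a per-category set difference. The sketch below assumes test IDs follow the WSTG naming pattern `WSTG-<CATEGORY>-<NN>`; the function name `coverage_gaps` and the report shape are hypothetical, and the example uses a handful of IDs rather than the full catalogue.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def coverage_gaps(catalogue: Iterable[str],
                  tested: Iterable[str]) -> Dict[str, Tuple[int, List[str]]]:
    """Map each WSTG category to (percent tested, sorted list of untested IDs)."""
    tested_ids = set(tested)
    by_category = defaultdict(set)
    for test_id in catalogue:
        # "WSTG-INPV-05" belongs to category "INPV".
        by_category[test_id.split("-")[1]].add(test_id)
    report = {}
    for category in sorted(by_category):
        ids = by_category[category]
        gaps = sorted(ids - tested_ids)
        percent = round(100 * (len(ids) - len(gaps)) / len(ids))
        report[category] = (percent, gaps)
    return report
```

The untested IDs in each category are the natural input for scheduling targeted follow-up tests.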

9. AI Orchestrator

29 specialist agents deployed. Multi-round debates, cross-specialist challenges, adversarial refutation, pentest judge loop. PBFT consensus on high-severity findings.

10. Expansion

Recursive post-compromise loop. Each captured session triggers authenticated recon, delta computation, and a fresh testing cycle. Up to 3 iterations.
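
The loop described above can be sketched roughly as follows. The callables `recon` and `run_tests` are hypothetical placeholders for the authenticated recon and testing phases; only the delta computation and the 3-iteration cap come from the description.

```python
from typing import Callable, Iterable, List, Set, Tuple

def expand(known: Iterable[str], sessions: Iterable[str],
           recon: Callable[[str], Iterable[str]],
           run_tests: Callable[[str, Set[str]], Tuple[List[str], List[str]]],
           max_iterations: int = 3) -> Tuple[Set[str], List[str]]:
    """Re-run recon per captured session and test only endpoints not seen before."""
    known_endpoints = set(known)
    findings: List[str] = []
    queue = list(sessions)
    for _ in range(max_iterations):
        if not queue:
            break
        captured: List[str] = []
        for session in queue:
            delta = set(recon(session)) - known_endpoints   # new attack surface only
            if not delta:
                continue
            known_endpoints |= delta
            new_findings, new_sessions = run_tests(session, delta)
            findings.extend(new_findings)
            captured.extend(new_sessions)                    # feeds the next iteration
        queue = captured
    return known_endpoints, findings
```

Testing only the delta keeps each iteration proportional to the newly exposed surface, and the iteration cap bounds the total work even if sessions keep unlocking each other.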

11. Reporting

Executive summary, reproduction steps, evidence bundles (curl scripts, payloads, screenshots), WSTG coverage matrix, severity-ranked aggregation.
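
For a sense of what a curl reproduction script in an evidence bundle might look like, here is a small sketch that builds a shell-safe curl command from a recorded request. The helper name `curl_repro` is hypothetical; the curl options used (`-i`, `-X`, `-H`, `--data-raw`) are standard.

```python
import shlex
from typing import Dict, Optional

def curl_repro(method: str, url: str,
               headers: Optional[Dict[str, str]] = None,
               body: Optional[str] = None) -> str:
    """Render a recorded HTTP request as a single copy-pasteable curl command."""
    parts = ["curl", "-i", "-X", method.upper(), url]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]
    if body is not None:
        parts += ["--data-raw", body]
    # shlex.quote keeps payload characters such as quotes and $ intact in a shell.
    return " ".join(shlex.quote(part) for part in parts)
```

Quoting every argument matters here because injection payloads routinely contain characters a shell would otherwise interpret.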

Under the Hood

Not an LLM wrapper. A distributed AI system that debates, validates, and learns across scans.

Multi-Agent Consensus

Multiple specialist agents with different biases independently evaluate each finding. No single agent can confirm or reject a vulnerability unilaterally.

  • PBFT-inspired 4-phase protocol with 2f+1 quorum
  • Weighted voting based on agent track record accuracy
  • Adversarial agent actively tries to disprove every finding
  • Devil's advocate agent proposes alternative explanations
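
To make the 2f+1 quorum concrete, here is a minimal sketch. It assumes a panel of exactly 3f+1 agents, the classic PBFT sizing, and reports the accuracy-weighted support alongside the verdict rather than folding it into the quorum. The `Vote` type, the `decide` function, and the way weights are used are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Vote:
    agent: str
    confirms: bool
    weight: float = 1.0   # hypothetical track-record accuracy weight

def decide(votes: List[Vote], f: int) -> Tuple[str, float]:
    """Return (verdict, weighted support) for a panel of exactly 3f+1 agents."""
    if f < 1:
        raise ValueError("f must be at least 1 so no single agent can decide")
    if len(votes) != 3 * f + 1:
        raise ValueError("expected exactly 3f+1 votes")
    quorum = 2 * f + 1
    yes = sum(1 for v in votes if v.confirms)
    no = len(votes) - yes
    total = sum(v.weight for v in votes)
    support = sum(v.weight for v in votes if v.confirms) / total if total else 0.0
    if yes >= quorum:
        return "confirmed", support
    if no >= quorum:
        return "rejected", support
    return "no_quorum", support
```

With four agents and f = 1, three matching votes are needed either way, so a split panel yields `no_quorum` instead of a forced verdict.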

Attack Chain Reasoning

Causal graph models how vulnerabilities chain together, with strength scores and conditions. Edge types: ENABLES, AMPLIFIES, WEAKENS_DEFENSE.

  • XSS + no HttpOnly → session hijacking (0.9 strength)
  • SQLi + auth query → authentication bypass (0.95 strength)
  • Counterfactual analysis: "If CSP is strict, XSS chain breaks"
  • Impact propagation across full exploit graph
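
The chain examples above can be modeled as conditional edges in a small graph. The sketch below uses plain Python structures rather than a graph library, multiplies strengths along a path as an assumed propagation rule, and encodes the CSP counterfactual as a hypothetical `csp_not_strict` condition on the XSS edge.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str                      # ENABLES, AMPLIFIES, or WEAKENS_DEFENSE
    strength: float                # expected to lie in [0, 1]
    requires: FrozenSet[str] = frozenset()

EDGES: List[Edge] = [
    Edge("xss", "session_hijacking", "ENABLES", 0.9,
         frozenset({"no_httponly", "csp_not_strict"})),
    Edge("sqli", "auth_bypass", "ENABLES", 0.95, frozenset({"auth_query"})),
]

def reachable_impacts(start: str, facts: Set[str],
                      edges: List[Edge] = EDGES) -> Dict[str, float]:
    """Return {impact: best path strength} reachable from `start` given `facts`."""
    best = {start: 1.0}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for edge in edges:
            # An edge fires only when all of its conditions hold.
            if edge.src != node or not edge.requires <= facts:
                continue
            score = best[node] * edge.strength
            if score > best.get(edge.dst, 0.0):
                best[edge.dst] = score
                frontier.append(edge.dst)
    best.pop(start)
    return best
```

A counterfactual is then just a second call with one fact removed: dropping `csp_not_strict` from the fact set leaves the XSS node with no reachable impacts.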

Knowledge Graph RAG

NetworkX graph (endpoints, parameters, vulnerabilities, sessions) fused with vector embeddings. Retrieval weights adapt per query type via Reciprocal Rank Fusion.

  • SITEMAP queries: graph-only (structural traversal)
  • PAYLOAD queries: vector-heavy (semantic similarity)
  • VULNERABILITY queries: 50/50 hybrid fusion
  • Graph edges: HAS_PARAM, VULNERABLE_TO, CHAINS_TO
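
A weighted form of Reciprocal Rank Fusion gives a feel for how retrieval weights could vary by query type. In this sketch each ranked list contributes `weight / (k + rank)` per item, with the commonly used constant k = 60. The 0.2/0.8 split for PAYLOAD queries is an assumed stand-in for "vector-heavy", and the function name `fuse` is hypothetical.

```python
from typing import Dict, List, Tuple

# (graph weight, vector weight) per query type; the PAYLOAD split is assumed.
WEIGHTS: Dict[str, Tuple[float, float]] = {
    "SITEMAP": (1.0, 0.0),
    "PAYLOAD": (0.2, 0.8),
    "VULNERABILITY": (0.5, 0.5),
}

def fuse(query_type: str, graph_ranked: List[str],
         vector_ranked: List[str], k: int = 60) -> List[str]:
    """Combine two ranked lists with weighted Reciprocal Rank Fusion."""
    w_graph, w_vector = WEIGHTS[query_type]
    scores: Dict[str, float] = {}
    for weight, ranking in ((w_graph, graph_ranked), (w_vector, vector_ranked)):
        if weight == 0:
            continue                       # this retriever is ignored for the query type
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by name for stable output.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

Because RRF works on ranks rather than raw scores, graph traversal results and embedding similarities can be fused without normalizing their scales.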

Cross-Scan Learning

Confirmed findings become learning events. Successful payloads seed future tests. Debate outcomes feed back into the RAG system. The platform gets better with each scan.

  • Hypothesis store: verified + refuted patterns persist
  • Payload mutation: working payloads seed future fuzzing
  • Cross-domain: "jQuery XSS on site A, test on site B"
  • Stall detection: MD5 fingerprinting catches reasoning loops
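
Stall detection by fingerprinting can be sketched as hashing each normalized reasoning step and watching for repeats in a recent window. The class name `StallDetector`, the window size, the repeat threshold, and the normalization rule are assumptions; MD5 is used here purely as a fast fingerprint, not for security.

```python
import hashlib
from collections import deque

class StallDetector:
    """Flags a reasoning loop when the same step recurs within a recent window."""

    def __init__(self, window: int = 5, max_repeats: int = 2) -> None:
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    @staticmethod
    def fingerprint(step: str) -> str:
        # Collapse case and whitespace so trivially reworded steps match.
        normalized = " ".join(step.lower().split())
        return hashlib.md5(normalized.encode("utf-8"), usedforsecurity=False).hexdigest()

    def observe(self, step: str) -> bool:
        """Record a step; return True if it already appeared max_repeats times."""
        fp = self.fingerprint(step)
        repeats = self.recent.count(fp)
        self.recent.append(fp)
        return repeats >= self.max_repeats
```

An orchestrator could call `observe` on every agent step and break out of the loop, or switch strategy, on the first `True`.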

How Findings Are Validated

Every finding passes through a multi-stage gauntlet. Once confirmed, it can never be downgraded by later analysis.

  • Protected Findings Gate: WSTG-backed rules block invalid AI rejections at the boundary
  • Definitive Evidence Signals: Command output, OAST callbacks, status code differentials bypass debate
  • ReAct Agent Replay: LangGraph agent with tool access re-executes the finding independently
  • Direct Binary Verdict: Separate model instance delivers a clean true/false with justification
  • Adversarial Refutation: Red team agent proposes alternative explanations for every positive
  • Multi-Agent Consensus: High-severity findings require 2f+1 quorum from biased specialist agents
  • Monotonic Confirmation: Once ai_verified = true, no subsequent analysis can downgrade it
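
Monotonic confirmation amounts to a flag that can only move from false to true. A minimal sketch follows, with a hypothetical `Finding` type and `apply_verdict` method; only the `ai_verified` name comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Finding:
    title: str
    ai_verified: bool = False
    history: List[Tuple[str, bool]] = field(default_factory=list)

    def apply_verdict(self, stage: str, verdict: bool) -> bool:
        """Record a stage verdict; a confirmation is never undone by later stages."""
        self.history.append((stage, verdict))
        # Logical OR makes the flag monotonic: True stays True.
        self.ai_verified = self.ai_verified or verdict
        return self.ai_verified
```

Later negative verdicts are still kept in `history`, so the audit trail shows the disagreement even though the status does not change.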

26 Tools, Orchestrated

Go binaries compiled from source (Go 1.25, CGO-enabled). Python tools git-cloned at build. Every tool has a typed integration wrapper with structured output parsing.
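
A typed integration wrapper with structured output parsing might look something like the sketch below, which turns JSON-lines tool output into typed records. The `ToolFinding` type and the field names `url`, `severity`, and `detail` are hypothetical and do not correspond to any particular tool's real output schema.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ToolFinding:
    tool: str
    url: str
    severity: str
    detail: str

def parse_jsonl(tool: str, raw: str) -> List[ToolFinding]:
    """Parse JSON-lines output, skipping banners and malformed lines."""
    findings: List[ToolFinding] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                      # progress bars, banners, partial writes
        if not isinstance(record, dict) or "url" not in record:
            continue
        findings.append(ToolFinding(
            tool=tool,
            url=str(record["url"]),
            severity=str(record.get("severity", "info")).lower(),
            detail=str(record.get("detail", "")),
        ))
    return findings
```

Normalizing every tool into one record type is what lets downstream stages treat findings uniformly regardless of which scanner produced them.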

Go Binaries (compiled from source): Dalfox, Nuclei, Interactsh, ffuf, GAU, dnsx, Naabu, Katana, Webanalyze, Gobuster, Waybackurls, GoWitness

Python / Ruby Tools (git-cloned at build): SQLMap, SSTImap, LFImap, SSRFmap, XXEinjector, Liffy, EyeWitness

Framework Integrations: OWASP ZAP, Playwright, Retire.js, Arjun, Nmap

Exploitation: John the Ripper, ysoserial, phpggc

Ready to Try It?

Point it at a target. Get validated findings with reproduction steps, evidence bundles, and full WSTG coverage mapping.

Launch Mission Control

Want a demo, integration support, or just want to talk security?

admin@pentestpilot.ai