PentestPilot is an automated web application security testing platform built for professional penetration testers and bug bounty hunters. Point it at a target and it runs a full engagement autonomously, from authentication and reconnaissance through exploitation, AI-powered validation, and final reporting.
26 integrated security tools. 11-phase pipeline. Multi-agent AI validation that debates every finding before it reaches your report. No interaction required between start and finish.
Every finding ships with reproduction steps, evidence bundles, and OWASP WSTG coverage mapping. The system argues with itself so you don't have to argue with your client.
These are actual screenshots from a completed scan. Not mockups.
31 validated findings, 84 AI validation runs, severity breakdown with priority findings. One-click access to findings, endpoints, AI traces, and WSTG coverage.
Every finding tagged with severity, AI VERIFIED status, and HTTP method. Filter by severity or validation state.
84 total validations. 44 confirmed, 40 rejected. Every decision logged with confidence score and reasoning chain.
Every WSTG test ID tracked across 12 categories. Green = tested. Coverage percentage per category. Gap analysis built in.
Visual exploit chain mapping. Critical attack paths, potential exposure scoring, remediation priority. Interactive full-screen view.
28 tools across 6 categories. Each specialist agent gets tools from its assigned categories. Toggle tools on/off per scan.
From credential exhaustion to recursive post-compromise expansion. No human interaction between start and report.
8-step credential exhaustion: seeded creds, credential store, LLM login agent (Playwright), OSINT harvest, default creds, self-registration, full spray. Post-login verification on every attempt.
Katana SPA crawl, GAU historical URLs, Gobuster brute-force, EyeWitness screenshots, tech fingerprinting, JS AST parsing with framework-specific route extraction.
Canonicalized URL mapping. Merges crawled, JS-extracted, SPA, and Wayback endpoints. Arjun hidden parameter discovery. Normalized deduplication.
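To illustrate what canonicalized mapping with normalized deduplication can look like, here is a minimal sketch using only the Python standard library. The function names `canonicalize` and `merge_endpoints`, the source labels, and the specific normalization rules (lowercased host, default ports dropped, sorted query parameters, fragment removed) are assumptions for illustration, not PentestPilot's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return a normalized form of `url` usable as a deduplication key (hypothetical rules)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them.
        host = f"[{host}]"
    # Keep the port only when it differs from the scheme's default.
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    # Treat an empty path as "/" and drop trailing slashes elsewhere.
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # Sort query parameters so ordering differences do not create duplicates.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


def merge_endpoints(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each canonical URL to the set of discovery sources that reported it."""
    merged: dict[str, set[str]] = {}
    for source, urls in sources.items():
        for url in urls:
            merged.setdefault(canonicalize(url), set()).add(source)
    return merged
```

With these rules, `http://t.example/a/` from a crawl and `http://T.example:80/a` from a Wayback listing collapse into one entry that records both sources.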
17+ injection vector types catalogued. Risk metadata per parameter. Reflected parameter detection with differential response analysis.
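Differential reflection detection can be sketched as comparing a baseline response with the response to a request carrying a unique marker. The sketch below works on response bodies that have already been fetched. The helper names and the `pp` marker prefix are hypothetical and are not taken from the platform.

```python
import secrets


def make_marker() -> str:
    """Generate a random token that is unlikely to occur naturally in a page."""
    return "pp" + secrets.token_hex(6)


def is_reflected(baseline_body: str, probe_body: str, marker: str) -> bool:
    """True when the marker appears in the probe response but not in the baseline."""
    return marker in probe_body and marker not in baseline_body


def reflection_contexts(probe_body: str, marker: str, width: int = 20) -> list[str]:
    """Return a short snippet around each occurrence of the marker."""
    snippets = []
    start = probe_body.find(marker)
    while start != -1:
        lo = max(0, start - width)
        hi = start + len(marker) + width
        snippets.append(probe_body[lo:hi])
        start = probe_body.find(marker, start + len(marker))
    return snippets
```

The snippets show where a value lands (HTML body, attribute, script block), which is the kind of context that per-parameter risk metadata could record.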
LLM + heuristic risk scoring, WSTG category mapping. Authenticated re-crawl discovers protected endpoints and feeds them back through discovery.
SQLMap, Dalfox, Nuclei, SSTImap, LFImap, SSRFmap, XXEinjector, ZAP, Playwright DOM XSS, BOLA replay, deserialization. Failed-tool fallback pass.
5-stage pipeline: protected findings gate, definitive evidence signals, LangGraph ReAct replay, adversarial review, confidence blending.
12 WSTG categories, 109 test IDs. Dedicated agent per category. Coverage gap analysis identifies untested controls and schedules targeted tests.
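Coverage gap analysis reduces to set arithmetic over test IDs. A minimal sketch follows, assuming a catalog that maps each category to its test IDs and a set of IDs already exercised. The function name `coverage_report`, the output shape, and the short catalog excerpt in the usage note are illustrative assumptions.

```python
def coverage_report(catalog: dict[str, set[str]], tested: set[str]) -> dict[str, dict]:
    """Compute coverage percentage and untested IDs for each category."""
    report = {}
    for category, test_ids in catalog.items():
        done = test_ids & tested
        report[category] = {
            # Guard against an empty category to avoid division by zero.
            "percent": round(100 * len(done) / len(test_ids), 1) if test_ids else 0.0,
            # Untested IDs are the gaps a scheduler could target next.
            "gaps": sorted(test_ids - tested),
        }
    return report
```

For example, with a category containing `WSTG-INPV-01`, `WSTG-INPV-02`, `WSTG-INPV-05`, and `WSTG-INPV-07`, and all but `WSTG-INPV-07` tested, the report gives 75.0 percent coverage with `WSTG-INPV-07` listed as the gap.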
29 specialist agents deployed. Multi-round debates, cross-specialist challenges, adversarial refutation, pentest judge loop. PBFT consensus on high-severity findings.
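The quorum rule behind PBFT-style consensus can be sketched in a few lines. This shows only the vote-counting threshold, not the full three-phase PBFT protocol, and the names `pbft_quorum` and `decide` as well as the verdict strings are assumptions. With n voters, up to f = (n - 1) // 3 faulty voters are tolerated, and a verdict needs floor((n + f) / 2) + 1 matching votes, which equals 2f + 1 when n = 3f + 1.

```python
from collections import Counter
from typing import Optional


def pbft_quorum(n: int) -> int:
    """Votes needed so that any two quorums overlap in at least one honest voter."""
    f = (n - 1) // 3  # maximum number of faulty voters tolerated
    return (n + f) // 2 + 1


def decide(votes: dict[str, str]) -> Optional[str]:
    """Return the verdict that reaches quorum, or None if no verdict does."""
    if not votes:
        return None
    quorum = pbft_quorum(len(votes))
    verdict, count = Counter(votes.values()).most_common(1)[0]
    return verdict if count >= quorum else None
```

With four agents the quorum is three, so a 3-to-1 vote commits a verdict while a 2-to-2 split returns `None` and would need another round.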
Recursive post-compromise loop. Each captured session triggers authenticated recon, delta computation, and a fresh testing cycle. Up to 3 iterations.
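The control flow of a bounded post-compromise loop can be sketched as below. The `recon` and `run_tests` callables are hypothetical stand-ins for the authenticated recon and testing phases, sessions are represented as plain strings, and only the delta computation and the three-iteration cap come from the description above.

```python
from typing import Callable, Iterable


def post_compromise_loop(
    initial_sessions: Iterable[str],
    recon: Callable[[str], set[str]],
    run_tests: Callable[[str, set[str]], list[str]],
    max_iterations: int = 3,
) -> tuple[set[str], int]:
    """Expand coverage from captured sessions; return (known endpoints, iterations run)."""
    known: set[str] = set()
    pending = list(initial_sessions)
    seen_sessions = set(pending)
    iterations = 0
    while pending and iterations < max_iterations:
        iterations += 1
        next_round = []
        for session in pending:
            # Delta: endpoints visible to this session that were not known before.
            delta = recon(session) - known
            if not delta:
                continue
            known |= delta
            # Testing the delta may capture further sessions for the next round.
            for captured in run_tests(session, delta):
                if captured not in seen_sessions:
                    seen_sessions.add(captured)
                    next_round.append(captured)
        pending = next_round
    return known, iterations
```

The loop stops early when a round captures no new sessions, and the iteration cap keeps a long privilege chain from running indefinitely.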
Executive summary, reproduction steps, evidence bundles (curl scripts, payloads, screenshots), WSTG coverage matrix, severity-ranked aggregation.
Not an LLM wrapper. A distributed AI system that debates, validates, and learns across scans.
Multiple specialist agents with different biases independently evaluate each finding. No single agent can confirm or reject a vulnerability unilaterally.
Causal graph models how vulnerabilities chain together, with strength scores and conditions. Edge types: ENABLES, AMPLIFIES, WEAKENS_DEFENSE.
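A causal chain model of this kind can be sketched with typed, weighted edges. The three edge types come from the description above. Everything else here is an assumption for illustration: the `Edge` fields, the standard-library representation, and the choice to score a chain as the product of its edge strengths.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    ENABLES = "ENABLES"
    AMPLIFIES = "AMPLIFIES"
    WEAKENS_DEFENSE = "WEAKENS_DEFENSE"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: EdgeType
    strength: float      # assumed to lie in [0, 1]
    condition: str = ""  # free-text precondition, e.g. "victim is logged in"


def strongest_chain(edges: list[Edge], start: str, goal: str) -> tuple[float, list[Edge]]:
    """Depth-first search for the start-to-goal path with the highest strength product."""
    adjacency: dict[str, list[Edge]] = {}
    for edge in edges:
        adjacency.setdefault(edge.src, []).append(edge)
    best: tuple[float, list[Edge]] = (0.0, [])

    def walk(node: str, score: float, path: list[Edge], visited: frozenset) -> None:
        nonlocal best
        if node == goal:
            if score > best[0]:
                best = (score, path)
            return
        for edge in adjacency.get(node, []):
            if edge.dst not in visited:  # skip cycles
                walk(edge.dst, score * edge.strength, path + [edge], visited | {edge.dst})

    walk(start, 1.0, [], frozenset({start}))
    return best
```

A two-step chain with strengths 0.8 and 0.5 scores 0.4 and therefore outranks a direct edge of strength 0.3 between the same endpoints. A score of 0.0 with an empty path means no chain was found.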
NetworkX graph (endpoints, parameters, vulnerabilities, sessions) fused with vector embeddings. Graph and vector results are merged with Reciprocal Rank Fusion, and the retrieval weights adapt per query type.
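Reciprocal Rank Fusion scores each item as the sum of w / (k + rank) over every ranking that contains it. Below is a minimal weighted sketch. The retriever names, the per-retriever weights, and the function name `weighted_rrf` are assumptions, and k = 60 is the commonly used default rather than a documented PentestPilot setting.

```python
def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists into one list of (item, score), best first."""
    scores: dict[str, float] = {}
    for retriever, ranked in rankings.items():
        weight = weights.get(retriever, 1.0)  # unlisted retrievers get weight 1.0
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # Sort by descending score; break ties by item name for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

Adapting to query type then amounts to choosing a different `weights` mapping per query, for example favoring the graph ranking for a structural question about which endpoints share a session.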
Confirmed findings become learning events. Successful payloads seed future tests. Debate outcomes feed back into the RAG system. The platform gets better with each scan.
Every finding passes through a multi-stage gauntlet. Once confirmed, it can never be downgraded by later analysis.
Go binaries compiled from source (Go 1.25, CGO-enabled). Python tools git-cloned at build. Every tool has a typed integration wrapper with structured output parsing.
Point it at a target. Get validated findings with reproduction steps, evidence bundles, and full WSTG coverage mapping.
Want a demo, integration support, or just want to talk security?
admin@pentestpilot.ai