Inside SwarmHack: Anatomy of a Single-Command Kill Chain

How one swarmhack spawn command orchestrates 32 AI agents through reconnaissance, exploitation and lateral movement against a segmented Docker lab.

SwarmHack Team · 2026-04-15 · 8 min

TL;DR

How a single swarmhack spawn command runs a full external + internal pentest
The lab topology that proves segmentation is no defense against autonomous agents
The numbers: 11 findings · 35 crown jewels · 6m 7s — reproducible across 11 consecutive runs

SwarmHack is a penetration testing tool, not a vulnerability scanner. The distinction matters: a scanner enumerates every weakness and produces a CVSS-sorted report. A pentest finds one way in, exploits it in depth, pivots, escalates, reaches the crown jewels, and then stops.

This post is the first of a three-part series walking through a real validated engagement against a multi-host Docker lab.

1. One Command, Full Kill Chain

swarmhack spawn --target http://localhost:8880 \
    --token "$PRANCER_TOKEN" --customer "$PRANCER_CUSTOMER"

That command — and only that command — produces:

| Metric | Value |
| -------- | ------- |
| Total findings | 11 |
| Crown jewels extracted | 35 |
| Targets compromised | 2 (external + internal via pivot) |
| Wall-clock time | 6 minutes 7 seconds |
| Human intervention | Zero |
| Reproducibility | 11/11 consecutive runs identical |

Each agent stops on first confirmed vulnerability, then shifts into deep exploitation — extracting data, harvesting credentials, mapping crown jewels — instead of scanning every endpoint for the same class. Fewer findings, higher confidence, deeper impact.
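
The stop-on-first-hit behavior can be sketched in a few lines of Python. This is an illustration only, not SwarmHack's code: `first_confirmed` and `fake_probe` are hypothetical names, and the probe is a stand-in for whatever confirmation logic an agent uses.

```python
from typing import Callable, Iterable, Optional


def first_confirmed(endpoints: Iterable[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint the probe confirms, without testing the rest."""
    for endpoint in endpoints:
        if probe(endpoint):
            # Stop here: deep exploitation would start from this hit.
            return endpoint
    return None


# Hypothetical probe that records which endpoints were actually tested.
tested = []


def fake_probe(endpoint: str) -> bool:
    tested.append(endpoint)
    return endpoint == "/ping.php"


hit = first_confirmed(["/search.php", "/ping.php", "/admin.php"], fake_probe)
print(hit, tested)
```

In this sketch `/admin.php` is never probed for the same class once `/ping.php` is confirmed, which is the trade the paragraph above describes: fewer findings in exchange for depth.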

2. Lab Architecture

A purpose-built Docker Compose lab with enforced network segmentation: one bridge network for internet-facing traffic, and a second one marked internal: true, which Docker isolates from external networks so that nothing attached to it can send traffic out.

         ATTACKER
            │
   ┌────────┴────────┐ external_net (172.20.0.0/24)
   │                 │
┌──┴───────────────┐ │
│  TARGET A        │ │  ← dual-homed pivot
│  172.20.0.10     │ │     Apache + PHP + MySQL + SSH
│  172.20.1.10 ────┼─┼──┐
└──────────────────┘ │  │
                     │  │ internal_net (172.20.1.0/24)
                     │  │ Docker drops all outbound
                     │  │
                  ┌──┴──┴────────────┐
                  │  TARGET B        │  ← unreachable from outside
                  │  172.20.1.20     │     DVWA, no port mapping
                  └──────────────────┘

external_net: standard bridge, reachable from the host via ports 8880 (HTTP), 2222 (SSH), 33060 (MySQL).
internal_net (internal: true): Docker blocks every packet that tries to leave. The only path in is *through* Target A.
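
The reachability argument can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module with the subnets and addresses from the diagram; the host labels and the helper names `networks_of` and `pivots_to` are assumptions made for illustration.

```python
import ipaddress

EXTERNAL_NET = ipaddress.ip_network("172.20.0.0/24")
INTERNAL_NET = ipaddress.ip_network("172.20.1.0/24")

# Interface addresses as drawn in the lab diagram (labels are illustrative).
HOSTS = {
    "target_a": ["172.20.0.10", "172.20.1.10"],
    "target_b": ["172.20.1.20"],
}


def networks_of(host):
    """Return the set of lab networks a host has an interface on."""
    found = set()
    for addr in HOSTS[host]:
        ip = ipaddress.ip_address(addr)
        if ip in EXTERNAL_NET:
            found.add("external_net")
        if ip in INTERNAL_NET:
            found.add("internal_net")
    return found


def pivots_to(host):
    """Hosts that sit on external_net and also share a network with `host`."""
    target_nets = networks_of(host)
    return sorted(
        name for name in HOSTS
        if name != host
        and "external_net" in networks_of(name)
        and networks_of(name) & target_nets
    )


print(networks_of("target_b"), pivots_to("target_b"))
```

Target B has no interface on external_net, so the only candidate pivot the sketch returns for it is Target A.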

This is the kind of segmentation a real network relies on. The point of this engagement is to show how an autonomous tool routes around it.

3. The Targets

3.1 Target A — the dual-homed front door

Ubuntu image with Apache 2.4.41, PHP, MySQL (root, no password), and OpenSSH. Vulnerable endpoints span CMDI, XSS, CSRF, SQLi, XXE, session fixation, and exposed .env / .git/config.

| Endpoint | Vulnerability | Parameter |
| ---------- | -------------- | ----------- |
| /ping.php | OS Command Injection (CWE-78) | host (POST) |
| /search.php | Reflected XSS (CWE-79) | q (POST) |
| /login.php | Session Fixation (CWE-384) | username, password |
| /admin.php | XXE, SQLi, CSRF | cmd (POST), XML body |
| /.env | Credential disclosure | Stripe key, SSH creds, DB creds |

The .env file is the seed of the entire kill chain:

DB_HOST=localhost
SECRET_KEY=sk_live_4eC39HqLyjWDarjtT1zdp7dc
STRIPE_API_KEY=sk_test_51ABCDeFgHiJkLmNoPqRsTuVwXyZ
INTERNAL_API=http://172.20.1.20/api/v1
SSH_USER=pentest
SSH_PASS=pentest123

Two SSH accounts exist: root:toor and pentest:pentest123. The pentest user has sudo NOPASSWD: ALL — a trivial privesc once shell access is obtained. The lab mirrors patterns we see in real production environments.

3.2 Target B — the internal-only DVWA

vulnerables/web-dvwa:latest on internal_net only. No port mapping, no route out. The only way to reach it is to pivot through Target A.

4. Phase 1: External Web Scan (T+0s → T+30s)

The GOAP (goal-oriented action planning) planner runs WebCrawler first (sequential, 90s timeout, max 100 pages), then launches 20+ exploit agents in parallel via tokio::JoinSet, gated by a 25-slot semaphore. A GlobalRateLimiter caps traffic at 50 concurrent requests and 50 req/s: enough to keep runs deterministic, and slow enough to avoid tripping a WAF.

SwarmHack → :8880
  │
  ├─ WebCrawler (sequential, 90s timeout)
  │    Maps: /, /ping.php, /search.php, /login.php, /admin.php, /info.php
  │
  └─ 20+ agents in parallel (tokio::JoinSet, 25-slot semaphore)
       ├─ CMDI    → ping.php 'host'    → marker SWMHK12019CK confirmed
       ├─ XXE     → admin.php          → file:///etc/hostname resolved
       ├─ CSRF    → admin.php          → cross-origin POST accepted
       ├─ SQLi    → admin.php 'cmd'    → tautology accepted
       ├─ XSS     → search.php 'q'     → <script>alert(1)</script> reflected
       ├─ VulnComp→ Apache/2.4.41      → CVE-2021-44790, CVE-2021-44224
       ├─ AuthByp → admin.php          → accessible without auth
       └─ SessFix → login.php          → PHPSESSID not regenerated
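
The bounded fan-out above can be approximated in Python. SwarmHack itself uses tokio::JoinSet, so this asyncio version is an analogue rather than the real implementation: `run_agent`, `run_swarm`, and the sleep standing in for HTTP probing are all assumptions, and the GlobalRateLimiter is not modeled.

```python
import asyncio

MAX_PARALLEL_AGENTS = 25  # semaphore size quoted in the post


async def run_agent(name, slots, stats):
    """Hypothetical agent body: hold one slot while 'working'."""
    async with slots:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for real HTTP probing
        stats["running"] -= 1
    return name


async def run_swarm(agent_names, limit=MAX_PARALLEL_AGENTS):
    """Start every agent at once, but let at most `limit` hold a slot."""
    slots = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_agent(name, slots, stats) for name in agent_names)
    )
    return results, stats["peak"]


results, peak = asyncio.run(run_swarm([f"agent-{i}" for i in range(32)]))
print(len(results), peak)
```

All agents are scheduled up front, and the semaphore decides how many are active at any moment, so adding agents raises queueing time rather than load on the target.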

5. The Star Finding: Command Injection in `ping.php`

| Field | Value |
| ------- | ------- |
| Severity | critical |
| CWE | CWE-78 (OS Command Injection) |
| MITRE ATT&CK | T1059, T1190, T1059.004 |
| Payload | ; echo SWMHK$(expr 7777 + 4242)CK |
| Detection | Marker SWMHK12019CK found in response body |
| Crown jewels | 15 |

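The arithmetic-marker payload (expr 7777 + 4242 = 12019) gives strong CMDI confirmation: the computed value never appears in the request, so if SWMHK12019CK shows up in the response, the server executed the injected shell command. A page that merely reflects its input echoes the literal $(expr 7777 + 4242) instead, which leaves very little room for false positives.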
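
A minimal Python sketch of this check follows. The payload string and marker come from the table above; `build_probe` and `is_confirmed` are hypothetical helper names, not SwarmHack functions.

```python
MARKER_PREFIX = "SWMHK"
MARKER_SUFFIX = "CK"


def build_probe(a, b):
    """Return (payload, expected_marker) for an arithmetic-marker CMDI probe."""
    # The payload carries the expression; only a shell can turn it into a sum.
    payload = f"; echo {MARKER_PREFIX}$(expr {a} + {b}){MARKER_SUFFIX}"
    expected = f"{MARKER_PREFIX}{a + b}{MARKER_SUFFIX}"
    return payload, expected


def is_confirmed(response_body, expected):
    """True only if the computed marker appears in the response body."""
    return expected in response_body


payload, expected = build_probe(7777, 4242)
print(payload)
print(expected)
```

Because the expected marker is absent from the payload itself, a response that only reflects the input fails `is_confirmed`, while a response containing the evaluated sum passes.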

Deep exploitation extracted 15 crown jewels in a single pass: credentials, kernel info, network interfaces, the .env file (which seeds Phase 2), and six ready-to-use post-exploitation payloads:

| # | Category | Value |
| --- | ---------- | ------- |
| 1–3 | credential | www-data, uid=33, 26 accounts (root + pentest with shells) |
| 4 | api_key | .env contents — Stripe + SSH + DB credentials |
| 5–9 | system_info | Kernel, env vars, 3 interfaces (eth0 / eth1 / lo), FS layout |
| 10–15 | post_exploitation | Bash + nc reverse shells, curl/HTTP exfil, DNS exfil |

Kill-chain significance — the .env extraction is the bridge. SSH_USER=pentest, SSH_PASS=pentest123, and eth1: 172.20.1.10 are all the autonomous engine needs to pivot from web exploit to internal compromise. That's the topic of Part 2.
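
To make the bridge concrete, here is a hedged Python sketch of turning leaked .env text into pivot material. The keys are the ones shown in Section 3.1; `parse_env` and `pivot_material` are illustrative names, and the real engine's extraction logic is not described in this post.

```python
from urllib.parse import urlparse

# Subset of the leaked .env shown in Section 3.1.
ENV_TEXT = """\
DB_HOST=localhost
INTERNAL_API=http://172.20.1.20/api/v1
SSH_USER=pentest
SSH_PASS=pentest123
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments and malformed lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def pivot_material(env):
    """Collect what a lateral-movement step would need from the parsed file."""
    return {
        "ssh_user": env.get("SSH_USER"),
        "ssh_pass": env.get("SSH_PASS"),
        "internal_host": urlparse(env.get("INTERNAL_API", "")).hostname,
    }


print(pivot_material(parse_env(ENV_TEXT)))
```

The internal host recovered from INTERNAL_API sits in the same 172.20.1.0/24 range as Target A's eth1 address, which is what makes the SSH credentials useful for a pivot.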

6. Confidence Is Calibrated, Not Guessed

Every finding ships with a confidence floor derived from evidence quality — not from a language model:

| Agent class | Confidence | Basis |
| ------------- | ----------- | ------- |
| Privilege Escalation | 1.00 | Chain fully validated |
| CMDI | 0.99 | Marker arithmetic proof |
| XSS | 0.90 | Payload reflected unencoded |
| SQLi | 0.75 | Tautology heuristic |
| CSRF | 0.70 | Cross-origin POST accepted |
| XXE | 0.60 | In-band heuristic only |
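
One way to read the table is as a fixed lookup: the confidence depends only on the agent class and its evidence type, never on a model's judgment at run time. The sketch below encodes the published values in Python; the dictionary keys and the `rank` helper are assumptions for illustration.

```python
# Confidence floors copied from the table; key names are illustrative.
CONFIDENCE_FLOOR = {
    "privilege_escalation": 1.00,
    "cmdi": 0.99,
    "xss": 0.90,
    "sqli": 0.75,
    "csrf": 0.70,
    "xxe": 0.60,
}


def confidence_for(agent_class):
    """Look up the fixed confidence floor for an agent class."""
    return CONFIDENCE_FLOOR[agent_class]


def rank(agent_classes):
    """Order findings by descending confidence, with ties broken by name."""
    return sorted(agent_classes, key=lambda c: (-CONFIDENCE_FLOOR[c], c))


print(rank(["xxe", "cmdi", "sqli", "csrf"]))
```

Because both the lookup and the sort key are deterministic, the same set of findings always produces the same ordering and scores.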

A reviewer can replay any finding: same payload, same detection logic, same response — same verdict. Across 11 consecutive scans, SwarmHack produced exactly 11 findings and 35 crown jewels every single time.

What's Next

In Part 2 we follow the credentials extracted here through the SSH lateral-movement loop, the privilege-escalation synthesizer, and the SSH tunnel that puts the autonomous engine inside the isolated network — all without a human touching the keyboard.

In Part 3 we explain why this whole architecture is intentionally not an LLM, and what that means for cost, compliance, and reproducibility.