Research Methodology: Net Effects of Offensive Security Tooling

Date: November 24, 2025
By: Daniel Miessler (with Kai)


Important caveat: This research was executed entirely by AI systems (Claude, Gemini, Perplexity/OpenAI) with scaffolding designed to emulate research rigor. The data was gathered by AI agents and analyzed by AI agents. While we tried to be thorough and cite real sources, this should NOT be considered equivalent to research conducted by a human research team. It's an experiment in AI-assisted research, and the findings are open for debate and discussion. Take it as a starting point, not a definitive answer.


How We Did This

We threw a bunch of AI agents at this problem from different angles, then red-teamed both sides of the argument:

  1. Phase 1: Gather data using parallel research agents (Claude, Perplexity, Gemini)
  2. Phase 2: Break down each argument into 24 atomic claims
  3. Phase 3: Have 32 specialized agents attack each argument
  4. Phase 4: Figure out where agents converged (what held up)
  5. Phase 5: Build the strongest version of each argument, then attack it

Phase 1: Gathering the Data

The AI Platforms We Used

  • Claude (Anthropic): Deep technical analysis, attacker knowledge research
  • Perplexity: Real-time web research, academic studies, industry data
  • Gemini (Google): Ecosystem analysis, defender benefit quantification

What Each Agent Looked For

Agent 1: perplexity-researcher
Topic: Does disclosure actually make vendors patch faster?
Focus: Academic papers on patch rates, disclosure timing studies, vendor behavior, CERT/CC data, time-to-exploit vs time-to-patch

Agent 2: claude-researcher
Topic: Do sophisticated attackers already have these tools?
Focus: Zero-day lifespan studies, collision rates, zero-day market prices, attacks-in-the-wild before disclosure, how long it takes attackers to develop tools

Agent 3: gemini-researcher
Topic: How much do defenders actually benefit?
Focus: Penetration testing industry data, bug bounty ROI, red team exercise outcomes, breach cost comparisons, detection improvements
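To make the fan-out concrete, here's a minimal sketch of how the Phase 1 dispatch might look, assuming a Python harness. The run_agent helper, the prompt wording, and the phase1 entry point are illustrative stand-ins, not the actual scaffolding we used.

```python
import asyncio

# Hypothetical stand-in for each platform's client call; only the
# parallel fan-out pattern is the point of this sketch.
async def run_agent(platform: str, topic: str, focus: str) -> dict:
    prompt = (
        f"Research question: {topic}\n"
        f"Focus areas: {focus}\n"
        "Cite real, checkable sources for every claim."
    )
    # ...call the platform's API with `prompt` here...
    return {"platform": platform, "topic": topic, "prompt": prompt}

async def phase1() -> list:
    tasks = [
        run_agent("perplexity", "Does disclosure make vendors patch faster?",
                  "patch rates, disclosure timing, CERT/CC data"),
        run_agent("claude", "Do sophisticated attackers already have these tools?",
                  "zero-day lifespan, collision rates, market prices"),
        run_agent("gemini", "How much do defenders actually benefit?",
                  "pen testing data, bug bounty ROI, breach costs"),
    ]
    # All three research agents run concurrently rather than in sequence.
    return await asyncio.gather(*tasks)

results = asyncio.run(phase1())
```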


Phase 2: Breaking Down the Arguments

The Approach

We broke each argument (Net Negative and Net Positive) into exactly 24 atomic claims—specific statements that could be individually challenged.

What made a good atomic claim (see the sketch after this list):

  • Self-contained (understandable on its own)
  • Specific (not vague or general)
  • Attackable (someone could reasonably push back on it)
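For illustration, here's the shape such a claim might take as a record, assuming a Python pipeline; the AtomicClaim name and its fields are ours, not part of the original tooling.

```python
from dataclasses import dataclass

@dataclass
class AtomicClaim:
    """One self-contained, specific, attackable statement from an argument."""
    claim_id: int  # 1-24 within its side of the debate
    side: str      # "net_negative" or "net_positive"
    text: str      # readable without any surrounding context

claim = AtomicClaim(
    claim_id=3,
    side="net_negative",
    text="The time-to-exploit has collapsed from 32 days to 5 days, "
         "largely due to tool availability",
)
```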

Net Negative Argument: 24 Claims

  1. Publishing exploit code makes it trivially easy for unskilled attackers to compromise systems
  2. Script kiddies who couldn't develop exploits independently now have weaponized tools
  3. The time-to-exploit has collapsed from 32 days to 5 days, largely due to tool availability
  4. Metasploit modules are used in 10.5% of malware C2 servers, proving criminal adoption
  5. Defenders already have vendor patches; they don't need exploit code to protect systems
  6. The information asymmetry structurally favors attackers who need only one vulnerability
  7. Publishing exploits expands the total attack surface by enabling more attackers
  8. Sophisticated attackers already have private capabilities; public tools only help amateurs
  9. Bug bounty and pen testing could function with private, licensed tools instead
  10. The "defenders need it" argument assumes defenders are more likely to use tools than attackers
  11. Attacks increase by up to 5 orders of magnitude after public disclosure
  12. The zero-day market proves attackers pay millions for exclusive access—public tools destroy that exclusivity for free
  13. Coordinated disclosure works; full disclosure with exploit code is unnecessary
  14. China's 48-hour disclosure law shows governments weaponize vulnerability information
  15. The window for defensive action is now too short—30% exploitation within 24 hours
  16. Most organizations lack resources to act on vulnerability information regardless
  17. Restricting tools would create friction for attackers without eliminating their capabilities
  18. The "attacker knowledge asymmetry" claim lacks empirical measurement
  19. Medical and other regulated fields restrict dangerous knowledge; security should too
  20. The original Metasploit rationale assumed a pre-cloud, pre-automation threat landscape
  21. Open source offensive tools enable adversarial nation-states without procurement costs
  22. The security industry financially benefits from attack tools existing; conflict of interest
  23. Enterprise defenders use commercial tools anyway; open source benefits attackers more
  24. Every public exploit is a free force multiplier for criminal organizations

Net Positive Argument: 24 Claims

  1. Sophisticated attackers already possess offensive capabilities independent of public tools
  2. Zero-day lifespan of 6.9 years proves attackers have years of advance knowledge
  3. Only 5% of vulnerabilities with public exploits are actually exploited in the wild
  4. Vulnerability disclosure accelerates vendor patching by 137% (Arora et al. 2008)
  5. Organizations using offensive testing have $1.76M lower breach costs (IBM/Ponemon)
  6. 81% of organizations now use penetration testing, creating massive defender capability
  7. Bug bounty programs achieve 544% ROI and find 40% more vulnerabilities than traditional testing
  8. Red team exercises improve detection rates by 3-4x (Mandiant data)
  9. The 5.7% annual collision rate means restricting tools doesn't prevent attacker discovery
  10. 80% of exploits appear BEFORE their CVE—attackers don't wait for public disclosure
  11. Restricting tools primarily harms defenders who need to test their own systems
  12. The zero-day market ($5-20M for iOS) proves sophisticated attackers have alternative supply chains
  13. Penetration testing training produces better incident responders and threat hunters
  14. MITRE ATT&CK coverage improves from 16-20% to near 100% after red team exercises
  15. Script kiddies using public tools are easier to detect than sophisticated attackers using private tools
  16. Medical analogy fails: doctors share disease knowledge; security obscuring attacks doesn't prevent them
  17. Defenders must protect all attack surfaces; knowing the attacks enables prioritization
  18. 95% patch rate before public disclosure via bug bounties proves coordinated disclosure works
  19. Open tools enable security research that benefits the entire ecosystem
  20. Countries/organizations that restrict security knowledge have worse security outcomes
  21. The "friction for attackers" argument ignores that friction doesn't stop motivated adversaries
  22. Offensive training develops "adversarial thinking" that correlates with better defensive outcomes
  23. Enterprise commercial tools exist BECAUSE open source proved the concept; ecosystem benefit
  24. Every SOC analyst needs to understand offensive techniques to detect and investigate attacks

Phase 3: Red-Teaming Both Sides

How We Ran the Analysis

32 agents per argument, all launched in parallel.

Each agent got the following inputs (a prompt-assembly sketch follows the list):

  1. The full original argument
  2. The 24-claim breakdown
  3. A specific personality and attack angle
  4. Instructions to find BOTH strengths AND weaknesses
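A minimal sketch of that prompt assembly, with two of the 32 personas shown; ROSTER and build_prompt are illustrative names, and the exact prompt wording is assumed rather than quoted from the real harness.

```python
ROSTER = {
    # agent_id: (personality, attack angle) -- two of 32 shown
    "PE-1": ("Skeptical Systems Thinker", "Where does this break at scale?"),
    "IN-3": ("Contrarian", "What if the exact opposite is true?"),
}

def build_prompt(agent_id: str, argument: str, claims: list) -> str:
    personality, angle = ROSTER[agent_id]
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(claims, start=1))
    return (
        f'You are {agent_id}, a {personality}. Your angle: "{angle}"\n\n'
        f"FULL ARGUMENT:\n{argument}\n\n"
        f"ATOMIC CLAIMS:\n{numbered}\n\n"
        "Find the strongest point FOR and the strongest point AGAINST\n"
        "this argument. Reference claims by number."
    )
```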

Agent Roster: 8 Principal Engineers

Technical and logical folks:

Agent | Personality | Perspective
--- | --- | ---
PE-1 | Skeptical Systems Thinker | "Where does this break at scale?"
PE-2 | Evidence Demander | "Show me the numbers that prove this."
PE-3 | Edge Case Hunter | "What happens when X is not true?"
PE-4 | Historical Pattern Matcher | "We tried this in 2008 and here's what happened."
PE-5 | Complexity Realist | "This is harder than it sounds because..."
PE-6 | Dependency Tracer | "This assumes X, which assumes Y, which is false."
PE-7 | Failure Mode Analyst | "Here are 5 ways this fails catastrophically."
PE-8 | Technical Debt Accountant | "The real price of this approach is..."

Agent Roster: 8 Architects

Big-picture thinkers:

Agent | Personality | Perspective
--- | --- | ---
AR-1 | Big Picture Thinker | "This ignores how it fits into the larger system."
AR-2 | Trade-off Illuminator | "You gain X but lose Y, and Y matters more."
AR-3 | Abstraction Questioner | "These aren't the same category of problem."
AR-4 | Incentive Mapper | "Who benefits from this being true?"
AR-5 | Second-Order Effects Tracker | "This causes A, which causes B, which destroys C."
AR-6 | Integration Pessimist | "This doesn't compose with existing reality."
AR-7 | Scalability Skeptic | "This works for 10, not 10,000."
AR-8 | Reversibility Analyst | "Once you do this, you can't go back."

Agent Roster: 8 Pentesters

Adversarial thinkers:

Agent | Personality | Perspective
--- | --- | ---
PT-1 | Red Team Lead | "Here's how I'd exploit this logic."
PT-2 | Assumption Breaker | "This depends on X, and X is false."
PT-3 | Game Theorist | "A smart opponent would simply..."
PT-4 | Social Engineer | "People will route around this because..."
PT-5 | Precedent Finder | "This is just [past example] in a new dress."
PT-6 | Defense Evaluator | "This defense fails because attackers can..."
PT-7 | Threat Modeler | "You've left this entire surface undefended."
PT-8 | Asymmetry Spotter | "Attackers have unlimited time; defenders don't."

Agent Roster: 8 Interns

Fresh eyes and contrarians:

Agent | Personality | Perspective
--- | --- | ---
IN-1 | Naive Questioner | "But why do we assume X in the first place?"
IN-2 | Analogy Finder | "This is just like [other field] where it failed."
IN-3 | Contrarian | "What if the exact opposite is true?"
IN-4 | Common Sense Checker | "This violates basic intuition because..."
IN-5 | Zeitgeist Reader | "In practice, nobody actually does this because..."
IN-6 | Simplicity Advocate | "The simpler explanation is..."
IN-7 | Edge Lord | "If this is true, then [absurd consequence] must also be true."
IN-8 | Devil's Intern | "The uncomfortable truth nobody wants to say is..."

What Each Agent Had to Return

Each agent gave us a response in this fixed format (a parsing sketch follows it):

**[AGENT ID] ANALYSIS:**

**Strongest Point FOR the Argument:** [Claim #X]
[2-3 sentences on why this is valid]
Take seriously because: [1 sentence]

**Strongest Point AGAINST the Argument:** [Claim #Y]
[2-3 sentences on the flaw]
Problematic because: [1 sentence]

**Overall Assessment:** [One sentence verdict]
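Because the format is fixed, responses can be machine-read. Here's a sketch of pulling out the FOR/AGAINST claim numbers, assuming responses actually follow the template; the regex and the extract_picks name are ours, not the real harness.

```python
import re

# Tracks the response template above; an agent that deviates from the
# format yields None and gets reviewed by hand.
PATTERN = re.compile(
    r"Strongest Point FOR the Argument:\*\*\s*\[Claim #(\d+)\]"
    r".*?"
    r"Strongest Point AGAINST the Argument:\*\*\s*\[Claim #(\d+)\]",
    re.DOTALL,
)

def extract_picks(agent_output: str):
    m = PATTERN.search(agent_output)
    if m is None:
        return None
    return {"for": int(m.group(1)), "against": int(m.group(2))}
```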

Phase 4: Finding What Held Up

How We Identified Convergence

Strong Convergence (5+ agents):

  • Marked as CRITICAL finding
  • Weighted heavily in final analysis

Moderate Convergence (3-4 agents):

  • Marked as SIGNIFICANT finding
  • Weighted secondarily

Unique Insights (1-2 agents):

  • Marked as NOTABLE
  • Kept for completeness (the tally behind these tiers is sketched below)
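The tiering reduces to a simple count of how many agents independently hit the same claim. A minimal sketch, with tier_findings as an illustrative name:

```python
from collections import Counter

def tier_findings(attacked_claims: list) -> dict:
    """Bucket each claim number by how many agents converged on it."""
    tiers = {}
    for claim, n in Counter(attacked_claims).items():
        if n >= 5:
            tiers[claim] = "CRITICAL"     # strong convergence
        elif n >= 3:
            tiers[claim] = "SIGNIFICANT"  # moderate convergence
        else:
            tiers[claim] = "NOTABLE"      # unique insight
    return tiers

# Seven agents attack claim 3, two attack claim 11, one attacks claim 20:
print(tier_findings([3] * 7 + [11] * 2 + [20]))
# -> {3: 'CRITICAL', 11: 'NOTABLE', 20: 'NOTABLE'}
```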

How We Categorized Findings

Strengths:

  • Valid Evidence
  • Sound Logic
  • Real Problem Identified
  • Historical Support

Weaknesses:

  • Logical Fallacies
  • Missing Evidence
  • Hidden Assumptions
  • Counterexamples
  • Precedent Contradictions
  • Second-Order Effects

Phase 5: Steelmanning Then Attacking

Building the Strongest Version

For each argument, we built the strongest possible version before attacking it.

Format: 8 points, 12-16 words each

Why: To make sure we weren't attacking a straw man

Then Attacking the Strong Version

We applied first-principles analysis:

  1. Identify what type of claim it is (causal, comparative, categorical, predictive, normative)
  2. Surface hidden assumptions
  3. Check historical precedent
  4. Test logical validity
  5. Make sure the counter defeats the STEELMAN, not a weaker version

Format: 8 points, 12-16 words each


How We Tried to Keep It Honest

Multi-Source Validation

  • Minimum 3 sources per major claim (enforced as in the sketch below)
  • Cross-checked across platforms (Claude, Perplexity, Gemini)
  • Prioritized academic papers and official documentation
  • Weighted industry reports higher than marketing claims
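As a sketch of how those floors might be checked mechanically; note the two-platform threshold is our reading of "cross-checked", not a number stated anywhere in the methodology.

```python
MIN_SOURCES = 3
PLATFORMS = {"claude", "perplexity", "gemini"}

def sourcing_problems(sources: set, platforms_seen: set) -> list:
    """Flag a major claim that misses the sourcing floors above."""
    problems = []
    if len(sources) < MIN_SOURCES:
        problems.append(f"only {len(sources)} source(s); need {MIN_SOURCES}")
    if len(platforms_seen & PLATFORMS) < 2:
        problems.append("not cross-checked on a second platform")
    return problems

# A claim with two sources, seen only via one platform, gets flagged twice:
print(sourcing_problems({"RAND RR1751", "Bilge & Dumitras 2012"}, {"claude"}))
```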

Bias Mitigation

  • Used multiple AI platforms so no single model dominated
  • Explicitly told agents to challenge assumptions
  • Required both strengths AND weaknesses from each agent
  • Documented contradictory evidence
  • Assigned confidence levels

What We Know We're Missing

  • Counterfactual problem: No data on what the world looks like without public tools
  • Rapidly evolving landscape: the industry data is from 2024-2025 and may already be shifting
  • Selection bias: Breach data only from orgs that report
  • Distributional effects hard to quantify: we know smaller orgs get hurt, but it's hard to measure by how much
  • Speculation about future: Any forward-looking claims are inherently speculative

How Long Did This Take?

Phase | Time | What Happened
--- | --- | ---
Phase 1 | ~5 min | Parallel empirical research (3 agents)
Phase 2 | ~3 min | Broke down arguments (24 claims each)
Phase 3 | ~10 min | Red team analysis (64+ agents in parallel)
Phase 4 | ~5 min | Figured out what held up
Phase 5 | ~5 min | Built steelmans and counter-arguments
Total | ~30 min | Complete research cycle

Sources We Used

Academic Papers

  1. Arora, A., Krishnan, R., Telang, R., Yang, Y. (2008). "An Empirical Analysis of Software Vendors' Patch Release Behavior." Information Systems Research, 21(1), 115-132.
  2. Arora, A., Nandkumar, A., Telang, R. (2006). "Does Information Security Attack Frequency Increase with Vulnerability Disclosure?" Springer.
  3. Bilge, L., Dumitras, T. (2012). "Before We Knew It: An Empirical Study of Zero-Day Attacks in the Real World." ACM CCS.
  4. Van Goethem, T., et al. (2022). Zero-day vulnerability patch timing survival analysis.
  5. Frei, S., et al. (2008). "Modeling the Security Ecosystem." Black Hat Conference.

Government/Research Reports

  1. Ablon, L., Bogart, A. (2017). "Zero Days, Thousands of Nights: The Life and Times of Zero-Day Vulnerabilities and Their Exploits." RAND Corporation, RR-1751.
  2. NIST SP 800-53, SP 800-184 - Coordinated Disclosure Policy
  3. CISA Known Exploited Vulnerabilities (KEV) catalog

Industry Research

  1. IBM/Ponemon - Cost of a Data Breach Report (2022-2024)
  2. Mandiant/Google Cloud - Time-to-Exploit Trends (2023)
  3. Unit 42 (Palo Alto Networks) - State of Exploit Development (2024)
  4. VulnCheck - Exploitation Trends (2025)
  5. HackerOne - Hacker-Powered Security Report (2024)
  6. Bugcrowd - Inside the Mind of a Hacker (2023)
  7. Kenna Security/Cyentia Institute - Prioritization to Prediction
  8. EPSS Working Group (2021). "Exploit Prediction Scoring System." FIRST.org
  9. Crowdfense, Zerodium - Zero-day market pricing
  10. BreachLock - Penetration Testing Report (2025)

Research Date: November 24, 2025