# 🧪 CommitGuard — Test Projects & Penetration Testing Targets

This document catalogs the vulnerable applications and datasets used to benchmark, train, and penetration-test the CommitGuard agent. They supply the ground-truth exploit targets needed to verify the agent's detection accuracy across CWE categories.

---

## Tier 1 — Purpose-Built Vulnerable Python Apps
*Best for: Controlled training, unit testing, and reward model validation.*

These projects contain intentional security flaws, which makes them well suited to verifying that CommitGuard's reward model correctly identifies specific CWEs.

### 1. Checkmarx c{api}tal
- **GitHub:** [Checkmarx/capital](https://github.com/Checkmarx/capital)
- **Tech Stack:** FastAPI, Pydantic, Alembic, React.
- **Vulnerabilities:** 10 challenges mapping to OWASP Top 10 API risks.
- **CWEs Covered:** Broken Object Level Auth (BOLA), Mass Assignment, Broken Authentication, SSRF.
- **CommitGuard Fit:** Matches the modern Python backend stack (FastAPI + Pydantic) that the agent is optimized to audit.

### 2. vulnpy by Contrast Security
- **GitHub:** [Contrast-Security-OSS/vulnpy](https://github.com/Contrast-Security-OSS/vulnpy)
- **Tech Stack:** FastAPI, Flask, Django support.
- **Vulnerabilities:** Purposely-vulnerable functions that can be mounted as routes.
- **CWEs Covered:** SQLi, Path Traversal, Command Injection, XSS, SSRF, Deserialization.
- **CommitGuard Fit:** Isolated, clean, diff-level code units that match the granularity of the RL environment.

### 3. OWASP BenchmarkPython
- **GitHub:** [OWASP-Benchmark/BenchmarkPython](https://github.com/OWASP-Benchmark/BenchmarkPython)
- **Purpose:** Verifying SAST/DAST/IAST accuracy.
- **CommitGuard Fit:** Provides a standardized scorecard to measure CommitGuard's accuracy against established tools like Bandit or ZAP.
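
A scorecard of this kind reduces to a true-positive rate (TPR) and a false-positive rate (FPR) per tool, with the score taken as TPR minus FPR, the convention used by OWASP Benchmark scorecards. The sketch below is a minimal illustration of that arithmetic. The `Result` record layout and the `score` function are hypothetical and are not taken from BenchmarkPython or from CommitGuard.

```python
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    """One benchmark test case (hypothetical layout, for illustration only)."""
    test_name: str    # identifier of the test case
    vulnerable: bool  # ground truth: is the case really vulnerable?
    flagged: bool     # did the tool under test report a finding?


def score(results: Iterable[Result]) -> dict:
    """Compute TPR, FPR, and the TPR - FPR score for one tool."""
    tp = fp = tn = fn = 0
    for r in results:
        if r.vulnerable and r.flagged:
            tp += 1
        elif r.vulnerable and not r.flagged:
            fn += 1
        elif not r.vulnerable and r.flagged:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so the sketch never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"tpr": tpr, "fpr": fpr, "score": tpr - fpr}


# Made-up sample data: one missed vulnerability and one false alarm.
sample = [
    Result("case_001", vulnerable=True, flagged=True),
    Result("case_002", vulnerable=True, flagged=False),
    Result("case_003", vulnerable=False, flagged=True),
    Result("case_004", vulnerable=False, flagged=False),
]
print(score(sample))
```

A tool that flags everything gets TPR = FPR = 1 and therefore scores 0, the same as a tool that flags nothing, which is why the difference is more informative than raw recall.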

### 4. python-insecure-app by trottomv
- **GitHub:** [trottomv/python-insecure-app](https://github.com/trottomv/python-insecure-app)
- **CWEs Covered:** CWE-798 (Hardcoded Credentials), CWE-94 (SSTI/Code Injection), CWE-937 (Vulnerable Dependencies).
- **CommitGuard Fit:** Demonstrates "shift-left" security by targeting insecure dependencies and secrets in FastAPI.

### 5. Intentionally-Vulnerable-Python-Application
- **GitHub:** [mukxl/Intentionally-Vulnerable-Python-Application](https://github.com/mukxl/Intentionally-Vulnerable-Python-Application)
- **Purpose:** Designed for SCA, SAST, and DAST analysis.
- **CommitGuard Fit:** A good regression target for confirming that the agent catches what conventional scanners catch and identifies what they miss.

### 6. Vulnerable-API by michealkeines
- **GitHub:** [michealkeines/Vulnerable-API](https://github.com/michealkeines/Vulnerable-API)
- **Tech Stack:** Flask, Jinja, SQLite3.
- **CWEs Covered:** CWE-89 (SQLi), CWE-79 (XSS), CWE-73 (LFI/RFI), CWE-94 (SSTI).
- **CommitGuard Fit:** Specifically designed to test automated API scanners with injection-heavy payloads.

---

## Tier 2 — Real-World Projects with Known CVEs
*Best for: Advanced agent training and "In-the-Wild" performance testing.*

These production-grade projects provide complex, non-synthetic commit diffs for vulnerabilities catalogued in the National Vulnerability Database (NVD).

| Project | Stack | CWE / Vulnerability Type | Relevance |
| :--- | :--- | :--- | :--- |
| **Django** | Django + ORM | SQL Injection (CWE-89), Open Redirect (CWE-601), ReDoS (CWE-400) | High-signal real-world diffs |
| **Pillow** | Python Imaging | Buffer overflows, Path Traversal, Arbitrary Code Execution (ACE) | Tests non-web CWE detection |
| **Requests** | Python HTTP | SSRF, Header Injection | Header-level vuln detection |
| **Paramiko** | SSH/Crypto | Auth bypass (CVE-2018-7750), Weak crypto | Crypto CWE training data |
| **PyYAML** | Config Parsing | Deserialization leading to arbitrary code execution (CWE-502) | Classic commit-diff CWE |

**Recommended Dataset:** [CVEfixes (ZeoVan/CVEfixes)](https://github.com/ZeoVan/CVEfixes)
A multi-language dataset of vulnerability-fixing commits, annotated at the function and file level. It is directly usable in CommitGuard's diff-based pipeline.
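
To show what a diff-level sample built from a fixing commit might look like, the sketch below builds a unified diff from a before/after pair with the standard-library `difflib` module. The code pair is a made-up example in the style of a CWE-502 fix, not a record from the dataset. The `make_diff` helper and the `sample` record layout are assumptions, not CommitGuard's actual input format.

```python
import difflib


def make_diff(before: str, after: str, path: str) -> str:
    """Return a unified diff between two versions of one file."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Illustrative pair: unsafe YAML loading replaced by safe_load.
VULNERABLE = (
    "import yaml\n"
    "\n"
    "def load_config(text):\n"
    "    return yaml.load(text, Loader=yaml.Loader)\n"
)
FIXED = (
    "import yaml\n"
    "\n"
    "def load_config(text):\n"
    "    return yaml.safe_load(text)\n"
)

# Hypothetical sample record; the field names are assumptions.
sample = {
    "cwe": "CWE-502",
    "kind": "fixing_commit",
    "diff": make_diff(VULNERABLE, FIXED, "config.py"),
}
print(sample["diff"])
```

Swapping the two arguments to `make_diff` yields the reverse diff, which reads as a change that introduces the vulnerability rather than one that fixes it.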

---

## Tier 3 — Pydantic & Type-Safety Specific Targets
*Best for: Specialized auditing of type-driven Python applications.*

1. **Pydantic v1 Model Misuse Patterns:** Common vulnerabilities involve `model.dict()`, skipped validators, or exposed internal fields. The **Checkmarx c{api}tal** project is the primary reference for these patterns.
2. **Type-Safety "Escape Hatches":** Projects with complex type annotations, union types, and `Any` usage, where type-safety bugs often hide.
   - **Targets:** Django REST Framework and FastAPI projects with permissive `Optional` fields or loose Pydantic validation.
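
One simple way to triage candidate files for `Any` usage is a small scan with the standard-library `ast` module. The sketch below lists function parameters and annotated fields whose annotation mentions `Any`. It is a minimal heuristic and not part of CommitGuard: the function names and the example source are hypothetical, and annotations written as strings (for example `"Any"`) are not detected.

```python
import ast


def _mentions_any(annotation: ast.expr) -> bool:
    """True if the annotation contains the name Any or an attribute like typing.Any."""
    for sub in ast.walk(annotation):
        if isinstance(sub, ast.Name) and sub.id == "Any":
            return True
        if isinstance(sub, ast.Attribute) and sub.attr == "Any":
            return True
    return False


def find_any_annotations(source: str) -> list:
    """Return sorted (line, name) pairs for parameters and fields annotated with Any."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg) and node.annotation is not None:
            name, annotation = node.arg, node.annotation
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            name, annotation = node.target.id, node.annotation
        else:
            continue
        if _mentions_any(annotation):
            hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical input: a loosely typed model and handler.
EXAMPLE = '''
from typing import Any, Optional

class UserUpdate:
    name: str
    extra: Any

def apply(payload: dict[str, Any], user_id: int, note: Optional[str] = None):
    pass
'''

print(find_any_annotations(EXAMPLE))
```

The scan only parses the source and never imports or runs it, so it is safe to point at intentionally vulnerable code.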

---

## 📈 Benchmarking Strategy
To verify CommitGuard's performance on these projects:
1. **Extraction:** Use `scratch/extract_sample.py` to pull vulnerable diffs from the projects above.
2. **Evaluation:** Run `scripts/evaluate.py` using these samples as the test set.
3. **Comparison:** Compare the results against `eval_baseline.json` to see the delta in detection capability.
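
The comparison step can be sketched as a per-CWE difference in detection rate. The schema of `eval_baseline.json` is not specified here, so the sketch assumes a hypothetical layout that maps each CWE to `detected` and `total` counts. The helper names are illustrative and do not describe the interface of `scripts/evaluate.py`.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Load a report file; assumes the hypothetical {cwe: {detected, total}} layout."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def detection_rates(report: dict) -> dict:
    """Map each CWE to detected / total, skipping CWEs with no samples."""
    return {
        cwe: counts["detected"] / counts["total"]
        for cwe, counts in report.items()
        if counts["total"]
    }


def delta(baseline: dict, current: dict) -> dict:
    """Per-CWE change in detection rate for CWEs present in both reports."""
    base = detection_rates(baseline)
    cur = detection_rates(current)
    return {
        cwe: round(cur[cwe] - base[cwe], 4)
        for cwe in sorted(base.keys() & cur.keys())
    }


# Made-up numbers for illustration; real runs would use load_report().
baseline = {
    "CWE-89": {"detected": 6, "total": 10},
    "CWE-79": {"detected": 5, "total": 10},
}
current = {
    "CWE-89": {"detected": 8, "total": 10},
    "CWE-79": {"detected": 5, "total": 10},
}
print(delta(baseline, current))
```

Restricting the comparison to CWEs present in both reports keeps a newly added test category from appearing as a spurious gain or loss.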