🧪 CommitGuard — Test Projects & Penetration Testing Targets
This document catalogs the vulnerable applications and datasets used to benchmark, train, and penetration-test the CommitGuard agent. These projects supply the ground-truth "exploit targets" needed to verify the agent's detection accuracy across CWE categories.
Tier 1 — Purpose-Built Vulnerable Python Apps
Best for: Controlled training, unit testing, and reward model validation.
These projects are intentionally built with security flaws, which makes them well suited to checking that CommitGuard’s reward model correctly identifies specific CWEs.
1. Checkmarx c{api}tal
- GitHub: Checkmarx/capital
- Tech Stack: FastAPI, Pydantic, Alembic, React.
- Vulnerabilities: 10 challenges mapping to OWASP Top 10 API risks.
- CWEs Covered: Broken Object Level Auth (BOLA), Mass Assignment, Broken Authentication, SSRF.
- CommitGuard Fit: Matches the modern Python backend stack (FastAPI + Pydantic) that the agent is optimized to audit.
2. vulnpy by Contrast Security
- GitHub: Contrast-Security-OSS/vulnpy
- Tech Stack: FastAPI, Flask, Django support.
- Vulnerabilities: Purposely-vulnerable functions that can be mounted as routes.
- CWEs Covered: SQLi, Path Traversal, Command Injection, XSS, SSRF, Deserialization.
- CommitGuard Fit: Isolated, clean diff-level code units—perfect for the granularity of the RL environment.
3. OWASP BenchmarkPython
- GitHub: OWASP-Benchmark/BenchmarkPython
- Purpose: Verifying SAST/DAST/IAST accuracy.
- CommitGuard Fit: Provides a standardized scorecard to measure CommitGuard's accuracy against established tools like Bandit or ZAP.
4. python-insecure-app by trottomv
- GitHub: trottomv/python-insecure-app
- CWEs Covered: CWE-798 (Hardcoded Credentials), CWE-94 (SSTI/Code Injection), CWE-937 (Vulnerable Dependencies).
- CommitGuard Fit: Demonstrates "shift-left" security by targeting insecure dependencies and secrets in FastAPI.
5. Intentionally-Vulnerable-Python-Application
- GitHub: mukxl/Intentionally-Vulnerable-Python-Application
- Purpose: Designed for SCA, SAST, and DAST analysis.
- CommitGuard Fit: Excellent regression test target to confirm the agent catches what conventional scanners catch—and identifies what they miss.
6. Vulnerable-API by michealkeines
- GitHub: michealkeines/Vulnerable-API
- Tech Stack: Flask, Jinja, SQLite3.
- CWEs Covered: CWE-89 (SQLi), CWE-79 (XSS), CWE-73 (LFI/RFI), CWE-94 (SSTI).
- CommitGuard Fit: Specifically designed to test automated API scanners with injection-heavy payloads.
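The injection-style CWEs listed in this tier usually appear as small, self-contained code units. As a rough illustration of a vulnerable/fixed pair for CWE-89, the sketch below uses only the standard-library `sqlite3` module. The table, function names, and payload are hypothetical and are not taken from any of the projects above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # CWE-89: user input is concatenated straight into the SQL string.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_fixed(conn: sqlite3.Connection, name: str) -> list:
    # Fix: a parameterized query keeps the input out of the SQL grammar.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical injection payload used to tell the two versions apart.
PAYLOAD = "nobody' OR '1'='1"
```

Running both functions with `PAYLOAD` against `make_db()` separates the two versions: the vulnerable query returns every row, while the parameterized one treats the payload as a literal name and returns nothing.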
Tier 2 — Real-World Projects with Known CVEs
Best for: Advanced agent training and "In-the-Wild" performance testing.
These production-grade projects provide non-synthetic, complex commit diffs derived from the National Vulnerability Database (NVD).
| Project | Stack | CWE / Vulnerability Type | Relevance |
|---|---|---|---|
| Django | Django + ORM | SQL Injection (CWE-89), Open Redirect (CWE-601), ReDoS (CWE-400) | High-signal real-world diffs |
| Pillow | Python Imaging | Buffer overflows, Path Traversal, arbitrary code execution (ACE) | Tests non-web CWE detection |
| Requests | Python HTTP | SSRF, Header Injection | Header-level vuln detection |
| Paramiko | SSH/Crypto | Auth bypass (CVE-2018-7750), Weak crypto | Crypto CWE training data |
| PyYAML | Config Parsing | Deserialization ACE (CWE-502) | Classic commit-diff CWE |
Recommended Dataset: CVEfixes (ZeoVan/CVEfixes). A multi-language dataset that provides the exact fixing commits for vulnerabilities, annotated at the function and file levels. It is directly usable in CommitGuard's diff-based pipeline.
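Because the pipeline works on commit diffs, it can help to see what a single fixing diff reduces to. The sketch below builds a unified diff with the standard-library `difflib` and splits it into removed (vulnerable) and added (fixed) lines. The before/after snippets are hypothetical, modeled on the classic CWE-502 pattern, and `make_diff` and `split_diff` are illustrative helpers, not part of CommitGuard or CVEfixes.

```python
import difflib

# Hypothetical before/after snippets modeled on a CWE-502 fix.
VULNERABLE = """import yaml

def load_config(text):
    return yaml.load(text)
"""

FIXED = """import yaml

def load_config(text):
    return yaml.safe_load(text)
"""


def make_diff(before: str, after: str, path: str = "config.py") -> str:
    """Render a unified diff between two versions of one file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def split_diff(diff_text: str) -> dict:
    """Collect removed and added lines from a single-file unified diff."""
    removed, added = [], []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # hunk header; file headers come before it
            continue
        if not in_hunk:
            continue
        if line.startswith("-"):
            removed.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
    return {"removed": removed, "added": added}


sample = split_diff(make_diff(VULNERABLE, FIXED))
```

Here `sample["removed"]` holds the vulnerable call and `sample["added"]` holds its replacement. This sketch assumes a diff that touches one file; a multi-file diff would need the parser to reset at each new file header.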
Tier 3 — Pydantic & Type-Safety Specific Targets
Best for: Specialized auditing of type-driven Python applications.
- Pydantic v1 Model Misuse Patterns: Common vulnerabilities involving `model.dict()`, validator skipping, or internal field exposure. The Checkmarx c{api}tal project is the primary reference for this.
- Type-Safety "Escape Hatches": Targets projects with complex type annotations, union types, and `Any` usage where type-safety bugs often hide.
  - Targets: Django REST Framework and FastAPI projects with permissive `Optional` fields or loose Pydantic validation.
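Many of the misuse patterns above reduce to copying client-supplied data into a model without an allow-list. A minimal sketch of that pattern follows, using standard-library dataclasses as a stand-in for Pydantic models so that it runs without dependencies. The `User` model, its fields, and both update helpers are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    username: str
    email: str
    is_admin: bool = False  # internal field, must never be client-controlled


# Hypothetical allow-list of fields a client may change.
EDITABLE_FIELDS = {"email"}


def update_user_vulnerable(user: User, payload: dict) -> User:
    # Mass assignment: every key that matches a field is copied over,
    # including internal ones such as is_admin.
    known = {f.name for f in fields(user)}
    for key, value in payload.items():
        if key in known:
            setattr(user, key, value)
    return user


def update_user_fixed(user: User, payload: dict) -> User:
    # Fix: only allow-listed fields are applied; the rest are ignored.
    for key, value in payload.items():
        if key in EDITABLE_FIELDS:
            setattr(user, key, value)
    return user


# Hypothetical request body that tries to escalate privileges.
PAYLOAD = {"email": "new@example.com", "is_admin": True}
```

With `PAYLOAD`, the vulnerable helper flips `is_admin` to `True`, while the fixed helper updates only `email`. A diff that moves from the first form to the second is the kind of change a type-safety audit would be expected to recognize as a fix.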
📈 Benchmarking Strategy
To verify CommitGuard's performance on these projects:
- Extraction: Use `scratch/extract_sample.py` to pull vulnerable diffs from the projects above.
- Evaluation: Run `scripts/evaluate.py` using these samples as the test set.
- Comparison: Compare results against `eval_baseline.json` to visualize the delta in detection capabilities.
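The comparison step can be sketched as a per-CWE difference in F1 score. The schema of `eval_baseline.json` is not documented here, so the layout below (a mapping from CWE ID to `tp`/`fp`/`fn` counts) is an assumption, and `f1_score` and `f1_delta` are hypothetical helpers rather than functions from `scripts/evaluate.py`.

```python
import json


def f1_score(counts: dict) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_delta(baseline: dict, current: dict) -> dict:
    """Per-CWE change in F1 for CWEs present in both result sets."""
    shared = sorted(set(baseline) & set(current))
    return {
        cwe: round(f1_score(current[cwe]) - f1_score(baseline[cwe]), 3)
        for cwe in shared
    }


# Hypothetical contents; the real eval_baseline.json schema may differ.
BASELINE_JSON = """
{"CWE-89": {"tp": 6, "fp": 2, "fn": 4},
 "CWE-79": {"tp": 5, "fp": 5, "fn": 5}}
"""
CURRENT_JSON = """
{"CWE-89": {"tp": 8, "fp": 2, "fn": 2},
 "CWE-79": {"tp": 5, "fp": 5, "fn": 5}}
"""

deltas = f1_delta(json.loads(BASELINE_JSON), json.loads(CURRENT_JSON))
for cwe, delta in deltas.items():
    print(f"{cwe}: {delta:+.3f}")
```

A positive delta means the current run outperforms the baseline on that CWE. Reporting per category rather than as one aggregate number keeps a regression on a rare CWE from being hidden by gains on a common one.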