commitguard-env / docs /testprojects.md

Nitishkumar-ai

Deployment Build (Final): Professional Structure + Blog

95cbc5b 4 days ago

🧪 CommitGuard — Test Projects & Penetration Testing Targets

This document serves as a catalog of vulnerable applications and datasets used to benchmark, train, and penetration-test the CommitGuard agent. These projects provide the ground-truth "exploit targets" required to verify the agent's detection accuracy across various CWE categories.

Tier 1 — Purpose-Built Vulnerable Python Apps

Best for: Controlled training, unit testing, and reward model validation.

These projects ship with deliberate security flaws, which makes them well suited to verifying that CommitGuard's reward model credits correct identification of specific CWEs.

1. Checkmarx c{api}tal

GitHub: Checkmarx/capital
Tech Stack: FastAPI, Pydantic, Alembic, React.
Vulnerabilities: 10 challenges mapping to the OWASP API Security Top 10 risks.
CWEs Covered: Broken Object Level Authorization (BOLA), Mass Assignment, Broken Authentication, SSRF.
CommitGuard Fit: Matches the modern Python backend stack (FastAPI + Pydantic) that the agent is optimized to audit.

2. vulnpy by Contrast Security

GitHub: Contrast-Security-OSS/vulnpy
Tech Stack: FastAPI, Flask, Django support.
Vulnerabilities: Purposely-vulnerable functions that can be mounted as routes.
CWEs Covered: SQLi, Path Traversal, Command Injection, XSS, SSRF, Deserialization.
CommitGuard Fit: Isolated, clean diff-level code units—perfect for the granularity of the RL environment.

3. OWASP BenchmarkPython

GitHub: OWASP-Benchmark/BenchmarkPython
Purpose: Verifying SAST/DAST/IAST accuracy.
CommitGuard Fit: Provides a standardized scorecard to measure CommitGuard's accuracy against established tools like Bandit or ZAP.
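
The scorecard idea can be made concrete with a small sketch. The OWASP Benchmark summarizes a tool with a single score, the true positive rate minus the false positive rate, and the function below computes that from labeled results. The `Result` record, its field names, and the sample verdicts are assumptions for illustration only; they are not the Benchmark's or CommitGuard's actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One test case: ground truth vs. the tool's verdict (hypothetical schema)."""
    test_id: str
    is_vulnerable: bool  # ground-truth label
    flagged: bool        # what the tool reported


def benchmark_score(results: list[Result]) -> dict[str, float]:
    """Return TPR, FPR, and the Benchmark-style score (TPR - FPR)."""
    tp = sum(r.is_vulnerable and r.flagged for r in results)
    fn = sum(r.is_vulnerable and not r.flagged for r in results)
    fp = sum(not r.is_vulnerable and r.flagged for r in results)
    tn = sum(not r.is_vulnerable and not r.flagged for r in results)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"tpr": tpr, "fpr": fpr, "score": tpr - fpr}


# Made-up verdicts, purely to exercise the function.
sample = [
    Result("t1", True, True),
    Result("t2", True, False),
    Result("t3", False, False),
    Result("t4", False, True),
    Result("t5", False, False),
    Result("t6", True, True),
]
scores = benchmark_score(sample)
```

A tool that flags everything gets TPR = 1 and FPR = 1, so its score is 0, the same as a tool that flags nothing. That property is what makes the single number useful when comparing an agent against conventional scanners.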

4. python-insecure-app by trottomv

GitHub: trottomv/python-insecure-app
CWEs Covered: CWE-798 (Hardcoded Credentials), CWE-94 (SSTI/Code Injection), CWE-937 (Vulnerable Dependencies).
CommitGuard Fit: Demonstrates "shift-left" security by targeting insecure dependencies and secrets in FastAPI.

5. Intentionally-Vulnerable-Python-Application

GitHub: mukxl/Intentionally-Vulnerable-Python-Application
Purpose: Designed for SCA, SAST, and DAST analysis.
CommitGuard Fit: Excellent regression test target to confirm the agent catches what conventional scanners catch—and identifies what they miss.

6. Vulnerable-API by michealkeines

GitHub: michealkeines/Vulnerable-API
Tech Stack: Flask, Jinja, SQLite3.
CWEs Covered: CWE-89 (SQLi), CWE-79 (XSS), CWE-73 (LFI/RFI), CWE-94 (SSTI).
CommitGuard Fit: Specifically designed to test automated API scanners with injection-heavy payloads.
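
To make the injection-heavy categories concrete, here is a minimal, self-contained sketch of a CWE-89 pair: a vulnerable lookup and its parameterized fix, using the standard-library `sqlite3` module. The table, function names, and payload are illustrative assumptions and are not taken from any of the projects listed above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # CWE-89: user input is interpolated into the SQL text itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_fixed(conn: sqlite3.Connection, name: str) -> list:
    # Fix: bind the value as a parameter so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
leaked = find_user_vulnerable(conn, payload)  # the OR clause matches every row
safe = find_user_fixed(conn, payload)         # the payload is treated as a literal name
```

The two functions differ by a single statement, which is the kind of small, isolated change a diff-level detector needs to tell apart.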

Tier 2 — Real-World Projects with Known CVEs

Best for: Advanced agent training and "In-the-Wild" performance testing.

These production-grade projects provide non-synthetic, complex commit diffs for vulnerabilities catalogued in the National Vulnerability Database (NVD).

| Project | Stack | CWE / Vulnerability Type | Relevance |
| --- | --- | --- | --- |
| Django | Django + ORM | SQL Injection (CWE-89), Open Redirect (CWE-601), ReDoS (CWE-400) | High-signal real-world diffs |
| Pillow | Python Imaging | Buffer overflows, Path Traversal, ACE | Tests non-web CWE detection |
| Requests | Python HTTP | SSRF, Header Injection | Header-level vuln detection |
| Paramiko | SSH/Crypto | Auth bypass (CVE-2018-7750), Weak crypto | Crypto CWE training data |
| PyYAML | Config Parsing | Deserialization ACE (CWE-502) | Classic commit-diff CWE |

Recommended Dataset: CVEfixes (secureIT-project/CVEfixes). A multi-language dataset providing the exact fixing commits for vulnerabilities, annotated at function and file levels. It is directly usable for CommitGuard's diff-based pipeline.
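
A fixing commit can be turned into a diff-level sample with nothing more than the standard library. The sketch below builds a unified diff from the pre-fix and post-fix versions of a function and packages it with a label. The `DiffSample` structure, its field names, the file path, and the example snippet are assumptions for illustration; they do not reflect the CVEfixes schema or CommitGuard's actual input format.

```python
import difflib
from dataclasses import dataclass


@dataclass
class DiffSample:
    """A labeled diff (hypothetical structure, not CommitGuard's real format)."""
    cwe: str
    path: str
    diff: str


def make_sample(before: str, after: str, path: str, cwe: str) -> DiffSample:
    """Build a unified diff between two versions of one file."""
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return DiffSample(cwe=cwe, path=path, diff="".join(diff_lines))


# Source text only; PyYAML is not imported or executed here.
BEFORE = """\
import yaml

def load_config(text):
    return yaml.load(text, Loader=yaml.Loader)
"""

AFTER = """\
import yaml

def load_config(text):
    return yaml.safe_load(text)
"""

sample = make_sample(BEFORE, AFTER, "app/config.py", "CWE-502")
print(sample.diff)
```

In a fixing diff the vulnerable line appears on the `-` side. Swapping `before` and `after` yields the vulnerability-introducing direction, which may be the more natural input when the goal is to flag a risky commit before it lands.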

Tier 3 — Pydantic & Type-Safety Specific Targets

Best for: Specialized auditing of type-driven Python applications.

Pydantic v1 Model Misuse Patterns: Common vulnerabilities involving model.dict(), validator skipping, or internal field exposure. The Checkmarx c{api}tal project is the primary reference for these patterns.
Type-Safety "Escape Hatches": Projects with complex type annotations, union types, and Any usage, where type-safety bugs often hide. Targets: Django REST Framework and FastAPI projects with permissive Optional fields or loose Pydantic validation.
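
The two misuse patterns named above, internal field exposure and mass assignment, can be shown without any third-party dependency. The sketch below is a standard-library analog built on `dataclasses`, not Pydantic code: `asdict` stands in for a full model dump such as `model.dict()`, and the `User` model, field names, and allow-lists are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class User:
    username: str
    email: str
    password_hash: str       # internal field, must never leave the server
    is_admin: bool = False   # privileged field, must not be client-settable


PUBLIC_FIELDS = {"username", "email"}  # safe to return in API responses
CLIENT_WRITABLE = {"email"}            # safe to accept from request bodies


def serialize_unsafe(user: User) -> dict:
    # Internal field exposure: dumps every attribute, including the hash.
    return asdict(user)


def serialize_safe(user: User) -> dict:
    # Fix: return only an explicit allow-list of fields.
    return {k: v for k, v in asdict(user).items() if k in PUBLIC_FIELDS}


def update_unsafe(user: User, payload: dict) -> User:
    # Mass assignment: any key in the request body is applied to the model.
    for key, value in payload.items():
        setattr(user, key, value)
    return user


def update_safe(user: User, payload: dict) -> User:
    # Fix: ignore keys the client is not allowed to set.
    for key, value in payload.items():
        if key in CLIENT_WRITABLE:
            setattr(user, key, value)
    return user


attack = {"email": "new@example.com", "is_admin": True}
victim = update_unsafe(User("alice", "a@example.com", "hash-1"), dict(attack))
guarded = update_safe(User("bob", "b@example.com", "hash-2"), dict(attack))
```

In both fixes the safe variant differs from the unsafe one only by an allow-list check, so the vulnerable and patched versions produce a very small diff.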

📈 Benchmarking Strategy

To verify CommitGuard's performance on these projects:

1. Extraction: Use scratch/extract_sample.py to pull vulnerable diffs from the projects above.
2. Evaluation: Run scripts/evaluate.py using these samples as the test set.
3. Comparison: Compare the results against eval_baseline.json to quantify the change in detection capability.
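
The comparison step can be sketched as a per-CWE recall delta. The schema of eval_baseline.json is not documented here, so the layout below (a mapping from CWE ID to `tp`/`fn` counts) is an assumption, and the counts are made-up placeholders rather than real CommitGuard results. The inline JSON strings stand in for the baseline file and a fresh evaluation run.

```python
import json


def recall(counts: dict) -> float:
    """Recall = TP / (TP + FN); defined as 0.0 when there are no positives."""
    tp, fn = counts["tp"], counts["fn"]
    return tp / (tp + fn) if (tp + fn) else 0.0


def recall_delta(baseline: dict, current: dict) -> dict:
    """Per-CWE recall change (current - baseline) for CWEs present in both runs."""
    return {
        cwe: round(recall(current[cwe]) - recall(baseline[cwe]), 4)
        for cwe in sorted(baseline.keys() & current.keys())
    }


# Placeholder data in an ASSUMED schema; replace with the real files.
baseline = json.loads('{"CWE-89": {"tp": 6, "fn": 4}, "CWE-79": {"tp": 5, "fn": 5}}')
current = json.loads('{"CWE-89": {"tp": 9, "fn": 1}, "CWE-79": {"tp": 5, "fn": 5}}')

delta = recall_delta(baseline, current)
for cwe, change in delta.items():
    print(f"{cwe}: {change:+.2f}")
```

Reporting the delta per CWE rather than as one aggregate number makes regressions in a single category visible even when overall accuracy improves.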