CommitGuard — Use Cases & Test Scenarios

This document outlines the primary use cases and associated test scenarios for running CommitGuard as a standalone Command Line Interface (CLI) tool and as an integrated Plugin (e.g., CI/CD Pipeline or IDE Extension).

1. CommitGuard as a CLI (Standalone Workflow)

This use case is for security researchers, data scientists, and ML engineers training or evaluating the model locally or on a dedicated VM.

1.1 Data Preprocessing

  • Scenario: Convert raw Devign JSON into a filtered, balanced, 5000-sample JSONL file.
  • Action: Run python scripts/preprocess_devign.py --limit 5000
  • Expected Result: data/devign_filtered.jsonl is created, containing clean, XML-ready code diffs and valid CWE labels.
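
The balancing step in this scenario could look roughly like the sketch below. It is not the code in scripts/preprocess_devign.py; the `label` field name, the `balance_and_limit` and `write_jsonl` helpers, and the fixed seed are all assumptions made for illustration.

```python
import json
import random
from collections import defaultdict


def balance_and_limit(records, limit, label_key="label", seed=0):
    """Return up to `limit` records with an equal count per label value.

    `label_key` is a hypothetical field name; adjust it to the real schema.
    """
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)
    if not by_label:
        return []

    # Each class contributes the same number of samples, capped by the
    # smallest class so that no label dominates the output.
    per_class = min(limit // len(by_label), min(len(v) for v in by_label.values()))

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sampled = []
    for label in sorted(by_label):
        sampled.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(sampled)
    return sampled


def write_jsonl(records, path):
    """Write one JSON object per line (the JSONL format)."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
```

With two classes and `limit=5000`, this yields at most 2500 records per class, and fewer if the smaller class runs out first.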

1.2 Environment Server (OpenEnv)

  • Scenario: Start the RLVR training environment.
  • Action: Run python -m commitguard_env.server
  • Expected Result: Server starts on port 8000. curl http://localhost:8000/health returns {"status": "healthy"}. tests/test_no_leak.py confirms no label leakage in /reset or /state.
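
A label-leak check of the kind tests/test_no_leak.py performs might be sketched as follows. This is not the project's test; the forbidden key names, the `find_leaked_keys` and `fetch_json` helpers, and the HTTP method used for each endpoint are assumptions.

```python
import json
import urllib.request

# Assumed names of fields that would reveal the ground truth to the agent.
FORBIDDEN_KEYS = {"label", "target", "cwe", "is_vulnerable"}


def find_leaked_keys(payload, forbidden=FORBIDDEN_KEYS, path=""):
    """Recursively collect the paths of forbidden keys in a decoded JSON value."""
    leaks = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else key
            if key.lower() in forbidden:
                leaks.append(here)
            leaks.extend(find_leaked_keys(value, forbidden, here))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            leaks.extend(find_leaked_keys(item, forbidden, f"{path}[{i}]"))
    return leaks


def fetch_json(url, method="GET"):
    """Call the server and decode its JSON body (needs a running server)."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `fetch_json("http://localhost:8000/health")` should equal `{"status": "healthy"}`, and `find_leaked_keys(...)` applied to the decoded /reset and /state responses should return an empty list.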

1.3 Model Training (GRPO)

  • Scenario: Train the Llama-3.2-3B model using the live RLVR environment.
  • Action: Run python scripts/train_grpo.py --live --steps 500
  • Expected Result: The model trains with 4-bit quantization and LoRA. The training curve is logged to WandB, and a checkpoint is saved every 50 steps.

1.4 Agentic Evaluation

  • Scenario: Evaluate the trained LoRA adapter on 100 held-out test samples.
  • Action: Run python scripts/evaluate.py --adapter_path ./outputs/commitguard-final
  • Expected Result: The agent executes a 5-step loop composed of request_context, analyze, and verdict actions. A detailed eval_results.json report is generated with accuracy broken down by CWE.
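
The per-CWE breakdown in the report can be computed with a small aggregation like the one below. This is a sketch, not the logic in scripts/evaluate.py; the record fields `cwe`, `predicted`, and `expected` are hypothetical names.

```python
from collections import defaultdict


def accuracy_per_cwe(results):
    """Compute accuracy per CWE from records with cwe/predicted/expected fields."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        total[r["cwe"]] += 1
        if r["predicted"] == r["expected"]:
            correct[r["cwe"]] += 1
    # Sorted keys give a stable ordering in the written report.
    return {cwe: correct[cwe] / total[cwe] for cwe in sorted(total)}
```

Reporting the sample count per CWE next to each accuracy helps when reading results on only 100 samples, since a rare CWE may be represented by just a handful of cases.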

1.5 Visualization

  • Scenario: Generate performance plots for reporting.
  • Action: Run python plots/plot_baseline_vs_trained.py
  • Expected Result: A PNG bar chart is saved showing the accuracy difference between the baseline and the trained model.

2. CommitGuard as a Plugin (Developer Workflow)

This use case is for software engineers interacting with the trained model during their daily development cycle to prevent vulnerabilities from reaching production.

2.1 Git Pre-Commit Hook (Local Plugin)

  • Scenario: A developer attempts to commit code containing an SQL injection vulnerability (CWE-89).
  • Action: Developer runs git commit -m "Update user query". The hook captures the local diff and invokes the CommitGuard agent API.
  • Expected Result:
    • The agent detects the vulnerability before the commit executes.
    • The commit is blocked (exit code 1).
    • The terminal outputs the agent's XML exploit_sketch: "SQL injection in user_id via f-string construction."
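
The hook logic above could be sketched as follows. Git aborts a commit when the pre-commit hook exits with a non-zero status, which is what the exit code 1 relies on. The `<verdict>` tag, its `vulnerable` value, and the helper names are assumptions; only the `exploit_sketch` tag comes from the scenario. The call that sends the diff to the agent API is omitted because the endpoint is not specified here.

```python
import re
import subprocess


def staged_diff():
    """Return the diff of staged changes, i.e. what the commit would record."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    ).stdout


def extract_tag(text, tag):
    """Return the stripped contents of the first <tag>...</tag>, or None."""
    # A regex is used instead of an XML parser because model output is not
    # guaranteed to be a well-formed XML document.
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def hook_decision(agent_output):
    """Return (exit_code, message); exit code 1 makes git abort the commit."""
    verdict = extract_tag(agent_output, "verdict")  # assumed tag name
    if verdict is not None and verdict.lower() == "vulnerable":
        sketch = extract_tag(agent_output, "exploit_sketch") or "no exploit sketch provided"
        return 1, f"CommitGuard blocked this commit: {sketch}"
    return 0, "CommitGuard found no issue."
```

A hook script would pass `staged_diff()` to the agent, feed the reply to `hook_decision`, print the message, and call `sys.exit` with the returned code.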

2.2 CI/CD Pull Request Reviewer (GitHub Action)

  • Scenario: A developer opens a Pull Request with a new feature.
  • Action: GitHub Actions triggers a CommitGuard workflow container. The agent runs a full evaluation loop over the PR's diff patch.
  • Expected Result:
    • The agent posts an automated review comment directly on the PR.
    • If vulnerable, it flags the specific line and provides a remediation suggestion.
    • The PR status check turns Red (Failed) if a severe vulnerability is detected, preventing a merge to the main branch.
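
One way the workflow might turn agent findings into a line-level review comment and a pass/fail decision is sketched below. The finding fields, the severity scale, and the default `high` threshold are assumptions; the payload keys (`body`, `commit_id`, `path`, `line`, `side`) follow GitHub's REST endpoint for creating a pull request review comment.

```python
# Assumed severity scale; the source only says "severe" vulnerabilities fail the check.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def review_comment_payload(finding, commit_sha):
    """Build the JSON body for a line-level pull request review comment."""
    body = (
        f"**CommitGuard: {finding['cwe']} ({finding['severity']})**\n\n"
        f"{finding['explanation']}\n\n"
        f"Suggested remediation: {finding['remediation']}"
    )
    return {
        "body": body,
        "commit_id": commit_sha,
        "path": finding["path"],
        "line": finding["line"],
        "side": "RIGHT",  # comment on the new version of the file
    }


def check_should_fail(findings, threshold="high"):
    """Return True when any finding is at or above the severity threshold."""
    limit = SEVERITY_ORDER[threshold]
    return any(SEVERITY_ORDER[f["severity"]] >= limit for f in findings)
```

The workflow step would exit non-zero when `check_should_fail` returns True, which marks the status check as failed. Preventing the merge also requires that check to be marked as required in the repository's branch protection settings.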

2.3 IDE Extension (VS Code / Cursor Integration)

  • Scenario: Near-real-time vulnerability detection each time the developer saves a file.
  • Action: Developer saves a file (Ctrl+S). The IDE plugin sends the local file diff to a hosted CommitGuard backend.
  • Expected Result:
    • The agent identifies an issue using its analyze action step.
    • A diagnostic warning (red squiggly line) appears under the vulnerable code snippet in the editor.
    • Hovering shows the agent's <reasoning> and suggested safe implementation.
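
If the backend were exposed to the editor through a language server, each finding could be converted into a Language Server Protocol Diagnostic roughly as below. This is a sketch under that assumption; the finding fields (`start_line`, `end_line`, `reasoning`, `suggestion`) and the `commitguard` source name are hypothetical. LSP positions are zero-based and the end of a range is exclusive.

```python
# Severity values defined by the Language Server Protocol.
LSP_ERROR, LSP_WARNING = 1, 2


def to_diagnostic(finding, severity=LSP_WARNING):
    """Convert a finding with 1-based, inclusive line numbers to an LSP Diagnostic dict."""
    return {
        "range": {
            # LSP lines are zero-based, so shift the start line down by one.
            "start": {"line": finding["start_line"] - 1, "character": 0},
            # The end is exclusive: column 0 of the line after the last
            # flagged line covers the flagged lines in full.
            "end": {"line": finding["end_line"], "character": 0},
        },
        "severity": severity,
        "source": "commitguard",
        "message": f"{finding['reasoning']}\n\nSuggested fix: {finding['suggestion']}",
    }
```

The editor renders the range as a squiggly underline and shows the `message` text on hover, so the agent's reasoning and suggested fix both go into that field.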