# SWE-Pruner ONNX (code-pruner)

ONNX-converted version of ayanami-kitasan/code-pruner for efficient CPU inference.
## Source
- Original Model: ayanami-kitasan/code-pruner (safetensors)
- Training Code: Ayanami1314/swe-pruner
## Architecture

- Backbone: Qwen/Qwen3-Reranker-0.6B (28 layers, hidden=1024)
- Multi-layer Fusion: Early (layer 7) + Middle (layer 14) + Final (layer 28) → fused_hidden=3072
- Fusion: 1-layer MultiheadAttention (8 heads) + LayerNorm
- Compression Head: CRF-style (LayerNorm → Linear(3072, 256) → GELU → Linear(256, 2))
- Output: `token_scores` → sigmoid scores per token (0–1, higher = keep)
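As a rough illustration of the fusion dimensionality described above, the hidden states from the three tapped layers (each of width 1024) are concatenated into a 3072-wide feature per token before the attention fusion. This is a minimal NumPy sketch of that shape arithmetic only; the values are random placeholders and the actual fusion weights live inside the ONNX graph.

```python
import numpy as np

# Illustrative shapes: batch=1, seq_len=8, hidden=1024 (Qwen3-Reranker-0.6B)
batch, seq_len, hidden = 1, 8, 1024
early = np.random.randn(batch, seq_len, hidden)   # layer 7
middle = np.random.randn(batch, seq_len, hidden)  # layer 14
final = np.random.randn(batch, seq_len, hidden)   # layer 28

# Concatenate along the feature axis -> fused_hidden = 3 * 1024 = 3072
fused = np.concatenate([early, middle, final], axis=-1)
print(fused.shape)  # (1, 8, 3072)
```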
## Files

| File | Description |
|---|---|
| `model.onnx` | Quantized ONNX model (uint8, ~607 MB) |
| `vocab.json` | BPE vocabulary (Qwen3 tokenizer) |
| `merges.txt` | BPE merge rules |
| `metadata.json` | Model metadata (token IDs, dimensions) |
| `crf_params.npz` | CRF transition parameters (optional, for Viterbi decoding) |
## Usage

```python
import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("model.onnx")

input_ids = np.array([[...]], dtype=np.int64)       # [1, seq_len]
attention_mask = np.array([[...]], dtype=np.int64)  # [1, seq_len]

scores = sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})[0]
# scores: [1, seq_len] float32, 0-1 range, higher = keep
```
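A typical next step is to drop low-scoring tokens by thresholding. The snippet below uses dummy scores and IDs so it runs standalone; the 0.5 cutoff is an illustrative choice, not a value prescribed by the model card.

```python
import numpy as np

# Dummy model outputs and inputs for illustration
scores = np.array([[0.9, 0.1, 0.7, 0.3, 0.8]], dtype=np.float32)
input_ids = np.array([[101, 102, 103, 104, 105]], dtype=np.int64)

# Keep tokens whose score clears the (assumed) 0.5 threshold
keep = scores[0] >= 0.5
pruned_ids = input_ids[0][keep]
print(pruned_ids.tolist())  # [101, 103, 105]
```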
## Conversion Details
- Exported with PyTorch 2.8 + transformers 4.57
- Opset version: 14
- Dynamic axes: batch and seq_len
- Quantized: dynamic uint8 quantization
- Causal mask patched for ONNX trace compatibility