
SWE-Pruner ONNX (code-pruner)

ONNX-converted version of ayanami-kitasan/code-pruner for efficient CPU inference.

Architecture

  • Backbone: Qwen/Qwen3-Reranker-0.6B (28 layers, hidden=1024)
  • Multi-layer Fusion: Early (layer 7) + Middle (layer 14) + Final (layer 28) → fused_hidden=3072
  • Fusion: 1-layer MultiheadAttention (8 heads) + LayerNorm
  • Compression Head: CRF-style (LayerNorm → Linear(3072,256) → GELU → Linear(256,2))
  • Output: token_scores (per-token sigmoid scores in 0-1; higher = keep)

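The shape flow above can be sketched numerically. This is a minimal NumPy mock with random weights, not the real model: the attention-based fusion is simplified to plain concatenation of the three layer outputs, and all weight values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 16, 1024

# Hidden states from three backbone layers (early=7, middle=14, final=28)
h_early, h_mid, h_final = (rng.standard_normal((seq_len, hidden)) for _ in range(3))

# Fuse along the feature axis -> fused_hidden = 3 * 1024 = 3072
# (the real model mixes these with MultiheadAttention before this step)
fused = np.concatenate([h_early, h_mid, h_final], axis=-1)

# Compression head: LayerNorm -> Linear(3072, 256) -> GELU -> Linear(256, 2)
def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

W1 = rng.standard_normal((3072, 256)) * 0.02   # illustrative random weights
W2 = rng.standard_normal((256, 2)) * 0.02
logits = gelu(layer_norm(fused) @ W1) @ W2      # [seq_len, 2]

# Per-token keep score: sigmoid over the "keep" logit
scores = 1.0 / (1.0 + np.exp(-logits[:, 1]))
print(scores.shape)  # (16,)
```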
Files

File            Description
model.onnx      Quantized ONNX model (uint8, ~607 MB)
vocab.json      BPE vocabulary (Qwen3 tokenizer)
merges.txt      BPE merge rules
metadata.json   Model metadata (token IDs, dimensions)
crf_params.npz  CRF transition parameters (optional, for Viterbi decoding)
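crf_params.npz enables Viterbi decoding over the two labels (drop/keep) instead of thresholding each token independently. A self-contained sketch of the decoding step with toy numbers — the transition values here are made up, and how the arrays are named inside the .npz is not documented above:

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: [seq_len, 2] log-scores per token; transitions: [2, 2] log-scores."""
    seq_len, n_labels = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_labels), dtype=np.int64)
    for t in range(1, seq_len):
        # score of the best path ending in each label at step t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # backtrack the best label sequence
    labels = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1]

# Toy emissions for 4 tokens, labels {0: drop, 1: keep}
emissions = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]))
transitions = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))  # "sticky" transitions
print(viterbi(emissions, transitions))  # [0, 1, 1, 1]
```

With real inputs, the emissions would come from the model's token scores and the transitions from `np.load("crf_params.npz")`.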

Usage

import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("model.onnx")
input_ids = np.array([[...]], dtype=np.int64)      # [1, seq_len]
attention_mask = np.array([[...]], dtype=np.int64)  # [1, seq_len]

scores = sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})[0]
# scores: [1, seq_len] float32, 0-1 range, higher = keep
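Given those scores, pruning is just masking tokens below a cutoff. A sketch with dummy values — the 0.5 threshold is an illustrative choice, not a documented default:

```python
import numpy as np

# Dummy per-token scores shaped like the model output: [1, seq_len], 0-1 range
scores = np.array([[0.91, 0.12, 0.77, 0.05, 0.64]], dtype=np.float32)
tokens = ["def", "\n", "foo", "#", "return"]  # placeholder decoded tokens

keep = scores[0] >= 0.5                 # boolean keep-mask; threshold is tunable
pruned = [t for t, k in zip(tokens, keep) if k]
print(pruned)  # ['def', 'foo', 'return']
```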

Conversion Details

  • Exported with PyTorch 2.8 + transformers 4.57
  • Opset version: 14
  • Dynamic axes: batch and seq_len
  • Quantized: dynamic uint8 quantization
  • Causal mask patched for ONNX trace compatibility
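For reference, per-tensor dynamic uint8 quantization maps each fp32 weight to an 8-bit integer via an affine scale/zero-point. A numeric illustration of the scheme — not the exact ONNX Runtime kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # fp32 weights

# Affine uint8 mapping: w ≈ scale * (q - zero_point)
scale = (w.max() - w.min()) / 255.0
zero_point = np.round(-w.min() / scale).astype(np.uint8)
q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)

# Dequantize and check the round-trip error stays within one quantization step
w_deq = scale * (q.astype(np.float32) - zero_point)
print(np.abs(w - w_deq).max() <= scale)  # True
```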